How Vision and Hearing Work Together in Speech Perception
Author: Association for Psychological Science
Published: 2009/02/11 - Updated: 2026/02/01
Publication Details: Peer-Reviewed, Research, Study, Analysis
Category Topic: Deaf Communication
Synopsis: This research from the Association for Psychological Science, published in Current Directions in Psychological Science, examines the multisensory nature of human speech perception. The findings matter to clinicians, educators, and individuals with hearing differences because they demonstrate that speech comprehension relies not on auditory input alone but on integrated visual signals from lip movements, tongue position, and facial features. The research is authoritative due to its peer-reviewed status and its explanation of the McGurk Effect, which shows how visual and auditory speech signals merge so completely that they cannot be consciously separated. This understanding has direct clinical applications for rehabilitation programs treating autism, brain injury, and schizophrenia, while also helping people with hearing loss maximize their communication abilities through enhanced visual speech strategies - Disabled World (DW).
- Definition: McGurk Effect
The McGurk effect is a perceptual phenomenon in which what you see alters what you hear when processing speech. It happens when there's a mismatch between auditory and visual information - for example, when you hear the syllable "ba" while watching someone's lips form "ga," your brain splits the difference and you perceive "da" instead. Named after Harry McGurk and his research assistant John MacDonald, who discovered it in 1976, the effect demonstrates that speech perception isn't just about sound waves hitting your eardrums. Your brain automatically integrates visual cues from a speaker's mouth movements with the acoustic signal, and when these sources conflict, vision often wins. You can know the effect is happening and still fall for it - watching the same video clip repeatedly won't stop your brain from fusing the conflicting information into a unified (but incorrect) percept. It's a powerful reminder that we don't experience the world directly; instead, our brains construct reality by combining information from multiple senses, and sometimes that construction process reveals itself in strange ways.
Introduction
Multiple Senses Are Used in Speech Perception
When someone speaks to you, do you see what they are saying? We tend to think of speech as being something we hear, but recent studies suggest that we use a variety of senses for speech perception - that the brain treats speech as something we hear, see and even feel.
In a new report in Current Directions in Psychological Science, a journal of the Association for Psychological Science, psychologist Lawrence Rosenblum describes research examining how our different senses blend together to help us perceive speech.
We receive a great deal of our speech information through visual cues such as lip-reading, and this kind of visual speech is found across all cultures. And it is not just information from the lips - when someone is speaking to us, we also note movements of the teeth and tongue, as well as non-mouth facial features.
Main Content
It's likely that human speech perception has evolved to integrate many senses. Put another way, speech is not meant to be just heard, but also to be seen.
The McGurk Effect is a well-characterized example of the integration between what we see and what we hear when someone is speaking to us. The phenomenon occurs when a sound (such as a syllable or word) is dubbed onto a video of a face making a different sound. For example, the audio may be playing "ba" while the face appears to be saying "va". When confronted with this mismatch, we will usually hear "va" or a combination of the two sounds, such as "da".
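Researchers create such stimuli by replacing a video's original soundtrack with a recording of a different syllable. As a minimal sketch of how that dubbing step might be done, the Python snippet below muxes a mismatched audio track onto a face video using ffmpeg; the file names are hypothetical placeholders, and ffmpeg must be installed on the system.

```python
# Minimal sketch: build a McGurk-style stimulus by replacing a video's
# audio track with a mismatched syllable recording. Assumes ffmpeg is
# installed; all file names below are hypothetical placeholders.
import subprocess

def make_mcgurk_stimulus(video_path: str, audio_path: str, out_path: str) -> None:
    """Mux the audio from audio_path onto the video from video_path."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", video_path,   # e.g. a face articulating "va"
            "-i", audio_path,   # e.g. a recording of "ba"
            "-map", "0:v:0",    # take the video stream from the first input
            "-map", "1:a:0",    # take the audio stream from the second input
            "-c:v", "copy",     # keep the video stream untouched
            "-shortest",        # trim to the shorter of the two streams
            out_path,
        ],
        check=True,
    )

make_mcgurk_stimulus("face_va.mp4", "audio_ba.wav", "mcgurk_stimulus.mp4")
```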
Interestingly, the McGurk Effect still occurs even when study participants are aware of the dubbing or are told to concentrate only on the audio. Rosenblum suggests this is evidence that once the senses are integrated, they cannot be consciously separated.
Recent studies indicate that this integration occurs very early in the speech process, even before phonemes (the basic units of speech) are established.
Rosenblum suggests that the physical movements of speech (that is, our mouths and lips moving) create acoustic and visual signals that share a similar form. He argues that as far as the speech brain is concerned, the auditory and visual information are never really separate. This could explain why we integrate speech so readily, and in such a way that the audio and visual speech signals become indistinguishable from one another.
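Rosenblum's account is theoretical, but the general idea of fusing two streams of evidence can be illustrated with a simple multiplicative-integration toy model. The sketch below is an illustration only, not Rosenblum's own model, and every support value in it is invented for demonstration.

```python
# Toy illustration (not Rosenblum's model): auditory and visual support
# for each candidate syllable are multiplied, then renormalized into a
# fused percept. All numbers below are invented for demonstration.

# Support values in [0, 1] for each candidate syllable, per modality.
auditory = {"ba": 0.70, "va": 0.10, "da": 0.20}   # sound track says "ba"
visual   = {"ba": 0.05, "va": 0.60, "da": 0.35}   # face articulates "va"

fused = {s: auditory[s] * visual[s] for s in auditory}
total = sum(fused.values())
fused = {s: p / total for s, p in fused.items()}

# With these numbers the blend "da" edges out "va", and both beat the
# acoustically dominant "ba" - mirroring the McGurk pattern.
for syllable, p in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(f"{syllable}: {p:.2f}")
```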
Rosenblum concludes that visual-speech research has a number of clinical implications, especially in the areas of autism, brain injury, and schizophrenia, and that "rehabilitation programs in each of these domains have incorporated visual-speech stimuli."
Insights, Analysis, and Developments
Editorial Note: The recognition that speech operates as a fundamentally multisensory experience rather than a purely auditory phenomenon challenges long-held assumptions about communication and disability. For individuals navigating hearing loss, this research validates what many have intuitively understood: watching a speaker's face isn't compensatory behavior but an integral part of how all humans process language. The clinical implications extend beyond rehabilitation settings, suggesting that educational approaches, assistive technologies, and even everyday communication strategies should acknowledge and support the visual dimensions of speech. As our understanding of neurological conditions like autism and schizophrenia continues to grow, these findings about sensory integration offer promising pathways for developing more effective therapeutic interventions that work with the brain's natural processing systems rather than against them - Disabled World (DW).

Attribution/Source(s): This peer-reviewed publication was selected for publishing by the editors of Disabled World (DW) due to its relevance to the disability community. Originally authored by the Association for Psychological Science and published on 2009/02/11, this content may have been edited for style, clarity, or brevity.