How Lip-Reading Errors Happen, Revealed by Network Science
Author: University of Kansas
Published: 30 Jun 2026
Publication Details: Peer-Reviewed | Research, Study, Analysis
Synopsis: This research, published in the peer-reviewed Journal of the Acoustical Society of America, uses network science to map roughly 20,000 English words by how they look on the mouth rather than how they sound, helping to explain why lip-reading goes wrong and why certain words are so easily confused. The findings carry real value for people who are deaf or hard of hearing, for seniors coping with age-related hearing loss, and for the trainers and technologists who support them, since understanding the pattern of errors points toward better lip-reading instruction and toward speech-to-text systems that could read faces as well as listen.*
At a Glance
- When spoken, about a third of words in English look like at least one other word.
- People are more likely to mistake a word for another word that is used more commonly.
- Words such as vet, fit and fuzz sound nothing alike yet look the same on the lips, while kit, cat and cut both sound similar and look similar.
Topic Definition: Lip Reading
Lip reading, also called speechreading, is the skill of understanding spoken language by watching the movements of a speaker's lips, jaw, tongue and facial expressions instead of relying on sound. It is widely used by people who are deaf or hard of hearing, including many seniors with age-related hearing loss, and it depends on recognizing visemes, the visible mouth shapes that correspond to speech sounds. Because many different sounds can produce nearly identical mouth shapes, lip reading is far harder than most people assume and is usually combined with context, residual hearing and other cues to fill in the gaps.
Introduction
Study Reveals What People Really See When They Read Lips
New research from the University of Kansas uses network science to determine why people make mistakes when lip reading.
Michael Vitevitch, professor of speech-language-hearing at KU, and his co-authors created a visual map of around 20,000 words in English, hoping to better grasp why some words are more difficult to lip-read than others.
The results appear in the Journal of the Acoustical Society of America. The findings could improve training for lip readers and boost the ability of artificial intelligence to read lips and to provide transcription and other digital services.
"What we looked at in this study is how people basically read lips, how accurate they are and, more specifically, what kinds of mistakes they make," Vitevitch said. "A lot of previous work looked at how accurate people were and didn't necessarily look at the characteristics of the errors themselves. There's a lot to be learned from the mistakes you make, and that was the approach we took."
Main Content
While previous work on lip reading examined errors, much of that research was done by spoken-language researchers who focused on phonemes - the sounds in a language - and on how close participants' responses came to the sound of the target word.
Vitevitch took a different approach.
"We focused on the visual characteristics," he said. "Instead of looking at how many sounds of the word people got, we looked at how many of the visual characteristics, which we call 'visemes' (the visual equivalent of a phoneme), they got. We focused on what you're getting from the lips, jaw and mouth without using auditory sound. You're just trying to get the information from what you're seeing."
"How does that sound look when it's spoken? We don't care what it sounds like; we care about how it looks when it's spoken," he said. "Sometimes words sound similar and look similar, such as 'kit,' 'cat' and 'cut.' Other times words don't sound alike but still look similar like 'vet,' 'fit' and 'fuzz.' In both cases if you're just looking at my face, you couldn't tell one word from the other."
Through analysis of the word map, researchers determined that people are more likely to mistake a word for another word used more commonly; that, when spoken, about a third of words in English look like at least one other word; that if a word has many visual look-alikes, it is consistently harder to lip-read; and that lip-reading mistakes do not happen randomly, but are more likely when visually similar words occupy the same region in the visual network.
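The look-alike idea behind these findings can be illustrated with a minimal sketch. It assumes a deliberately coarse, hypothetical phoneme-to-viseme grouping (the article does not list the study's actual viseme classes) and hand-written ARPAbet-style transcriptions for the example words Vitevitch mentions, plus "map" as a contrast word. Words whose viseme sequences come out identical are treated as indistinguishable on the lips.

```python
from collections import defaultdict

# Hypothetical, deliberately coarse phoneme-to-viseme grouping.
# This is an assumption for illustration, not the study's classes.
VISEME_OF = {
    "F": "FV", "V": "FV",                                  # lip-to-teeth
    "T": "TDSZ", "D": "TDSZ", "S": "TDSZ", "Z": "TDSZ",    # tongue-tip
    "K": "KG", "G": "KG",                                  # back of mouth
    "P": "PBM", "B": "PBM", "M": "PBM",                    # closed lips
    "IH": "VOWEL", "EH": "VOWEL", "AE": "VOWEL", "AH": "VOWEL",
}

# Toy lexicon with ARPAbet-style transcriptions (hand-written assumption).
PRONUNCIATIONS = {
    "vet": ["V", "EH", "T"],
    "fit": ["F", "IH", "T"],
    "fuzz": ["F", "AH", "Z"],
    "kit": ["K", "IH", "T"],
    "cat": ["K", "AE", "T"],
    "cut": ["K", "AH", "T"],
    "map": ["M", "AE", "P"],
}


def viseme_sequence(phonemes):
    """Replace each phoneme with the viseme class it is assumed to show."""
    return tuple(VISEME_OF[p] for p in phonemes)


def lookalike_groups(lexicon):
    """Return groups of words that share an identical viseme sequence."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[viseme_sequence(phonemes)].append(word)
    # Keep only sequences shared by two or more words.
    return [sorted(words) for words in groups.values() if len(words) > 1]


if __name__ == "__main__":
    for group in sorted(lookalike_groups(PRONUNCIATIONS)):
        print(group)
```

Under these assumed classes, "vet," "fit" and "fuzz" fall into one group and "kit," "cat" and "cut" into another, while "map" has no look-alike in this toy lexicon. A real analysis would need a full pronunciation dictionary and an empirically grounded viseme set.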
"One surprise was that people aren't that good at this," Vitevitch said. "We think we are, but we're really not. Most of the errors show that you're one or two visual characteristics - one or two visemes - off. You're getting a good amount of it, but perhaps not enough to get by."
The researchers' visual map allowed them to see how words are distributed across the visual landscape, according to Vitevitch. In the map, words sat close together when they looked similar and farther apart when they looked dissimilar.
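One way to picture such a map, as a sketch only, is to treat each word as a node and link two words when their viseme sequences differ by at most one edit. The viseme labels, the toy word list and the one-edit threshold are all assumptions made for illustration; the article does not specify how the study measured visual similarity.

```python
from itertools import combinations

# Toy lexicon: word -> viseme sequence. The class labels are hypothetical.
LEXICON = {
    "vet": ("FV", "VOWEL", "TDSZ"),
    "fit": ("FV", "VOWEL", "TDSZ"),
    "fuzz": ("FV", "VOWEL", "TDSZ"),
    "kit": ("KG", "VOWEL", "TDSZ"),
    "cat": ("KG", "VOWEL", "TDSZ"),
    "cut": ("KG", "VOWEL", "TDSZ"),
    "map": ("PBM", "VOWEL", "PBM"),
}


def viseme_distance(a, b):
    """Levenshtein edit distance between two viseme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a viseme from a
                curr[j - 1] + 1,         # insert a viseme into a
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def build_network(lexicon, max_distance=1):
    """Link every pair of words within max_distance visemes of each other."""
    neighbors = {word: set() for word in lexicon}
    for w1, w2 in combinations(lexicon, 2):
        if viseme_distance(lexicon[w1], lexicon[w2]) <= max_distance:
            neighbors[w1].add(w2)
            neighbors[w2].add(w1)
    return neighbors


if __name__ == "__main__":
    network = build_network(LEXICON)
    for word, competitors in network.items():
        print(word, len(competitors), sorted(competitors))
```

In this toy network "vet" has five competitors within one viseme, while "map" has none. That mirrors, in miniature, the article's point that words in crowded regions of the map face more visual competition.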
"Certain areas become more compressed than you might expect," he said. "The landscape stretches and compresses in ways we hadn't anticipated. That stretching and compression has implications for how accurate you're going to be when trying to lip-read. Does it give you more competitors than you would otherwise have? Or does it move things farther apart and make them more perceptually distinct?"
The KU researcher said his group hopes to move into lip-reading training.
"The idea is that if you track people's errors over time, those errors should start shrinking toward the target word," Vitevitch said. "Instead of being far away, people begin picking up the information they need and making more accurate guesses."
Another application of the research is training automatic transcription systems.
"Systems such as Zoom already do a reasonable job transcribing speech," Vitevitch said. "Could they do better if they used not only audio but also visual information from a speaker's face? Computers are very good at finding patterns, and sometimes they're the same patterns humans use. We may be able to train computers to do things in a more humanlike way."
Vitevitch said his group will continue to follow up on this work in different ways.
"We're continuing to explore how people do this, potentially moving toward machine-learning applications and finding ways to help people who need assistance understanding speech," he said.
Vitevitch's co-authors were KU graduate students Maia Flynn and Reid Kelly, along with Lorin Lachs of California State University, Fresno.
Insights, Analysis, and Developments
Editorial Note: What makes this work worth a second look is that it treats lip-reading as a problem of geometry rather than guesswork, charting where words sit in a vast visual landscape and showing that the spots most crowded with look-alikes are exactly where readers stumble. That shift in thinking is quietly practical, because it suggests lip-reading can be taught more deliberately and that the automatic captions millions now rely on every day might one day grow sharper by watching a speaker's face as closely as they follow a speaker's voice.*
Attribution/Source(s): This peer-reviewed publication was selected for publishing by the editors of Disabled World (DW) due to its relevance to the disability community. Originally authored by University of Kansas and published on 30 Jun 2026, this content may have been edited for style, clarity, or brevity.
* Editorial additions by Ian C. Langtree.