AI Headphones Can Isolate a Single Person's Voice in a Crowd

Author: University of Washington
Published: 2024/05/23
Publication Type: Reports & Proceedings
Topic: Hearing Aids and Devices (Publications Database)

Synopsis: Target Speech Hearing cancels all other sounds in the environment and plays just the speaker's voice even as the listener moves around in noisy places and no longer faces the speaker.

Introduction

Noise-canceling headphones have gotten very good at creating an auditory blank slate. But allowing certain sounds from a wearer's environment through the erasure still challenges researchers. The latest edition of Apple's AirPods Pro, for instance, automatically adjusts sound levels for wearers - sensing when they're in conversation - but the user has little control over whom to listen to or when this happens.

A University of Washington team has developed an artificial intelligence system that lets a user wearing headphones look at a person speaking for three to five seconds to "enroll" them. The system, called "Target Speech Hearing," then cancels all other sounds in the environment and plays just the enrolled speaker's voice in real time even as the listener moves around in noisy places and no longer faces the speaker.

The team presented its findings May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. The code for the proof-of-concept device is available for others to build on. The system is not commercially available.

"We tend to think of AI now as web-based chatbots that answer questions," said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. "But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking."

To use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head at someone talking. The sound waves from that speaker's voice should then reach the microphones on both sides of the headset simultaneously; there's a 16-degree margin of error. The headphones send that signal to an on-board embedded computer, where the team's machine learning software learns the desired speaker's vocal patterns. The system latches onto that speaker's voice and continues to play it back to the listener, even as the pair moves around. The system's ability to focus on the enrolled voice improves as the speaker keeps talking, giving the system more training data.
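The team's published code is not reproduced here, but the enrollment-then-extraction flow described above can be sketched conceptually. The Python snippet below is a minimal illustration under stated assumptions, not the UW implementation: the microphone spacing, the cross-correlation direction check, and the placeholder "vocal pattern" and pass-through extraction functions are simplifications standing in for the trained neural networks that would run on the embedded computer.

```python
# Conceptual sketch only (assumptions noted in comments); not the UW team's code.
import numpy as np

SAMPLE_RATE = 16_000          # Hz, assumed
MIC_SPACING_M = 0.18          # assumed spacing between the two headset microphones
SPEED_OF_SOUND = 343.0        # m/s
ANGLE_TOLERANCE_DEG = 16.0    # margin of error quoted in the article

def interaural_delay_samples(left: np.ndarray, right: np.ndarray) -> int:
    """Estimate the delay (in samples) between the two microphone channels
    via cross-correlation; a near-zero delay means the source is roughly
    straight ahead of the wearer."""
    corr = np.correlate(left, right, mode="full")
    return int(np.argmax(corr)) - (len(right) - 1)

def is_facing_speaker(left: np.ndarray, right: np.ndarray) -> bool:
    """True if the estimated arrival angle falls within the 16-degree margin."""
    delay_s = interaural_delay_samples(left, right) / SAMPLE_RATE
    max_delay_s = MIC_SPACING_M / SPEED_OF_SOUND
    ratio = np.clip(delay_s / max_delay_s, -1.0, 1.0)  # keep arcsin in its domain
    angle_deg = np.degrees(np.arcsin(ratio))
    return abs(angle_deg) <= ANGLE_TOLERANCE_DEG

def enroll_speaker(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Toy stand-in for learning the speaker's 'vocal patterns': a normalized
    spectral summary of the enrollment snippet. A real system would compute a
    neural speaker embedding instead."""
    mono = 0.5 * (left + right)
    spectrum = np.abs(np.fft.rfft(mono))
    return spectrum / (np.linalg.norm(spectrum) + 1e-9)

def extract_target(frame: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Stand-in for the real-time separation network that would be conditioned
    on the enrolled embedding; here it simply passes audio through unchanged."""
    return frame

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal(SAMPLE_RATE)   # 1 s of synthetic enrollment audio
    right = left.copy()                       # identical channels => facing the speaker
    if is_facing_speaker(left, right):
        embedding = enroll_speaker(left, right)
        new_frame = rng.standard_normal(SAMPLE_RATE // 100)   # a 10 ms frame
        cleaned = extract_target(new_frame, embedding)
        print("Enrolled; playing back", cleaned.shape[0], "samples")
```

In the real system, enrollment succeeds only while the wearer is facing the speaker (hence the check against the 16-degree margin), and the learned representation of the target voice conditions a real-time separation model rather than a pass-through.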

Image: Prototype of the headphone system - binaural microphones attached to off-the-shelf noise-canceling headphones. Image Credit: Kiyomi Taguchi/University of Washington.

The team tested its system on 21 subjects, who rated the clarity of the enrolled speaker's voice nearly twice as high as the unfiltered audio on average.

This work builds on the team's previous "semantic hearing" research, which allowed users to select specific sound classes - such as birds or voices - that they wanted to hear and canceled other sounds in the environment.

Currently the Target Speech Hearing system can enroll only one speaker at a time, and it can do so only when no other loud voice is coming from the same direction as the target speaker's. If a user isn't happy with the sound quality, they can run another enrollment on the speaker to improve the clarity.

The team is working to expand the system to earbuds and hearing aids in the future.

Co-authors

Additional co-authors on the paper were Bandhav Veluri, Malek Itani and Tuochao Chen, UW doctoral students in the Allen School, and Takuya Yoshioka, director of research at AssemblyAI.

Funding

This research was funded by a Moore Inventor Fellow award, a Thomas J. Cable Endowed Professorship and a UW CoMotion Innovation Gap Fund.

Attribution/Source(s):

This quality-reviewed publication was selected for publishing by the editors of Disabled World due to its significant relevance to the disability community. Originally authored by University of Washington, and published on 2024/05/23, the content may have been edited for style, clarity, or brevity. For further details or clarifications, University of Washington can be contacted at washington.edu. NOTE: Disabled World does not provide any warranties or endorsements related to this article.

Explore Similar Topics

1 - Target Speech Hearing cancels all other sounds in the environment and plays just the speaker's voice even as the listener moves around in noisy places and no longer faces the speaker.

2 - Using semantic hearing, the headphones stream captured audio to a connected smartphone, which cancels unwanted environmental sounds.

3 - Investigating whether AirPods and other earbuds can be used as an alternative to expensive hearing aid devices.

