AI Headphones Allow You To Listen to One Person in a Crowd

In a crowded, noisy environment, have you ever wished you could tune out all the background chatter and focus solely on the person you’re trying to listen to? While noise-canceling headphones have made great strides in creating an auditory blank slate, they still struggle to allow specific sounds from the wearer’s surroundings to filter through. But what if your headphones could be trained to pick up on and amplify the voice of a single person, even as you move around a room filled with other conversations?

Contents

How Target Speech Hearing Works
Testing AI Headphones with TSH
Improving AI Headphones and Overcoming Limitations

Target Speech Hearing (TSH), an AI system developed by researchers at the University of Washington, is designed to do exactly that.

How Target Speech Hearing Works

To use TSH, a person wearing specially-equipped headphones simply needs to look at the individual they want to hear for a few seconds. This brief “enrollment” period allows the AI system to learn and latch onto the unique vocal patterns of the target speaker.

Here’s how it works under the hood:

The user taps a button while directing their head towards the desired speaker for 3-5 seconds.
Because the user is facing the speaker, the sound waves from the speaker’s voice reach the microphones on both sides of the headset at nearly the same time (the system tolerates a 16-degree margin of error).
The headphones transmit this audio signal to an onboard embedded computer.
The machine learning software analyzes the voice and creates a model of the speaker’s distinct vocal characteristics.
The AI system uses this model to isolate and amplify the enrolled speaker’s voice in real-time, even as the user moves around in a noisy environment.
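
To make the "simultaneous arrival" idea in the steps above more concrete, here is a minimal Python sketch of how a two-microphone headset could check that a voice is coming from roughly straight ahead. It is an illustration only, not the researchers' implementation: the microphone spacing, the sample rate, the speed of sound, and every function name are assumptions made for this example. Only the 16-degree margin comes from the article.

```python
import math

# Every constant below except MARGIN_DEG is an assumption for illustration.
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
MIC_SPACING_M = 0.18        # assumed distance between the two ear-cup microphones
SAMPLE_RATE_HZ = 16_000     # assumed audio sample rate
MARGIN_DEG = 16.0           # from the article: 16-degree margin of error


def max_delay_seconds(angle_deg):
    """Arrival-time gap between the two microphones for a distant source
    located angle_deg away from straight ahead (simple far-field model)."""
    return MIC_SPACING_M * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S


def estimate_delay_samples(left, right, max_lag):
    """Return the lag, in samples, that maximizes the cross-correlation of
    the two channels. A positive lag means the right channel is delayed."""
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over indices where both left[i] and right[i + lag] exist.
        start, stop = max(0, -lag), min(n, n - lag)
        score = sum(left[i] * right[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def is_in_front(left, right, max_lag=20):
    """True if the estimated inter-microphone delay fits inside the window
    implied by the 16-degree margin."""
    limit_samples = max_delay_seconds(MARGIN_DEG) * SAMPLE_RATE_HZ
    return abs(estimate_delay_samples(left, right, max_lag)) <= limit_samples
```

Under these assumed values, a 16-degree offset corresponds to an arrival-time gap of roughly 145 microseconds, a little over two samples at 16 kHz. A voice whose two channels line up within that window would be treated as "in front" during enrollment, while a voice arriving several samples earlier at one ear would not.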

The longer the target speaker talks, the more training data the system receives, allowing it to better focus on and clarify the desired voice. This approach to “selective hearing” could improve communication and accessibility in challenging auditory environments.

Shyam Gollakota, the senior author of the paper, is a UW professor in the Paul G. Allen School of Computer Science & Engineering.

“We tend to think of AI now as web-based chatbots that answer questions. But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.” – Gollakota

Testing AI Headphones with TSH

To put Target Speech Hearing through its paces, the research team conducted a study with 21 participants. Each subject wore the TSH-enabled headphones and enrolled a target speaker in a noisy environment. On average, the users rated the clarity of the enrolled speaker’s voice nearly twice as high as that of the unfiltered audio.

This breakthrough builds upon the team’s earlier work on “semantic hearing,” which allowed users to filter their auditory environment based on predefined sound classifications, such as birds chirping or human voices. TSH takes this concept a step further by enabling the selective amplification of a specific individual’s voice.

The implications are significant, from enhancing personal conversations in loud settings to improving accessibility for those with hearing impairments. As the technology develops, it could fundamentally change how we experience and interact with our auditory world.

Improving AI Headphones and Overcoming Limitations

While Target Speech Hearing represents a major leap forward in auditory AI, the system does have some limitations in its current form:

Single speaker enrollment: As of now, TSH can only be trained to focus on one speaker at a time. Enrolling multiple speakers simultaneously is not yet possible.
Interference from similar audio sources: If another loud voice is coming from the same direction as the target speaker during the enrollment process, the system may struggle to isolate the desired individual’s vocal patterns.
Manual re-enrollment: If the user is unsatisfied with the audio quality after the initial training, they must manually re-enroll the target speaker to improve the clarity.

Despite these constraints, the University of Washington team is actively working on refining and expanding the capabilities of TSH. One of their primary goals is to miniaturize the technology, allowing it to be seamlessly integrated into consumer products like earbuds and hearing aids.

As the researchers continue to push the boundaries of what’s possible with auditory AI, the potential applications are vast, from enhancing productivity in distracting office environments to facilitating clearer communication for first responders and military personnel in high-stakes situations. The future of selective hearing looks bright, and Target Speech Hearing is poised to play a pivotal role in shaping it.