
Meta update: Introducing AI Models That Understand How the World Around Us Sounds

The Meta Blog posted yesterday: “Today, our artificial intelligence (AI) researchers and audio specialists from our Reality Labs team, in collaboration with researchers from the University of Texas at Austin, are making three new models for audio-visual understanding open to developers. These models, which focus on human speech and sounds in video, are designed to push us toward a more immersive reality at a faster rate.”

Whether it’s mingling at a party in the metaverse or watching a home movie in your living room through augmented reality (AR) glasses, acoustics play a role in how these moments will be experienced. We’re building for mixed reality and virtual reality experiences like these, and we believe AI will be core to delivering realistic sound quality.

All three models tie into our AI research around audio-visual perception. We envision a future where people can put on AR glasses and relive a holographic memory that looks and sounds the exact way they experienced it from their vantage point or feel immersed by not just the graphics but also the sounds as they play games in a virtual world.

These models are bringing us even closer to the multimodal, immersive experiences we want to build in the future.

Visual-Acoustic Matching

Anyone who has watched a video where the audio isn’t consistent with the scene knows how disruptive this can feel to human perception. However, getting audio and video from different environments to match has previously been a challenge.

To address this, we created a self-supervised Visual-Acoustic Matching model, called AViTAR, which adjusts audio to match the space of a target image. The self-supervised training objective learns acoustic matching from in-the-wild web videos, even though those videos are unlabeled and contain no acoustically mismatched audio to learn from.
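To make "adjusting audio to match a space" concrete, here is a minimal sketch of how a room shapes a sound. It is not AViTAR and not Meta's code: the function names, the toy impulse response, and the decay values are all assumptions for illustration. A classic signal-processing approach convolves a dry recording with a room impulse response; the model described above instead learns the adjustment from a target image.

```python
import math
import random


def synthetic_impulse_response(decay_seconds, sample_rate=8000,
                               length_seconds=0.25, seed=0):
    """Toy (hypothetical) room response: a direct-path spike followed by
    exponentially decaying noise. Real rooms are far more complex."""
    rng = random.Random(seed)
    n = int(sample_rate * length_seconds)
    ir = [0.0] * n
    ir[0] = 1.0  # direct sound reaching the listener
    for i in range(1, n):
        t = i / sample_rate
        ir[i] = 0.3 * rng.uniform(-1.0, 1.0) * math.exp(-t / decay_seconds)
    return ir


def convolve(signal, kernel):
    """Plain time-domain convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # skip silent samples to keep the toy example fast
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def tail_energy(samples, start):
    """Energy of everything after sample index `start` (the reverberant tail)."""
    return sum(x * x for x in samples[start:])


# A single click stands in for a "dry" recording with no room sound.
dry = [1.0] + [0.0] * 99

# The same click rendered in two assumed spaces with different decay times.
small_room = convolve(dry, synthetic_impulse_response(decay_seconds=0.02))
large_hall = convolve(dry, synthetic_impulse_response(decay_seconds=0.2))

print("small room tail energy:", tail_energy(small_room, 800))
print("large hall tail energy:", tail_energy(large_hall, 800))
```

The large hall leaves far more energy in the tail than the small room, which is the kind of audible mismatch a viewer notices when the sound does not fit the scene.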

One future use case we are interested in involves reliving memories. Imagine being able to put on a pair of AR glasses and see an object with the option to play a memory associated with it, such as picking up a tutu and seeing a hologram of your child’s ballet recital. The audio strips away reverberation and makes the memory sound just like the time you experienced it, sitting in your exact seat in the audience.


VisualVoice

VisualVoice learns in a way that’s similar to how people master new skills — multimodally — by learning visual and auditory cues from unlabeled videos to achieve audio-visual speech separation.

For example, imagine being able to attend a group meeting in the metaverse with colleagues from around the world, but instead of people having fewer conversations and talking over one another, the reverberation and acoustics would adjust accordingly as they moved around the virtual space and joined smaller groups. VisualVoice generalizes well to challenging real-world videos of diverse scenarios.
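As a rough illustration of how speech separation can be learned from unlabeled video, the sketch below uses a common self-supervised recipe: mix two clips on purpose, then train a model to recover each original. The article does not say VisualVoice works exactly this way, so treat the recipe, the function names, and the tiny hand-written "spectrograms" as assumptions. The visual cues that would guide a real model are left out entirely.

```python
def mix(spec_a, spec_b):
    """Create a training mixture by adding two magnitude spectrograms.
    Toy assumption: magnitudes add exactly, which ignores phase."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(spec_a, spec_b)]


def ideal_ratio_mask(target, mixture, eps=1e-8):
    """Per-bin share of the mixture that belongs to the target speaker.
    A separation model would be trained to predict a mask like this."""
    return [[t / (m + eps) for t, m in zip(row_t, row_m)]
            for row_t, row_m in zip(target, mixture)]


def apply_mask(mask, mixture):
    """Recover an estimate of the target by scaling each mixture bin."""
    return [[w * m for w, m in zip(row_w, row_m)]
            for row_w, row_m in zip(mask, mixture)]


# Two made-up 2x3 magnitude spectrograms (rows = time frames, cols = frequency bins).
speaker_a = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
speaker_b = [[0.0, 0.3, 0.7], [0.1, 0.1, 0.6]]

mixture = mix(speaker_a, speaker_b)
mask_a = ideal_ratio_mask(speaker_a, mixture)
recovered_a = apply_mask(mask_a, mixture)

print(recovered_a)
```

Because the two clips were mixed deliberately, the originals serve as free training targets, which is why no human labeling is needed.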

Social Nation
