Nvidia claims new AI model can generate new sounds

26 Nov 2024

Image: © BINGJHEN/Stock.adobe.com

The company’s newest model comes at a time when Big Tech is under fire for how its AI technology affects creative industries.

Tech giant Nvidia has unveiled its latest AI model, which it describes as “the world’s most flexible sound machine”.

Fugatto, which is short for Foundational Generative Audio Transformer Opus 1, can generate any mix of music, voices or sounds described with prompts using a combination of text and audio files, the company said.

The model was built on Nvidia’s previous work around speech modelling, voice encoding and audio understanding.

Nvidia said Fugatto was created by people from around the world, including India, Brazil, China, Jordan and South Korea, making its “multi-accent and multilingual capabilities stronger”.

Rafael Valle, a manager of applied audio research at Nvidia, said the company wanted to create a model that understands and generates sounds like humans do.

“Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale,” he said.

Nvidia also claims that Fugatto allows users to create soundscapes “it’s never seen before”, setting it apart from other models.

However, it’s important to take a company’s claims about its own models with a pinch of salt. Earlier this year, the Stanford AI Index claimed that robust evaluations for large language models are “seriously lacking” and that there is a lack of standardisation in responsible AI reporting.

And last year, the Foundation Model Transparency Index created by US researchers suggested that companies in the foundational AI model space are becoming less transparent about their creations.

Under scrutiny in more ways than one

Nvidia has been investing heavily in the AI space, along with many other tech giants, and has thus far managed to reap the benefits. In May 2023, it became the first chipmaker to reach a $1trn valuation and in June of this year, it became the world’s most valuable company.

But the company has come under fire for stifling competition, both in the chips market and in the AI market.

And outside of competition investigations, the AI chipmaker has also come under fire for allegedly using copyrighted books to train AI, as questions over artificial intelligence’s threat to creative industries rumble on.

Last year also saw thousands sign a letter written by the US Authors Guild, calling on the likes of OpenAI, Alphabet and Meta to stop using their work to train AI models without “consent, credit or compensation”.

Earlier this year, hundreds of musicians – including Billie Eilish and Katy Perry – signed an open letter calling on developers to stop using AI to “devalue the rights of human artists”.

And in May, Sony wrote to more than 700 tech companies asking them to refrain from using its content to train AI models.

Jenny Darmody is the editor of Silicon Republic
