ChatGPT can now talk and listen after major upgrade

14 May 2024

Image: © Panuwat/Stock.adobe.com

OpenAI claims its new flagship model GPT-4o can interact with text, audio and images ‘in real time’ and is being integrated with ChatGPT.

ChatGPT creator OpenAI has revealed a new upgrade that aims to combine the various functions of generative AI into a single chatbot, as AI competition heats up.

The company has released its new flagship AI model called GPT-4o, which is able to interact with text, audio and images “in real time”. This model is being rolled out to users through ChatGPT as a free upgrade, along with extra benefits for ChatGPT Plus users.

OpenAI says the model’s ‘o’ is short for ‘omni’, due to its ability to both accept and respond with text, audio and images. The company claims GPT-4o can respond to audio inputs at a similar speed to human response times and is a step towards “much more natural human-computer interaction”.

“With GPT-4o, we trained a single new model end-to-end across text, vision and audio, meaning that all inputs and outputs are processed by the same neural network,” OpenAI said in a blogpost. “Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.”

OpenAI CEO Sam Altman wrote a blog about the model and claims that the new voice and video mode is “the best computer interface I’ve ever used” and that it felt “natural” speaking to it.

“The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different,” Altman said. “It is fast, smart, fun, natural and helpful.”

OpenAI claims it has added safety systems on the model’s voice outputs and that GPT-4o had “extensive external red teaming” with more than 70 external experts to identify risks with the model’s new functions.

“We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o,” the company said. “We will continue to mitigate new risks as they’re discovered.”

OpenAI also noted some limitations with the model in a short video, but still claims its latest flagship model exceeds competitors in most areas, based on evaluations using the company’s “traditional benchmarks”.

These comparisons can be difficult to verify, however. A report from the AI Index last month claimed robust evaluations for large language models are “seriously lacking” and there is a lack standardisation in responsible AI reporting.

“Leading developers, including OpenAI, Google and Anthropic, primarily test their models against different responsible AI benchmarks,” the report said. “This practice complicates efforts to systematically compare the risks and limitations of top AI models.”

ChatGPT was arguably the key product that led to the focus on generative AI witnessed across various sectors, as it had a surge in users in the months following its launch. But OpenAI is dealing with a more crowded market now, as rival companies such as Anthropic have sprouted up with significant funding to showcase their own AI models.

Find out how emerging tech trends are transforming tomorrow with our new podcast, Future Human: The Series. Listen now on Spotify, on Apple or wherever you get your podcasts.