OpenAI announces GPT-4o, a multimodal voice assistant

OpenAI CEO Sam Altman speaking onstage at the company's developer conference

OpenAI has unveiled GPT-4o, a new AI model that combines text, vision, and audio.

At its highly anticipated livestream event, OpenAI CTO Mira Murati said that GPT-4o processes text, audio, and vision in a single model. GPT-4o will be available to all users, including free users, in both ChatGPT and the API.

The announcement confirmed earlier rumors of a voice assistant. Previously, ChatGPT relied on separate models for the voice and image modalities. GPT-4o brings those modalities into a single model, cutting latency and making conversations feel real-time; you can even interrupt the model mid-response. It can also pick up on a user's emotions and tone, and express its own, sounding anywhere from extremely dramatic to flatly robotic. It can even sing, if you want it to.

Another demo showcased GPT-4o's ability to help solve math problems using its vision capabilities.

Murati opened the event by announcing the availability of a new ChatGPT desktop app.

Before the event, OpenAI was rumored to be announcing a ChatGPT search engine or its next-generation model, GPT-5, ahead of Google I/O. CEO Sam Altman shot down those rumors before Monday's event, though both are still believed to be in development.

This story is developing…