ChatGPT evolves: Speaks better and now translates conversations in real time

OpenAI has taken a new step in the evolution of ChatGPT, substantially improving its ability to speak with a more fluid, human, and emotional voice. The updated “Advanced Voice Mode” promises not only better intonation and pausing but also emotional nuances such as empathy or even sarcasm. It also adds simultaneous translation, allowing it to act as a real-time interpreter in multilingual dialogues.

Instant translation between languages and more natural voice

The system can now translate conversations between a pair of languages selected by the user. Once activated, it interprets both sides of the conversation without interruption until told to stop. This makes the assistant a useful tool in, for example, business negotiations between people of different nationalities, or when ordering food at a restaurant while traveling abroad.

According to OpenAI, paid subscribers can activate this function from the chat interface by tapping the language icon. The new voice version has been rolled out on all platforms and offers multiple preset voices, letting users tailor the listening experience to the use case.

In addition, text-to-speech output has been refined, producing clearer and more accurately pronounced speech, even in noisy environments. The goal is an AI conversation experience that comes ever closer to real human interaction.

Current limitations and pending challenges

Although the advances are significant, the technology still has room for improvement. OpenAI has acknowledged that audio quality may occasionally degrade, with sudden changes in tone or volume. These irregularities vary depending on the voice the user has selected.

Another frequently reported problem is so-called audio “hallucinations”: ChatGPT emitting strange sounds unprompted, such as background effects, random noises, or even fragments of music or what sound like advertisements. In one case, a user claimed to have heard what appeared to be a commercial during their conversation, even though OpenAI does not include advertising in ChatGPT.

Advanced Voice Mode was introduced by OpenAI in May 2024 and, later, its availability was extended to the European Union. Along with this improvement, the company has added functionalities such as screen sharing and camera activation. This allows ChatGPT not only to listen and speak but to “see” the user’s environment in real time and comment on visible objects. Its main competitor, Google Gemini, already offers similar functions.

With these improvements, ChatGPT strengthens its position as a versatile conversational AI that can serve as a virtual assistant, interpreter, tour guide, and much more. Advances in AI voice offer a glimpse of a future in which natural interaction between humans and machines is a regular part of everyday life.