OpenAI has taken an important step in AI-based voice technology. The company announced new models that produce more natural speech than its previous generation, allowing AIs to communicate with people more intuitively and fluently.
OpenAI introduced its new-generation voice models
The company's new speech model, gpt-4o-mini-tts, offers more realistic and flexible speech than previous speech synthesis technologies. Developers can steer the model's speaking style, making the AI speak in a particular tone or as a particular character. For example, given the instruction "Speak like a medieval knight", the model delivers its speech accordingly.
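For developers curious what this looks like in practice, here is a minimal sketch using OpenAI's official Python SDK. The voice name, example text, and output file are illustrative assumptions; the `instructions` parameter is what the speech endpoint exposes for this kind of style steering with the new model.

```python
# Minimal sketch: steering gpt-4o-mini-tts with a style instruction.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY set in the
# environment; the voice, input text, and output path are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",                                # one of the built-in voices
    input="Good morrow! How may I serve thee today?",
    instructions="Speak like a medieval knight",  # steers tone and character
)

# Write the returned audio (MP3 by default) to disk.
response.write_to_file("knight.mp3")
```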
OpenAI also announced the gpt-4o-transcribe and gpt-4o-mini-transcribe models, which are set to replace Whisper. These models were trained on diverse, high-quality audio data to better understand different accents and speech patterns. According to OpenAI, they significantly reduce Whisper's error rates and improve transcription accuracy.
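Transcription with the new models goes through the same API endpoint that previously served Whisper, so switching is mostly a matter of changing the model name. A minimal sketch, again using the official Python SDK, with an illustrative file name:

```python
# Minimal sketch: transcribing an audio file with gpt-4o-transcribe.
# Assumes the official `openai` Python SDK; "meeting.mp3" is a placeholder.
from openai import OpenAI

client = OpenAI()

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for lower cost
        file=audio_file,
    )

print(transcript.text)
```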
However, OpenAI will not release the new transcription models as open source. The company had previously offered Whisper as open source, but this time, citing the models' greater complexity, it says it will provide open-source releases only for certain use cases.
The new-generation voice models are available to all developers via OpenAI's API platform. So, what do you think? You can share your views with us in the comments section below.