ShiftDelete.Net Global

OpenAI announces its new speech model: gpt-realtime


OpenAI has officially introduced gpt-realtime, a new speech model that the company says is both more capable and more affordable than its predecessor.

OpenAI reported that thousands of developers have built natural speech experiences into their applications since the Realtime API was released in October 2024. The gpt-realtime model is its next step for those applications.

One of the most notable features of the new model is its ability to better understand and execute complex commands. The company states that the model's error rate has decreased in tasks such as ordering a ride. The voices it produces are also said to be more natural and expressive, and it follows system messages and developer instructions more accurately than previous models.

The Realtime API initially offered six voice options, a number that had grown to eight before this release. OpenAI has now added two new voices, Marin and Cedar, bringing the total to ten. The eight existing voices have also been updated to provide a more natural and fluid speech experience.

The new model also outperforms the previous version in benchmark tests. On Big Bench Audio, it achieved 82.8% accuracy, up from the previous model's 65.6%. On the MultiChallenge Audio benchmark, it scored 30.5%, up from 20.6%.
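To put those scores in perspective, the short sketch below works out the gain in percentage points and the relative improvement for each benchmark. It uses only the four figures quoted above; the function name `improvement` is made up for this illustration.

```python
# Benchmark scores quoted in the article: (previous model, gpt-realtime), in percent.
SCORES = {
    "Big Bench Audio": (65.6, 82.8),
    "MultiChallenge Audio": (20.6, 30.5),
}


def improvement(old: float, new: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent), rounded to 0.1."""
    points = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return points, relative


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        points, relative = improvement(old, new)
        print(f"{name}: +{points} points ({relative}% relative gain)")
```

By this arithmetic, the Big Bench Audio score rose by about 17 points and the MultiChallenge Audio score by about 10 points, although the relative gain is larger on the second benchmark because it started from a lower baseline.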

The release also brings updates to the Realtime API. It now supports remote MCP servers, image input, and phone calls over SIP (Session Initiation Protocol). Developers can also save and reuse frequently used prompts.

Despite all these improvements, OpenAI has lowered the price of the Realtime API. The gpt-realtime model is 20% cheaper than the previous gpt-4o-realtime-preview: one million audio input tokens now cost $32, and one million audio output tokens cost $64.
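As a rough illustration of what these prices mean in practice, the sketch below estimates the audio cost of a session and back-calculates the earlier price implied by a 20% cut. The token counts in the usage example are hypothetical, the function names are made up for this illustration, and the calculation assumes the 20% reduction applies uniformly to input and output; any text-token charges are ignored.

```python
# Prices in USD per 1 million audio tokens, as quoted in the article.
INPUT_PRICE_PER_MILLION = 32.0
OUTPUT_PRICE_PER_MILLION = 64.0
PRICE_CUT = 0.20  # gpt-realtime is stated to be 20% cheaper


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated audio cost in USD for a session (text tokens not included)."""
    total = (input_tokens * INPUT_PRICE_PER_MILLION
             + output_tokens * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


def implied_previous_price(new_price: float, cut: float = PRICE_CUT) -> float:
    """Price before the cut, assuming the reduction applies uniformly."""
    return new_price / (1 - cut)


if __name__ == "__main__":
    # Hypothetical session: 50,000 audio input tokens, 20,000 audio output tokens.
    print(f"Session cost: ${session_cost(50_000, 20_000):.2f}")
    print(f"Implied previous input price: ${implied_previous_price(32.0):.2f} per 1M")
    print(f"Implied previous output price: ${implied_previous_price(64.0):.2f} per 1M")
```

Under these assumptions, the implied earlier prices come out to about $40 and $80 per million audio tokens.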
