OpenAI Launches Three New Voice AI Models for Realtime API
- OpenAI introduced three next-generation audio models for the Realtime API focused on reasoning, translation, and transcription
- GPT-Realtime-2 features enhanced reasoning capabilities, support for tool calling, and a 128K context window
- New dedicated models for translation and streaming transcription enable multilingual support and low-latency processing
OpenAI announced three next-generation audio models for its Realtime API on May 7, 2026: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
GPT-Realtime-2 provides reasoning capabilities comparable to GPT-5, including in-conversation tool calling. Its context window expands from 32K to 128K tokens, allowing more complex task handling, and the model adds tone control and support for parallel tool execution.
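To illustrate how tool calling might be enabled for GPT-Realtime-2, here is a minimal sketch of a session configuration. The `session.update` event shape mirrors the existing Realtime API; the model name comes from the announcement, while the tool definition (`get_weather`) and all field values are illustrative assumptions, not confirmed API details.

```python
import json

def build_session_update(model: str = "gpt-realtime-2") -> dict:
    """Return a hypothetical session.update event enabling tool calling."""
    return {
        "type": "session.update",  # event type used by the Realtime API
        "session": {
            "model": model,
            "modalities": ["audio", "text"],
            "tools": [
                {
                    # Illustrative tool; any real deployment would define its own.
                    "type": "function",
                    "name": "get_weather",
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                }
            ],
            "tool_choice": "auto",  # let the model decide when to call tools
        },
    }

event = build_session_update()
print(json.dumps(event, indent=2))
```

In a live session this event would be sent over the API's WebSocket connection after it opens, before streaming audio.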
GPT-Realtime-Translate performs real-time translation across 70 input languages and 13 output languages. GPT-Realtime-Whisper is a streaming speech recognition model designed for low-latency transcription during ongoing speech. All three models are available via the Realtime API and include built-in safety features to detect harmful content.
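Since each task now has a dedicated model, client code would select the model when opening a connection. This sketch assumes the Realtime API's existing WebSocket endpoint and `model` query parameter; the model identifiers are taken from the announcement, but the exact identifier strings are assumptions.

```python
from urllib.parse import urlencode

# Base endpoint of the existing Realtime API WebSocket interface.
REALTIME_BASE = "wss://api.openai.com/v1/realtime"

# Assumed identifiers for the dedicated models named in the announcement.
MODELS = {
    "translate": "gpt-realtime-translate",  # 70 input / 13 output languages
    "transcribe": "gpt-realtime-whisper",   # low-latency streaming transcription
}

def realtime_url(task: str) -> str:
    """Return the WebSocket URL for the dedicated model handling `task`."""
    return f"{REALTIME_BASE}?{urlencode({'model': MODELS[task]})}"

print(realtime_url("translate"))
# wss://api.openai.com/v1/realtime?model=gpt-realtime-translate
```

Routing by task at connection time keeps each session on the model optimized for it, rather than multiplexing translation and transcription through a single general-purpose model.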