The makers of ChatGPT may have landed in hot water in the past when one of its voice AI models was accused of mimicking actress Scarlett Johansson, but that hasn't stopped the company from working on better audio offerings.
The company is pushing further into this category by rolling out new proprietary voice models: gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe. For now, they are available through the company's API so third-party developers can build their own apps, as well as on a dedicated demo site where individuals can do some limited testing and experimentation.
The new gpt-4o-mini-tts model can be customized from several presets via text prompts to alter its accent, tone, pitch, and other voice qualities, including conveying whatever emotion the user requests. That goes a long way toward addressing concerns that OpenAI might deliberately imitate any given user's voice; in the end, it's up to the user to decide how they want the AI voice to sound when it speaks back.
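To make the customization concrete, here is a minimal sketch of how a developer might steer the voice with a text style prompt via the OpenAI Python SDK. It only assembles the request parameters so it runs without an API key; the voice name and instruction text are illustrative assumptions, and the actual API call is shown in comments.

```python
# Sketch: steering gpt-4o-mini-tts with a free-text style prompt.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY;
# the preset voice name and instructions below are illustrative.

def build_tts_request(text: str, style: str) -> dict:
    """Assemble parameters for an audio.speech.create call."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": "coral",        # one of the built-in preset voices
        "input": text,           # what the model should say
        "instructions": style,   # how it should say it (tone, accent, emotion)
    }

params = build_tts_request(
    "Your appointment is confirmed for Tuesday at 3pm.",
    "Speak in a warm, upbeat tone, like a friendly receptionist.",
)

# With a key configured, the call itself would look like:
#   from openai import OpenAI
#   client = OpenAI()
#   with client.audio.speech.with_streaming_response.create(**params) as resp:
#       resp.stream_to_file("confirmation.mp3")
```

Swapping only the `instructions` string is what lets the same preset voice range from cheerful to somber without retraining anything.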
In another demo shared with VentureBeat, an OpenAI staff member showed how, using text alone on the demo site, the same voice could be made to sound like anything from a cackling mad scientist at one extreme to a calm therapist leading a yoga session at the other. So yes, the variations are certainly there.
The company says the models are designed to build on the capabilities of the GPT-4o base model.
These models are new variants of the existing GPT-4o model rolled out in May 2024, which currently powers text and voice interactions for ChatGPT users. The company took that base model and post-trained it with additional data to make it excel at speech and transcription.
The company did not specify when these models might come to ChatGPT, which has its own pricing and performance requirements. So while many expect the models to improve over time, the launch is focused solely on API developers for now.
The transcription models are designed to supersede the company's earlier speech-to-text offerings and deliver lower word error rates. They also hold up more competitively in noisy environments, adapt to diverse accents, and better handle varying speech speeds across different languages.
The organization shared a chart on its website showing how much lower the gpt-4o-transcribe model's error rates are when identifying words across different languages.
The models include features like noise cancellation and a voice activity detector that can tell when a speaker has finished a thought, which helps improve transcription accuracy. The new gpt-4o-transcribe model is designed to take a single audio input and respond with a single output voice in such interactions, however long they run.
The organization is also launching a competition inviting the public to find the most creative uses of the demo voice site and share them online with a specific hashtag. The models are a gold mine for audio apps, and the enhancements are what differentiate them from rivals, opening the door to applications like customer call centers, AI assistants, and meeting note transcription.