OpenAI has just unveiled an astonishing new voice cloning AI that can accurately replicate a human voice from a single 15-second audio clip. As with ElevenLabs' tools, the results are quite realistic, and OpenAI itself acknowledges that the technology poses serious risks.
The new AI is called Voice Engine, and OpenAI has shared a few samples to showcase its capabilities. The generated voices are eerily similar to the originals, which is as concerning as it is impressive. The AI can also produce natural-sounding speech from brief text prompts.
Voice Engine is not an entirely new technology. It has been in development since late 2022 and already powers the preset voices in OpenAI’s existing Text-to-Speech API, as well as ChatGPT’s Voice and Read Aloud features.
Because the results are so realistic, OpenAI is concerned about the damage Voice Engine could cause in the wrong hands. Since late last year, the company has been privately testing Voice Engine with a select group of partners. Early applications have demonstrated its potential in several areas:
- Enhancing accessibility for individuals unable to read and for young children by employing natural and expressive vocal reproductions.
- Translating videos and podcasts so that creators can reach a global audience in listeners' native languages.
- Improving the delivery of essential services in remote regions.
- Assisting individuals with speech impairments, including applications in speech therapy.
- Restoring the voices of patients who have experienced sudden or progressive loss of their ability to speak.
OpenAI has acknowledged the serious risks that Voice Engine poses, notably its potential to be misused to sway voters during an election year. The company has imposed strict guidelines on its testing partners, explicitly banning the impersonation of any individual without prior consent.
These partners must obtain explicit authorization from the individuals whose voices they wish to replicate, and they are strictly prohibited from letting users create voices themselves. Voices generated by Voice Engine must also be clearly labeled as AI-made.