Social media giant Meta is making it easier for people who speak different languages to communicate with one another. The company's AI research division has just released its latest suite of AI models, called Seamless Communication.
These models are designed to facilitate more authentic and natural cross-language communication, effectively bringing the idea of a Universal Speech Translator closer to reality. This week, the research papers and associated data for these models were made publicly available.
The suite's flagship model, Seamless, combines three of Meta's other AI models, SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2, into a single unified system. The research paper for Seamless Communication describes it as "the first publicly available system that unlocks expressive cross-lingual communication in real-time."
Seamless Communication can translate more than 100 spoken and written languages in real time, opening a new frontier in AI-assisted communication. Thanks to SeamlessExpressive, it can also preserve the speaker's vocal style, emotion, and prosody, so the translated speech does not sound robotic.
The research paper adds: “Translations should capture the nuances of human expression. While existing translation tools are skilled at capturing the content within a conversation, they typically rely on monotone, robotic text-to-speech systems for their output.”
SeamlessStreaming translates with a latency of roughly two seconds, which Meta positions as among the fastest live translation performance available for a model supporting more than 100 languages.
The third and final model in the system, SeamlessM4T v2, serves as the foundation for the other two. It is an improved version of the original SeamlessM4T model, which Meta released in August 2023. According to the paper, the updated architecture improves the consistency between text and speech output.
The researchers wrote: “In sum, Seamless gives us a pivotal look at the technical foundation needed to turn the Universal Speech Translator from a science fiction concept into a real-world technology.”
The researchers also acknowledge that the technology could be misused by bad actors, which is why they have implemented several safety measures, such as audio watermarking and new techniques to reduce toxic hallucinations in the output.