Meta has introduced SeamlessM4T, a new AI model that is part of its effort to build artificial intelligence capable of understanding multiple languages. According to Meta, the model can translate and transcribe nearly 100 languages across both text and speech.
SeamlessM4T represents a significant breakthrough in the field of AI-driven speech-to-speech and speech-to-text capabilities. In a blog post shared with TechCrunch, Meta explains that this single model facilitates instantaneous translation, enabling effective communication between individuals speaking different languages without requiring a separate language identification model.
The development of SeamlessM4T follows Meta’s earlier text-to-text machine translation model, No Language Left Behind, as well as the Universal Speech Translator, one of the few direct speech-to-speech translation systems to support the Hokkien language.
This innovation builds upon Massively Multilingual Speech, Meta’s framework that offers technology for speech recognition, language identification, and speech synthesis across a vast range of over 1,100 languages.
While Meta’s endeavors are commendable, other entities are also investing resources in the advancement of sophisticated AI translation and transcription tools. Companies such as Amazon, Microsoft, OpenAI, and various startups already provide commercial services and open-source models.
Google is also developing the Universal Speech Model, part of a broader effort to build a model that understands the world’s 1,000 most-spoken languages. Meanwhile, Mozilla’s Common Voice initiative aims to create a diverse collection of voices for training automatic speech recognition algorithms. Among these efforts, SeamlessM4T stands out as an attempt to combine translation and transcription capabilities within a single model.
According to Meta, the creation of SeamlessM4T involved the processing of 4 million hours of speech and “tens of billions” of sentences from publicly accessible internet text. Juan Pino, a research scientist at Meta’s AI research division and a collaborator on the project, did not reveal the specific data sources in an interview with TechCrunch, only mentioning that there was a diverse range of sources.
Meta asserts that the data it utilized, which may include personally identifiable information, was not copyrighted and primarily originated from open sources. This data was used to construct the training dataset for SeamlessM4T, known as SeamlessAlign. Researchers aligned 443,000 hours of speech with texts and generated 29,000 hours of “speech-to-speech” alignments. This enabled SeamlessM4T to transcribe speech-to-text, translate text, generate speech from text, and even translate spoken words from one language to another.
About Meta SeamlessM4T
Meta claims that on an internal benchmark, SeamlessM4T outperformed the current state-of-the-art speech transcription model in handling background noise and variations in speaker tone. Meta attributes this success to the combination of speech and text data in the training dataset, which provides SeamlessM4T with an advantage over models that rely solely on speech or text.
In a blog post, Meta expresses its belief that SeamlessM4T represents a significant advancement in the pursuit of universal multitask AI systems, delivering exceptional results. However, it’s important to consider potential biases that the model might contain.
A recent article in The Conversation highlights various shortcomings in AI-powered translations, including instances of gender bias. A study published in the Proceedings of the National Academy of Sciences found that prominent speech recognition systems were more likely to mistranscribe audio from Black speakers than from white speakers.
In a published blog post, Meta acknowledges that the model tends to “overgeneralize to masculine forms when translating from neutral terms.” For most languages, it performs better when translating from masculine references (such as the English pronoun “he”).
Additionally, when gender information is absent, SeamlessM4T favors the masculine form about 10% of the time. Meta speculates that this might be due to an “overrepresentation of masculine lexica” in the training data.
Meta asserts that SeamlessM4T doesn’t produce an excessive amount of toxic text in its translations, a common issue with various translation and generative AI text models. However, in certain languages like Bengali and Kyrgyz, the model generates more toxic translations related to socio-economic status and culture. Generally, SeamlessM4T tends to exhibit more toxicity in translations involving sexual orientation and religion.
Meta points out that the public demonstration of SeamlessM4T includes a toxicity filter for input and output speech. However, this filter is not enabled by default in the open-source release of the model.
Challenges in Meta AI translation
Excessive reliance on AI translation also risks a loss of linguistic richness. Human interpreters make individual choices when translating that AI systems do not, and the result can be a uniform style known as “translationese.” While AI may provide more precise translations, this can come at the expense of variety and diversity in translations.
Due to these concerns, Meta advises against using SeamlessM4T for long-form translation or for certified translations, such as those recognized by government agencies and translation authorities.
Furthermore, Meta cautions against using SeamlessM4T for medical or legal purposes, where a misinterpretation could have serious consequences. This caution is notable given cases where AI mistranslations have led to errors in law enforcement. In one instance, a mistranslated text message led to a Kurdish man being wrongly accused of financing terrorism. In another, a flawed translation during a police car search caused a misunderstanding, and the case was ultimately dismissed.
Conclusion
While AI translation can improve accuracy, it may do so at the cost of translation diversity. This underscores the importance of using AI-powered tools judiciously, particularly in sensitive situations.