Ayush Tripathi
Graduate Student
Ghaziabad

Public Documents 1
BHASHABLEND: Bridging Transcription and Translation for multilingual video content
Ayush Tripathi
Vanshika Yadav

and 3 more

October 26, 2024
Translating video content into many languages is feasible with existing solutions, but doing so effectively and accurately remains a major challenge. This work outlines a system aimed at improving the quality and accessibility of multilingual video translation. The proposed method extracts the audio from a video, transcribes it with a speech recognition model, translates the transcribed text into multiple languages using Google’s translation API, and converts the translated text into speech with Google’s Text-to-Speech library, keeping the result synchronized with the original video. The BhashaBlend model achieved a word error rate (WER) of 12.4%, lower than the rates reported for major ASR systems such as Google (15.82%) and Microsoft (16.51%). Performance was strongest on languages with simpler phonetic realization, such as German, English, and Spanish, which supports the model’s dependability for multilingual transcription and dubbing. These results highlight the model’s potential in settings with considerable linguistic complexity and point to the broad applicability of BhashaBlend in multilingual applications.
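The abstract does not say how the word error rate was scored. As an illustration only, the sketch below computes WER in the standard way: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer's output, divided by the number of reference words. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are assumptions made for this sketch, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Assumption for this sketch: transcripts are lowercased and split on
    whitespace; the paper's actual text normalization is not stated.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion in nine reference words.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.1%}")
```

Under this definition, a WER of 12.4% corresponds to roughly 12 word-level errors per 100 reference words.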
