
Building Text-to-Speech Models for Low-Resourced Languages from Crowdsourced Data
Andrew Katumba, Sulaiman Kagumire, Joyce Nakatumba-Nabende, John Quinn, Sudi Murindanyi
Andrew Katumba
Makerere University

Corresponding Author: andrew.katumba@mak.ac.ug

Sulaiman Kagumire
Makerere University
Joyce Nakatumba-Nabende
Makerere University
John Quinn
Makerere University
Sudi Murindanyi
Makerere University

Abstract

Text-to-speech (TTS) models have expanded digital inclusivity: they underpin assistive communication technologies for visually impaired people, support language learning, and let people across many sectors consume digital text as audio. Despite these benefits, the full potential of TTS is often not realized for most low-resourced African languages, because TTS models have traditionally required large amounts of high-quality single-speaker recordings, which are costly and time-consuming to obtain. In this paper, we demonstrate that crowdsourced recordings can help overcome this lack of single-speaker data by compensating with data from other speakers of similar intonation (how the voice rises and falls in speech). For Luganda and Kiswahili, we fine-tuned an English VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model on over 10 hours of speech from six female Common Voice (CV) speakers. A human mean opinion score (MOS) evaluation on 100 test sentences shows that, for both languages, the model trained on six speakers sounds more natural than benchmark models trained on two speakers and on a single speaker. Combined with careful data curation, this approach shows promise for advancing speech synthesis for low-resourced African languages. Our final models for Luganda and Kiswahili are available at https://huggingface.co/marconilab/VITS-commonvoice-females.
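As a rough illustration of the speaker-pooling and evaluation steps described in the abstract, the sketch below picks the k female speakers with the most clips from Common Voice-style metadata rows, then reports a mean opinion score with a normal-approximation 95% confidence interval. This is a minimal sketch under stated assumptions, not the authors' pipeline: the column names (`client_id`, `gender`), the gender labels, the function names, and the choice to rank speakers by clip count are all hypothetical. The paper's own curation may select speakers differently, for example by matching intonation or filtering recordings by quality.

```python
import math
import statistics
from collections import Counter

# Assumption: each row is a dict parsed from a Common Voice metadata TSV
# (e.g. validated.tsv). The keys "client_id" and "gender" and the labels below
# are assumptions; gender labels differ across Common Voice releases.
FEMALE_LABELS = ("female", "female_feminine")


def select_female_speakers(rows, k=6, female_labels=FEMALE_LABELS):
    """Hypothetical helper: return up to k female speaker IDs, ranked by clip count.

    Ranking by clip count is a stand-in selection criterion chosen for
    illustration; it favors speakers who contributed the most recordings.
    """
    counts = Counter(
        row["client_id"] for row in rows if row.get("gender") in female_labels
    )
    return [speaker for speaker, _ in counts.most_common(k)]


def mos_with_ci(ratings, z=1.96):
    """Hypothetical helper: mean opinion score and 95% CI half-width.

    Uses the sample standard deviation and a normal approximation,
    so it needs at least two ratings.
    """
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


if __name__ == "__main__":
    # Toy metadata: speaker "a" has 3 clips, "b" has 1, "c" is not female.
    rows = (
        [{"client_id": "a", "gender": "female"}] * 3
        + [{"client_id": "b", "gender": "female"}]
        + [{"client_id": "c", "gender": "male"}] * 5
    )
    print(select_female_speakers(rows, k=2))
    # Toy 1-5 naturalness ratings for one system.
    print(mos_with_ci([4, 5, 3, 4, 4]))
```

Pooling several speakers this way trades per-speaker consistency for more total training audio, which is the trade-off the abstract argues pays off when single-speaker data are scarce.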
01 Nov 2024: Submitted to Applied AI Letters
04 Nov 2024: Submission Checks Completed
04 Nov 2024: Assigned to Editor
21 Nov 2024: Reviewer(s) Assigned
06 Jan 2025: Review(s) Completed, Editorial Evaluation Pending
06 Jan 2025: Editorial Decision: Revise Major