A framework for detecting and transcribing multilingual speech in low resource languages
| dc.contributor.author | Abigaba, Wilson | |
| dc.date.accessioned | 2026-01-04T14:19:41Z | |
| dc.date.available | 2026-01-04T14:19:41Z | |
| dc.date.issued | 2025 | |
| dc.description | A project report submitted to the Directorate of Graduate Training for the study leading to partial fulfillment of the requirements for the award of the Degree of Master of Science in Data Communications and Software Engineering of Makerere University | |
| dc.description.abstract | This dissertation presents the development and evaluation of a novel multilingual automatic speech recognition (ASR) framework specifically designed for low-resource Ugandan languages, with a focus on Luganda and Runyankole-Rukiga. The research addresses a critical gap in language technology by creating the first deep learning-based ASR system capable of transcribing speech in these Bantu languages, which collectively serve over 8 million speakers in Uganda but have historically been underrepresented in speech recognition technologies. The study employed transfer learning by fine-tuning OpenAI's Whisper model on a custom-curated dataset of 2,000 speech samples comprising approximately 12 hours of audio. The developed framework achieved a Word Error Rate (WER) of 50% for Luganda and 60% for Runyankole-Rukiga, with corresponding Character Error Rates (CER) of 22% and 28%, respectively. These results represent significant improvements over baseline models, with WER reductions of up to 45% compared to non-fine-tuned systems. The framework incorporates a language detection module that identifies language switches with 78% accuracy, enabling the real-time multilingual transcription scenarios common in Uganda's multilingual contexts. A comprehensive evaluation involving both technical benchmarks and user studies with 20 participants validated the framework's effectiveness, efficiency, and usability. The system achieved a System Usability Scale (SUS) score of 76.5, indicating above-average usability, with users rating transcription quality at 4.1 out of 5 stars. Performance analysis revealed that the model handles various acoustic conditions, speaker demographics, and speech rates with a reasonable accuracy of 78.5%, though challenges remain with highly code-switched utterances and low-frequency vocabulary. 
The research makes several key contributions to the field of low-resource language processing: (1) creation of the first Whisper-based ASR system for Luganda and Runyankole-Rukiga; (2) demonstration of effective transfer learning strategies for Bantu languages; and (3) validation of practical deployment approaches for resource-constrained environments. The framework is designed with scalability in mind, providing a foundation for expansion to additional Ugandan languages and similar low-resource contexts across Africa. This work has significant implications for linguistic preservation, digital inclusion, and practical applications in healthcare, education, and government services within Uganda. By enabling speech-based interfaces in local languages, the framework contributes to bridging the digital divide and preserving cultural identity in an increasingly AI-driven world. Future work will focus on expanding the dataset, incorporating more speakers and dialects, implementing full code-switching detection capabilities, and deploying the system in real-world applications.
Keywords: Automatic Speech Recognition, Low-Resource Languages, Multilingual ASR, Luganda, Runyankole-Rukiga, Transfer Learning, Whisper, Bantu Languages, Language Technology, Uganda | |
| dc.identifier.citation | Abigaba, W. (2025). A framework for detecting and transcribing multilingual speech in low resource languages; Unpublished Masters dissertation, Makerere University, Kampala | |
| dc.identifier.uri | https://makir.mak.ac.ug/handle/10570/16156 | |
| dc.language.iso | en | |
| dc.publisher | Makerere University | |
| dc.title | A framework for detecting and transcribing multilingual speech in low resource languages | |
| dc.type | Other |
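
The abstract reports both Word Error Rate (WER) and Character Error Rate (CER). As a minimal sketch of how these two metrics are conventionally computed, assuming the standard edit-distance definition (this is not the dissertation's own evaluation code, and the function names and toy strings below are hypothetical), the following shows why CER is typically lower than WER: a single wrong character counts as one full word error but only one character error.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions,
    and deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Toy placeholder strings (hypothetical, not taken from the dataset).
    reference = "a b c d"
    hypothesis = "a x c d"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this toy case one of four words is wrong, while only one of seven characters (spaces included) differs, so the CER comes out lower than the WER. Whether whitespace and punctuation are counted, and how text is normalized before scoring, are choices the abstract does not specify.
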
Files
Original bundle
- Name: ABIGABA-COSIS-Masters-2025.pdf
- Size: 2.88 MB
- Format: Adobe Portable Document Format
- Description: Masters dissertation