Introduction
On 10th October 2025, Dr. Joseph Muguro, a Research Affiliate at the Center for Data Science and Artificial Intelligence (DSAIL), and I attended the AfriLang AI Conference, held at the Four Points by Sheraton hotel in Kampala, Uganda. The event brought together experts and researchers to explore how AI can be used to preserve and promote African languages. I am excited to share that it was a remarkable event that offered valuable opportunities to learn, connect, and network.
Key highlight: The launch of Sunflower by Sunbird AI
The major highlight of the event was the launch of Sunflower by Sunbird AI, Uganda’s first multilingual AI model. The model can understand and communicate in 31 Ugandan languages, helping to bridge the digital language divide. It was trained on books, radio broadcasts, and community archives. It supports translation, summarization, and question answering, and outperforms global AI models in 24 of the 31 languages.
Shaping the Future of Language Technologies: The Invited Talks
An insightful session was led by Ms. Babra Babweteera, the Executive Director of the Cross-Cultural Foundation of Uganda, who spoke on Language Preservation in the AI Age.
She explained the importance of language documentation, noting that only a few languages are well documented. She encouraged the use of local languages in schools through debates and other learning activities as a way of promoting and preserving them.
In her remarks, she stressed the need to safeguard our languages, reminding participants that language is a vital part of our identity.
Dr. John Quinn, the Director of Research at Sunbird AI, delivered a talk on African Language Technology. He discussed the machine translation pipeline used in training Sunflower, explaining how the team used diverse text data, carried out pretraining, and later performed fine-tuning to enhance performance.
He also spoke about Large Language Models (LLMs) and data preparation in model training.
He emphasized the need to compress large models so that they can run efficiently on local devices, as well as to reduce latency in speech-based LLMs.
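To illustrate the compression idea he raised, here is a minimal sketch of post-training int8 quantization, the simplest of several compression techniques; this is my own toy illustration, not Sunbird's actual method, and real toolkits quantize per-channel with calibration data.

```python
def quantize_int8(weights):
    """Map float weights to int8 values with a per-tensor scale
    (post-training quantization): store small ints, one float scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights at inference time."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.05, 0.63]
q, scale = quantize_int8(weights)   # q = [82, -127, 5, 63]
approx = dequantize(q, scale)       # close to the original weights
```

Storing 8-bit integers instead of 32-bit floats cuts model size roughly fourfold, which is what makes on-device deployment of large models feasible.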
Professor Vukosi Marivate, the co-founder of Masakhane NLP, shared an insightful session on African NLP Strategy. He highlighted the importance of collaboration and open research in building language technologies that reflect African contexts and serve local communities.
He emphasized the need to develop and strengthen African NLP ecosystems through shared data and open models.
He also discussed the significance of data collection and curation, as well as actively engaging communities, showing how such efforts are valuable for both research and commercial applications.
Innovations in Language Technology Research: Paper Presentations
We also had paper presentations around data collection and NLP from different speakers. Cynthia Amol, a researcher at Maseno University, presented a paper titled Driven Extension of African Datasets Through Human-AI Collaboration.
She highlighted the challenges of data collection for African languages, noting that it is expensive and difficult since most of these languages are low-resource languages.
She explained how Tonative Community is addressing the gap by engaging volunteers to contribute data through translation of text data into different Kenyan languages.
I presented DSAIL's ongoing work on bridging the language gap through machine translation, demonstrating how AI can break down language barriers and give greater voice to African languages.
Evelyn Nafula, ML Engineer at Sunbird AI, presented a paper titled How Much Speech Data is Necessary for ASR in African Languages? An Evaluation of Data Scaling in Kinyarwanda and Kikuyu.
She explained how the team conducted Automatic Speech Recognition (ASR) experiments using Kikuyu and Kinyarwanda data with a Whisper model.
She emphasized the importance of noise-free audio data to ensure a high-quality dataset for model training.
Visit to Sunbird AI and Makerere AI Lab
My research focuses on Edge AI for bioacoustic monitoring, with the aim of deploying offline inference models for environmental conservation. The visit refined how I think about designing and testing models.
Collaboration, Curiosity, and Shared Discovery
Dr. Joseph Muguro and I had an opportunity to visit Sunbird AI and the Makerere AI Lab, where we participated in insightful sessions on machine translation using models such as Qwen and No Language Left Behind (NLLB), and on Automatic Speech Recognition (ASR) using the Whisper model. Whisper can be fine-tuned or extended with adapter modules for specialized tasks, and it handles punctuation and casing more effectively than many other ASR models.
The session further covered data processing, particularly how to handle tokenization for the Kikuyu language, which uses special characters, by swapping out unnecessary tokens and adding Kikuyu-specific tokens.
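The token-swapping step described above can be sketched at the level of the vocabulary mapping. This is a toy illustration with a made-up vocabulary, not Sunbird's actual pipeline: entries judged unnecessary for the target domain give up their ids to Kikuyu-specific subwords, such as those containing ĩ and ũ, so the vocabulary size (and embedding table) stays fixed.

```python
# Toy vocabulary: token -> id. A real subword tokenizer (BPE/SentencePiece)
# works the same way at the mapping level.
vocab = {"<unk>": 0, "the": 1, "##zzq": 2, "##xjv": 3, "and": 4}

# Kikuyu-specific subword tokens we want the model to cover.
kikuyu_tokens = ["mũ", "ĩre"]

# Tokens judged unnecessary for the target language; their ids are reused.
unused = ["##zzq", "##xjv"]

for old, new in zip(unused, kikuyu_tokens):
    vocab[new] = vocab.pop(old)  # swap: the new token inherits the old id
```

Reusing ids this way avoids resizing the model's embedding matrix, which is why swapping is preferred over simply appending new tokens when the model must stay compatible with its pretrained checkpoint.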
The sessions also covered human evaluation as a best practice for assessing model performance, involving language experts and linguists.
Additionally, they addressed voice data collection techniques, highlighting the importance of keeping speech recordings to no more than 30 seconds per clip.
This approach helps facilitate more efficient ASR model training, as most pretrained models are not optimized for longer audio clips.
Keeping recordings short also reduces the need for audio trimming, which can sometimes lead to a loss of meaning or context.
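The 30-second guideline can be sketched as a simple chunking step; this is my own illustration of the idea (Whisper-style models process fixed 30-second windows), not the collection tooling used at the lab.

```python
def split_audio(samples, sample_rate, max_seconds=30):
    """Split a mono audio signal into clips no longer than max_seconds."""
    max_len = max_seconds * sample_rate
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]

# 70 seconds of (dummy) 16 kHz audio -> clips of 30 s, 30 s, and 10 s
sr = 16_000
audio = [0.0] * (70 * sr)
clips = split_audio(audio, sr)
```

In practice one would cut at pauses or silence rather than at fixed sample offsets, precisely to avoid the loss of meaning or context that mid-word trimming causes.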
The visit provided valuable insights into the practical aspects of building and evaluating language technologies for African languages.
Highlights and Comments
- Learning from experts: we were privileged to learn from fellow AI researchers and experts during the conference.
- Networking: we had an opportunity to connect with fellow researchers, experts, and innovators.
- Experiential learning and knowledge sharing: the visit to Sunbird AI and Makerere AI Lab provided a practical understanding of how AI models are developed, fine-tuned, and evaluated.