TUNUH: Multilingual Health Large Language Model for Low Resource African Languages

By: Mary Kariuki

Background

Large Language Models (LLMs) are advanced artificial intelligence systems trained on vast amounts of data to understand and generate human-like language. They play a crucial role in various Natural Language Processing (NLP) applications, such as machine translation, text generation, speech recognition, and summarization.

However, most existing LLMs are primarily trained on high-resource languages, leaving low-resource languages underrepresented. This imbalance creates a major barrier to digital inclusion, particularly for underserved communities, as speakers of local languages are often excluded from benefiting fully from technological advancements.

This project aims to address this gap by developing a multilingual health LLM tailored for diabetes in low-resource African languages. It seeks to enhance healthcare accessibility by enabling patients to access reliable health information, communicate effectively with healthcare providers, and receive care in their native languages.

Accomplishments

The Tunuh model — a multilingual health LLM with speech-to-text and translation capabilities fine-tuned on Kikuyu
The Tunuh model: an automatic speech recognition system with speech-to-text and translation capabilities

Next Steps

Data collection for diabetes text and speech data to support the fine-tuning of the model into a more specialized, domain specific diabetes health system.

Publications