Goto

Collaborating Authors

 mundari


AdiBhashaa: A Community-Curated Benchmark for Machine Translation into Indian Tribal Languages

Singh, Pooja, Kumar, Sandeep

arXiv.org Artificial Intelligence

Large language models and multilingual machine translation (MT) systems increasingly drive access to information, yet many languages of the tribal communities remain effectively invisible in these technologies. This invisibility exacerbates existing structural inequities in education, governance, and digital participation. We present AdiBhashaa, a community-driven initiative that constructs the first open parallel corpora and baseline MT systems for four major Indian tribal languages-Bhili, Mundari, Gondi, and Santali. This work combines participatory data creation with native speakers, human-in-the-loop validation, and systematic evaluation of both encoder-decoder MT models and large language models. In addition to reporting technical findings, we articulate how AdiBhashaa illustrates a possible model for more equitable AI research: it centers local expertise, builds capacity among early-career researchers from marginalized communities, and foregrounds human validation in the development of language technologies.


MunTTS: A Text-to-Speech System for Mundari

Gumma, Varun, Hada, Rishav, Yadavalli, Aditya, Gogoi, Pamir, Mondal, Ishani, Seshadri, Vivek, Bali, Kalika

arXiv.org Artificial Intelligence

We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austo-Asiatic family. Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system. We begin our study by gathering a substantial dataset of Mundari text and speech and train end-to-end speech models. We also delve into the methods used for training our models, ensuring they are efficient and effective despite the data constraints. We evaluate our system with native speakers and objective metrics, demonstrating its potential as a tool for preserving and promoting the Mundari language in the digital age.