indigenous community
Exploring Multimodal Foundation AI and Expert-in-the-Loop for Sustainable Management of Wild Salmon Fisheries in Indigenous Rivers
Xu, Chi, Jin, Yili, Ma, Sami, Qian, Rongsheng, Fang, Hao, Liu, Jiangchuan, Liu, Xue, Ngai, Edith C. H., Atlas, William I., Connors, Katrina M., Spoljaric, Mark A.
Wild salmon are essential to the ecological, economic, and cultural sustainability of the North Pacific Rim. Yet climate variability, habitat loss, and data limitations in remote ecosystems that lack basic infrastructure support pose significant challenges to effective fisheries management. This project explores the integration of multimodal foundation AI and expert-in-the-loop frameworks to enhance wild salmon monitoring and sustainable fisheries management in Indigenous rivers across the Pacific Northwest. By leveraging video and sonar-based monitoring, we develop AI-powered tools for automated species identification, counting, and length measurement, reducing manual effort, expediting delivery of results, and improving decision-making accuracy. Expert validation and active learning frameworks ensure ecological relevance while reducing annotation burdens. To address unique technical and societal challenges, we bring together a cross-domain, interdisciplinary team of university researchers, fisheries biologists, Indigenous stewardship practitioners, government agencies, and conservation organizations. Through these collaborations, our research fosters ethical AI co-development, open data sharing, and culturally informed fisheries management.
Designing Speech Technologies for Australian Aboriginal English: Opportunities, Risks and Participation
Hutchinson, Ben, Louro, Celeste Rodríguez, Collard, Glenys, Cooper, Ned
In Australia, post-contact language varieties, including creoles and local varieties of international languages, emerged as a result of forced contact between Indigenous communities and English speakers. These contact varieties are widely used, yet are poorly supported by language technologies. This gap presents barriers to participation in civil and economic society for Indigenous communities using these varieties, and reproduces minoritisation of contemporary Indigenous sociolinguistic identities. This paper concerns three questions regarding this context. First, can speech technologies support speakers of Australian Aboriginal English, a local indigenised variety of English? Second, what risks are inherent in such a project? Third, what technology development practices are appropriate for this context, and how can researchers integrate meaningful community participation in order to mitigate risks? We argue that opportunities do exist -- as well as risks -- and demonstrate this through a case study exploring design practices in a real-world project aiming to improve speech technologies for Australian Aboriginal English. We discuss how we integrated culturally appropriate and participatory processes throughout the project. We call for increased support for languages used by Indigenous communities, including contact varieties, which provide practical economic and socio-cultural benefits, provided that participatory and culturally safe practices are enacted.
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet
Wang, Shenran, Yang, Changbing, Parkhill, Mike, Quinn, Chad, Hammerly, Christopher, Zhu, Jian
We present lightweight flow matching multilingual text-to-speech (TTS) systems for Ojibwe, Mi'kmaq, and Maliseet, three Indigenous languages in North America. Our results show that training a multilingual TTS model on three typologically similar languages can improve performance over monolingual models, especially when data are scarce. Attention-free architectures are highly competitive with self-attention architectures while offering higher memory efficiency. Our research not only advances technical development for the revitalization of low-resource languages but also highlights the cultural gap in human evaluation protocols, calling for a more community-centered approach to human evaluation.
Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences
Pinhanez, Claudio, Cavalin, Paulo, Storto, Luciana, Fimbow, Thomas, Cobbinah, Alexander, Nogima, Julio, Vasconcelos, Marisa, Domingues, Pedro, Mizukami, Priscila de Souza, Grell, Nicole, Gongora, Majoí, Gonçalves, Isabel
Since 2022 we have been exploring application areas and technologies in which Artificial Intelligence (AI) and modern Natural Language Processing (NLP), such as Large Language Models (LLMs), can be employed to foster the usage and facilitate the documentation of Indigenous languages which are in danger of disappearing. We start by discussing the decreasing diversity of languages in the world and how working with Indigenous languages poses unique ethical challenges for AI and NLP. To address those challenges, we propose an alternative AI development cycle based on community engagement and usage. Then, we report encouraging results in the development of high-quality machine learning translators for Indigenous languages by fine-tuning state-of-the-art (SOTA) translators with tiny amounts of data and discuss how to avoid some common pitfalls in the process. We also present prototypes we have built in projects done in 2023 and 2024 with Indigenous communities in Brazil, aimed at facilitating writing, and discuss the development of Indigenous Language Models (ILMs) as a replicable and scalable way to create spell-checkers, next-word predictors, and similar tools. Finally, we discuss how we envision a future for language documentation where dying languages are preserved as interactive language models.
NLP Progress in Indigenous Latin American Languages
Tonja, Atnafu Lambebo, Balouchzahi, Fazlourrahman, Butt, Sabur, Kolesnikova, Olga, Ceballos, Hector, Gelbukh, Alexander, Solorio, Thamar
The paper focuses on the marginalization of indigenous language communities in the face of rapid technological advancements. We highlight the cultural richness of these languages and the risk they face of being overlooked in the realm of Natural Language Processing (NLP). We aim to bridge the gap between these communities and researchers, emphasizing the need for inclusive technological advancements that respect indigenous community perspectives. We present the progress of NLP for indigenous Latin American languages, together with a survey that covers the status of indigenous languages in Latin America, their representation in NLP, and the challenges and innovations required for their preservation and development. The paper contributes to the current literature on the need for, and progress of, NLP for indigenous communities of Latin America specifically, and for low-resource and indigenous communities in general.
"It's how you do things that matters": Attending to Process to Better Serve Indigenous Communities with Language Technologies
Cooper, Ned, Heldreth, Courtney, Hutchinson, Ben
Indigenous languages are historically under-served by Natural Language Processing (NLP) technologies, but this is changing for some languages with the recent scaling of large multilingual models and an increased focus by the NLP community on endangered languages. This position paper explores ethical considerations in building NLP technologies for Indigenous languages, based on the premise that such projects should primarily serve Indigenous communities. We report on interviews with 17 researchers working in or with Aboriginal and/or Torres Strait Islander communities on language technology projects in Australia. Drawing on insights from the interviews, we recommend practices for NLP researchers to increase attention to the process of engagements with Indigenous communities, rather than focusing only on decontextualised artefacts.
How AI can help forecast how much Arctic sea ice will shrink
In the next week or so, the sea ice floating atop the Arctic Ocean will shrink to its smallest size this year, as summer-warmed waters eat away at the ice's submerged edges. Record lows for sea ice levels will probably not be broken this year, scientists say. In 2020, the ice covered 3.74 million square kilometers of the Arctic at its lowest point, coming nail-bitingly close to an all-time record low. Currently, sea ice is present in just under 5 million square kilometers of Arctic waters, putting it on track to become the 10th-lowest extent of sea ice in the area since satellite record keeping began in 1979. It's an unexpected finish considering that in early summer, sea ice hit a record low for that time of year. The surprise comes in part because the best current statistical- and physics-based forecasting tools can closely predict sea ice extent only a few weeks in advance, but the accuracy of long-range forecasts falters.
Māori are trying to save their language from Big Tech
In March 2018, Peter-Lucas Jones and the ten other staff at Te Hiku Media, a small non-profit radio station nestled just below New Zealand's most northern tip, were in disbelief. In ten days, thanks to a competition it had started, Māori speakers across New Zealand had recorded over 300 hours of annotated audio in their mother tongue. It was enough data to build language tech for te reo Māori, the Māori language – including automatic speech recognition and speech-to-text. The small staff of Māori language broadcasters and one engineer were about to become pioneers in indigenous speech recognition technology. But building the tools was only half the battle. Te Hiku soon found itself fending off corporate entities trying to develop their own indigenous data sets and resisting detrimental western approaches to data sharing.