AITopics

2511.00486

Country: Asia > India (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Yamoah, Kweku Andoh, Weako, Jackson, Dorley, Emmanuel J.

Building a Functional Machine Translation Corpus for Kpelle

arXiv.org Artificial IntelligenceMay-27-2025

In this paper, we introduce the first publicly available English-Kpelle dataset for machine translation, comprising over 2000 sentence pairs drawn from everyday communication, religious texts, and educational materials. By fine-tuning Meta's No Language Left Behind(NLLB) model on two versions of the dataset, we achieved BLEU scores of up to 30 in the Kpelle-to-English direction, demonstrating the benefits of data augmentation. Our findings align with NLLB-200 benchmarks on other African languages, underscoring Kpelle's potential for competitive performance despite its low-resource status. Beyond machine translation, this dataset enables broader NLP tasks, including speech recognition and language modelling. We conclude with a roadmap for future dataset expansion, emphasizing orthographic consistency, community-driven validation, and interdisciplinary collaboration to advance inclusive language technology development for Kpelle and other low-resourced Mande languages.

artificial intelligence, machine translation, natural language, (18 more...)

2505.18905

Country:

North America > United States (1.00)
Africa (1.00)

Genre:

Research Report (0.70)
Instructional Material (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Chirkova, Nadezhda, Liang, Sheng, Nikoulina, Vassilina

Empirical study of pretrained multilingual language models for zero-shot cross-lingual generation

arXiv.org Artificial IntelligenceNov-15-2023

Zero-shot cross-lingual generation assumes finetuning the multilingual pretrained language model (mPLM) on a generation task in one language and then using it to make predictions for this task in other languages. Previous works notice a frequent problem of generation in a wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, and compare various approaches proposed in the literature in a unified setting. We first underline the importance of tuning learning rate used for finetuning, which helps to substantially alleviate the problem of generation in the wrong language. Then, we show that with careful learning rate tuning, the simple full finetuning of the model acts as a very strong baseline; other competitive approaches include parameter-efficient tuning with adapters and training on several source languages. Finally, we find that mBART performs similarly to mT5 of the same size, and NLLB-200 can be competitive in some cases.

computational linguistic, mbart, target language, (10 more...)

2310.09917

Country:

Africa > Guinea-Bissau (0.14)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(10 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)

Gow-Smith, Edward, Villegas, Danae Sánchez

Sheffield's Submission to the AmericasNLP Shared Task on Machine Translation into Indigenous Languages

arXiv.org Artificial IntelligenceJun-16-2023

In this paper we describe the University of Sheffield's submission to the AmericasNLP 2023 Shared Task on Machine Translation into Indigenous Languages which comprises the translation from Spanish to eleven indigenous languages. Our approach consists of extending, training, and ensembling different variations of NLLB-200. We use data provided by the organizers and data from various other sources such as constitutions, handbooks, news articles, and backtranslations generated from monolingual data. On the dev set, our best submission outperforms the baseline by 11% average chrF across all languages, with substantial improvements particularly for Aymara, Guarani and Quechua. On the test set, we achieve the highest average chrF of all the submissions, we rank first in four of the eleven languages, and at least one of our submissions ranks in the top 3 for all languages.

artificial intelligence, natural language, submission, (17 more...)

2306.0983

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
North America > Costa Rica (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
(12 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligenceAug-19-2022, 13:13:55 GMT

How Meta Is Making Artificial Intelligence More Inclusive

Artificial intelligence (AI) must be inclusive to reach its potential. AI applications that solve problems for a small segment of the population will fail to achieve widespread adoption. So, it's important that AI applications be designed and prepared with data that reflects as many segments of the global population as possible. Many moving parts need to be managed well to do that, and one of them is language. The more languages an AI application can handle, the more inclusive it is.

artificial intelligence, meta, translation, (15 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.53)

#artificialintelligenceJul-21-2022, 16:22:10 GMT

Introduction to No Language Left Behind (NLLB-200)

Meta AI recently open-sourced its massive translation model, No Language Left Behind (NLLB-200), intending to exclude language barriers across the globe. As we know, that machine translation has become a key area of research nowadays, and it has become a great news for many researchers and organisations who can use it for their respective research and work. So let's take a look at the news and understand a bit about NLLB-200 with the below points: No Language Left Behind (NLLB-200) is a model from the series of massive machine translation models of MetaAI for language translation. A newer member of the series NLLB-200 is capable of translating between 200 languages, representing Meta's capacity of Meta in the direction of AI researchers. These development aims to allow people to access, share and use online content in their native languages and communicate across the world regardless of language preferences.

language left, nllb-200, translation, (8 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligenceJul-17-2022, 14:35:41 GMT

Perceptron: AI that can solve math problems and translate over 200 different languages – TechCrunch

Research in the field of machine learning and AI, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers -- particularly in, but not limited to, artificial intelligence -- and explain why they matter. In this batch of recent research, Meta open-sourced a language system that it claims is the first capable of translating 200 different languages with "state-of-the-art" results. Not to be outdone, Google detailed a machine learning model, Minerva, that can solve quantitative reasoning problems including mathematical and scientific questions. And Microsoft released a language model, Godel, for generating "realistic" conversations that's along the lines of Google's widely publicized Lamda. And then we have some new text-to-image generators with a twist.

artificial intelligence, machine learning, nllb-200, (14 more...)

Country: Asia > Laos (0.05)

Genre: Research Report > Promising Solution (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

#artificialintelligenceJul-16-2022, 07:30:46 GMT

Behind No language Left Behind

What if you didn't need English to translate? Meta's new and improved open source AI model'NLLB-200' is capable of translating 200 languages without English! "Communicating across languages is one superpower that AI provides, but as we keep advancing our AI work it's improving everything we do--from showing the most interesting content on Facebook and Instagram, to recommending more relevant ads, to keeping our services safe for everyone", says Mark Zuckerberg, CEO, Meta. Accessibility through language ensures that the benefits of the advancement of technology reach everyone, no matter what language they may speak. Tech companies are assuming a proactive role in attempting to bridge this gap.

artificial intelligence, natural language, translation, (18 more...)

Industry: Information Technology > Services (0.56)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligenceJul-12-2022, 08:15:59 GMT

Meta's AI-based Sphere 'may be the next big break in NLP'

Meta has open-sourced a machine-learning resource that could one day supplant Wikipedia as the world's biggest publicly available knowledge-verification database. Dubbed Sphere, it can be used to perform knowledge-intensive natural language processing, or KI-NLP, we're told. In practical terms, that means it can be used to answer complicated questions using natural language, and find sources for claims. A given example of its use is asking Sphere, "Who is Joëlle Sambi Nzeba?" Wikipedia doesn't have an entry for her, but Sphere said she was "born in Belgium and grew up partly in Kinshasa (Congo). She currently lives in Brussels. She is a writer and slammer, alongside her activism in a feminist movement," and links to a website where it got that information about her work.

big break, meta, wikipedia, (8 more...)

Country:

Europe > Belgium (0.26)
Africa > Democratic Republic of the Congo > Kinshasa Province > Kinshasa (0.26)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

#artificialintelligenceJul-8-2022, 04:50:19 GMT

Meta's NLLB-200 AI model improves translation quality by 44%

Meta has unveiled a new AI model called NLLB-200 that can translate 200 languages and improves quality by an average of 44 percent. Translation apps have been fairly adept at the most popular languages for some time. Even when they don't offer a perfect translation, it's normally close enough for the native speaker to understand. However, there are hundreds of millions of people in regions with many languages – like Africa and Asia – that still suffer from poor translation services. "To help people connect better today and be part of the metaverse of tomorrow, our AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world's languages. Today, we're announcing an important breakthrough in NLLB: We've built a single AI model called NLLB-200, which translates 200 different languages with results far more accurate than what previous technology could accomplish."

meta, nllb-200, translation, (7 more...)

Country:

Asia (0.26)
Africa (0.26)
North America > United States > California (0.06)
Europe > Netherlands > North Holland > Amsterdam (0.06)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)