AITopics | emakhuwa

Expanding FLORES+ Benchmark for more Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation

Ali, Felermino D. M. Antonio; Cardoso, Henrique Lopes; Sousa-Silva, Rui

arXiv.org Artificial Intelligence, Aug-21-2024

As part of the Open Language Data Initiative shared tasks, we have expanded the FLORES+ evaluation set to include Emakhuwa, a low-resource language widely spoken in Mozambique. We translated the dev and devtest sets from Portuguese into Emakhuwa, and we detail the translation process and quality assurance measures used. Our methodology involved various quality checks, including post-editing and adequacy assessments. The resulting datasets consist of multiple reference sentences for each source. We present baseline results from training a Neural Machine Translation system and fine-tuning existing multilingual translation models. Our findings suggest that spelling inconsistencies remain a challenge in Emakhuwa. Additionally, the baseline models underperformed on this evaluation set, underscoring the necessity for further research to enhance machine translation quality for Emakhuwa. The data is publicly available at https://huggingface.co/datasets/LIACC/Emakhuwa-FLORES.

artificial intelligence, machine translation, natural language, (15 more...)

arXiv:2408.11457

Country:

Europe > Portugal > Porto > Porto (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Towards a parallel corpus of Portuguese and the Bantu language Emakhuwa of Mozambique

Ali, Felermino D. M. A.; Caines, Andrew; Malavi, Jaimito L. A.

arXiv.org Artificial Intelligence, Apr-12-2021

Major advances in the performance of machine translation models have been made possible in part by the availability of large-scale parallel corpora. But for most languages in the world, such corpora are rare. Emakhuwa, a language spoken in Mozambique, is, like most African languages, low-resource in NLP terms. It lacks both computational and linguistic resources and, to the best of our knowledge, few parallel corpora including Emakhuwa already exist. In this paper we describe the creation of the Emakhuwa-Portuguese parallel corpus, which is a collection of texts from the Jehovah's Witness website and a variety of other sources, including the African Story Book website, the Universal Declaration of Human Rights and Mozambican legal documents. The dataset contains 47,415 sentence pairs, amounting to 699,976 word tokens in Emakhuwa and 877,595 word tokens in Portuguese. After normalization processes, which remain to be completed, the corpus will be made freely available for research use.

corpus, emakhuwa, translation, (14 more...)

arXiv:2104.05753

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Brazil > São Paulo (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Law (0.56)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
