
Collaborating Authors

Ajibade, Benjamin


Mitigating Translationese in Low-resource Languages: The Storyboard Approach

arXiv.org Artificial Intelligence

Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate that text translation is preferred in terms of accuracy, whereas our method yields lower accuracy but better fluency in the target language.


Design and Implementation of English To Yorùbá Verb Phrase Machine Translation System

arXiv.org Artificial Intelligence

Despite its large population of speakers, Yorùbá is still considered a low-resource language (a language for which few language resources exist), which makes it very difficult to develop more advanced models, such as neural machine translation models, that require large volumes of data. Despite this number of speakers, translating the language into other widely spoken languages was not initially emphasized. However, linguistic researchers have recently taken up the challenge by giving it more attention (as compared to the high-resource languages of the Western world).

The advancement in Natural Language Processing (NLP) can be attributed to recent improvements in the strategies and techniques of large-scale data collection, archiving, analysis, and visualization. NLP began in the '50s as machine translation (MT), intended to aid in code-breaking during World War II. Although the translations were not successful, these early stages of MT were necessary stepping stones on the way to more sophisticated technologies (Zhang, 2018; Quinn,


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

arXiv.org Artificial Intelligence

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.


The African Stopwords project: curating stopwords for African languages

arXiv.org Artificial Intelligence

Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One of the common tasks in preprocessing text data is the removal of stopwords. While high-resource languages like English benefit from several available stopword lists, low-resource languages, such as those spoken on the African continent, currently have no standardized lists available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The African Stopwords project aims to study and curate stopwords for African languages. Depending on the NLP task (such as text classification), stopwords might not add much value to the meaning of a document when analysing text data and building NLP models (Singh, 2019).
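As a minimal sketch of the stopword-removal preprocessing step described above, stopword filtering reduces to a per-token set-membership test. This sketch assumes a simple regex tokenizer and a small illustrative English stopword list. The names `EXAMPLE_STOPWORDS`, `tokenize`, and `remove_stopwords` are hypothetical and do not come from the African Stopwords project. They are not its curated lists or tooling. The tokenizer is also a simplification. Text written with tone marks and combining diacritics may need more careful handling.

```python
import re

# Hypothetical, illustrative stopword list (English placeholders only);
# this is NOT a list curated by the African Stopwords project.
EXAMPLE_STOPWORDS = frozenset({"the", "is", "a", "of", "and", "to", "in"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of word characters (\\w+)."""
    return re.findall(r"\w+", text.lower())


def remove_stopwords(text: str, stopwords: frozenset[str] = EXAMPLE_STOPWORDS) -> list[str]:
    """Return the tokens of `text`, in order, that are not in `stopwords`."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords("The removal of stopwords is a common preprocessing step."))
    # ['removal', 'stopwords', 'common', 'preprocessing', 'step']
```

A `frozenset` is used so that each membership check is a constant-time hash lookup. Swapping in a language-specific list only requires passing a different `stopwords` argument. Whether to remove stopwords at all depends on the task, as the abstract notes.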