AITopics | creole

Collaborating Authors

creole

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Kr\'eyoLID From Language Identification Towards Language Mining

Dent, Rasul, Suarez, Pedro Ortiz, Clérice, Thibault, Sagot, Benoît

arXiv.org Artificial IntelligenceMar-9-2025

Automatic language identification is frequently framed as a multi-class classification problem. However, when creating digital corpora for less commonly written languages, it may be more appropriate to consider it a data mining problem. For these varieties, one knows ahead of time that the vast majority of documents are of little interest. By minimizing resources spent on classifying such documents, we can create corpora much faster and with better coverage than using established pipelines. To demonstrate the effectiveness of the language mining perspective, we introduce a new pipeline and corpora for several French-based Creoles.

corpora, creole, threshold, (13 more...)

arXiv.org Artificial Intelligence

2503.06547

Country:

Indian Ocean (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(19 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Moly\'e: A Corpus-based Approach to Language Contact in Colonial France

Dent, Rasul, Janès, Juliette, Clérice, Thibault, Suarez, Pedro Ortiz, Sagot, Benoît

arXiv.org Artificial IntelligenceAug-8-2024

Whether or not several Creole languages which developed during the early modern period can be considered genetic descendants of European languages has been the subject of intense debate. This is in large part due to the absence of evidence of intermediate forms. This work introduces a new open corpus, the Moly\'e corpus, which combines stereotypical representations of three kinds of language variation in Europe with early attestations of French-based Creole languages across a period of 400 years. It is intended to facilitate future research on the continuity between contact situations in Europe and Creolophone (former) colonies.

corpus, creole, pronoun, (15 more...)

arXiv.org Artificial Intelligence

2408.04554

Country:

Africa > Mauritius (0.05)
South America > French Guiana (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(17 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Guylingo: The Republic of Guyana Creole Corpora

Clarke, Christopher, Daynauth, Roland, Wilkinson, Charlene, Devonish, Hubert, Mars, Jason

arXiv.org Artificial IntelligenceJul-2-2024

While major languages often enjoy substantial attention and resources, the linguistic diversity across the globe encompasses a multitude of smaller, indigenous, and regional languages that lack the same level of computational support. One such region is the Caribbean. While commonly labeled as "English speaking", the ex-British Caribbean region consists of a myriad of Creole languages thriving alongside English. In this paper, we present Guylingo: a comprehensive corpus designed for advancing NLP research in the domain of Creolese (Guyanese English-lexicon Creole), the most widely spoken language in the culturally rich nation of Guyana. We first outline our framework for gathering and digitizing this diverse corpus, inclusive of colloquial expressions, idioms, and regional variations in a low-resource language. We then demonstrate the challenges of training and evaluating NLP models for machine translation in Creole. Lastly, we discuss the unique opportunities presented by recent NLP advancements for accelerating the formal adoption of Creole languages as official languages in the Caribbean.

creole, creole language, translation, (15 more...)

arXiv.org Artificial Intelligence

2405.03832

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Atlantic Ocean > South Atlantic Ocean > Gulf of Guinea (0.04)
Africa > Gulf of Guinea (0.04)
(10 more...)

Genre: Research Report (0.50)

Industry:

Education (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

CreoleVal: Multilingual Multitask Benchmarks for Creoles

Lent, Heather, Tatariya, Kushal, Dabre, Raj, Chen, Yiyi, Fekete, Marcell, Ploeger, Esther, Zhou, Li, Heje, Hans Erik, Kanojia, Diptesh, Belony, Paul, Bollmann, Marcel, Grobol, Loïc, de Lhoneux, Miryam, Hershcovich, Daniel, DeGraff, Michel, Søgaard, Anders, Bjerva, Johannes

arXiv.org Artificial IntelligenceOct-30-2023

Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and other highly-resourced languages imply a significant potential for transfer learning, this potential is hampered due to this lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of brand new development datasets for machine comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. Ultimately, the goal of CreoleVal is to empower research on Creoles in NLP and computational linguistics. We hope this resource will contribute to technological inclusion for Creole language users around the globe.

computational linguistic, creole, translation, (15 more...)

arXiv.org Artificial Intelligence

2310.19567

Country:

North America > Haiti (0.29)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(28 more...)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

AfroLID: A Neural Language Identification Tool for African Languages

Adebara, Ife, Elmadany, AbdelRahim, Abdul-Mageed, Muhammad, Inciarte, Alcides Alcoba

arXiv.org Artificial IntelligenceDec-6-2022

Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural LID toolkit for $517$ African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 language families utilizing five orthographic systems. When evaluated on our blind Test set, AfroLID achieves 95.89 F_1-score. We also compare AfroLID to five existing LID tools that each cover a small number of African languages, finding it to outperform them on most languages. We further show the utility of AfroLID in the wild by testing it on the acutely under-served Twitter domain. Finally, we offer a number of controlled case studies and perform a linguistically-motivated error analysis that allow us to both showcase AfroLID's powerful capabilities and limitations.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2210.11744

Country:

Africa > South Africa (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(32 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)

Add feedback

JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset

Armstrong, Ruth-Ann, Hewitt, John, Manning, Christopher

arXiv.org Artificial IntelligenceDec-6-2022

JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from a major world language and a distinctive grammar reflecting the languages of the original speakers and the process of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer from large monolingual or multilingual pretrained models. While our work, along with previous work, shows that transfer from these models to low-resource languages that are unrelated to languages in their training set is not very effective, we would expect stronger results from transfer to creoles. Indeed, our experiments show considerably better results from few-shot learning of JamPatoisNLI than for such unrelated languages, and help us begin to understand how the unique relationship between creoles and their high-resource base languages affect cross-lingual transfer. JamPatoisNLI, which consists of naturally-occurring premises and expert-written hypotheses, is a step towards steering research into a traditionally underserved language and a useful benchmark for understanding cross-lingual NLP.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.03419

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Africa > Nigeria > Jigawa State > Dutse (0.04)
Pacific Ocean (0.04)
(9 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback