AITopics | Palen-Michel, Chester

Collaborating Authors

Palen-Michel, Chester

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages

Palen-Michel, Chester, Pickering, Maxwell, Kruse, Maya, Sälevä, Jonne, Lignos, Constantine

arXiv.org Artificial IntelligenceDec-12-2024

We present OpenNER 1.0, a standardized collection of openly available named entity recognition (NER) datasets. OpenNER contains 34 datasets spanning 51 languages, annotated in varying named entity ontologies. We correct annotation format issues, standardize the original datasets into a uniform representation, map entity type names to be more consistent across corpora, and provide the collection in a structure that enables research in multilingual and multi-ontology NER. We provide baseline models using three pretrained multilingual language models to compare the performance of recent models and facilitate future research in NER.

artificial intelligence, natural language, text processing, (14 more...)

arXiv.org Artificial Intelligence

2412.09587

Country:

Europe (1.00)
Asia (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

QueryNER: Segmentation of E-commerce Queries

Palen-Michel, Chester, Liang, Lizzie, Wu, Zhe, Lignos, Constantine

arXiv.org Artificial IntelligenceMay-15-2024

Prior work in sequence labeling for e-commerce has largely addressed aspect-value extraction which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on the goal of dividing a query into meaningful chunks with broadly applicable types. We report baseline tagging results and conduct experiments comparing token and entity dropping for null and low recall query recovery. Challenging test sets are created using automatic transformations and show how simple data augmentation techniques can make the models more robust to noise. We make the QueryNER dataset publicly available.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.09507

Country:

North America > United States > New Mexico (0.14)
North America > United States > Minnesota (0.14)
North America > United States > Colorado (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Leisure & Entertainment (0.93)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Add feedback

LR-Sum: Summarization for Less-Resourced Languages

Palen-Michel, Chester, Lignos, Constantine

arXiv.org Artificial IntelligenceOct-26-2023

This preprint describes work in progress on LR-Sum, a new permissively-licensed dataset created with the goal of enabling further research in automatic summarization for less-resourced languages. LR-Sum contains human-written summaries for 40 languages, many of which are less-resourced. We describe our process for extracting and filtering the dataset from the Multilingual Open Text corpus (Palen-Michel et al., 2022). The source data is public domain newswire collected from from Voice of America websites, and LR-Sum is released under a Creative Commons license (CC BY 4.0), making it one of the most openly-licensed multilingual summarization datasets. We describe how we plan to use the data for modeling experiments and discuss limitations of the dataset.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.09674

Country:

Europe (1.00)
Asia (1.00)
Africa (1.00)
North America > United States > Louisiana (0.14)

Genre: Research Report (0.40)

Industry:

Information Technology (0.66)
Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Add feedback

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

Adelani, David Ifeoluwa, Neubig, Graham, Ruder, Sebastian, Rijhwani, Shruti, Beukman, Michael, Palen-Michel, Chester, Lignos, Constantine, Alabi, Jesujoba O., Muhammad, Shamsuddeen H., Nabende, Peter, Dione, Cheikh M. Bamba, Bukula, Andiswa, Mabuya, Rooweither, Dossou, Bonaventure F. P., Sibanda, Blessing, Buzaaba, Happy, Mukiibi, Jonathan, Kalipe, Godson, Mbaye, Derguene, Taylor, Amelia, Kabore, Fatoumata, Emezue, Chris Chinenye, Aremu, Anuoluwapo, Ogayo, Perez, Gitau, Catherine, Munkoh-Buabeng, Edwin, Koagne, Victoire M., Tapo, Allahsera Auguste, Macucwa, Tebogo, Marivate, Vukosi, Mboning, Elvis, Gwadabe, Tajuddeen, Adewumi, Tosin, Ahia, Orevaoghene, Nakatumba-Nabende, Joyce, Mokono, Neo L., Ezeani, Ignatius, Chukwuneke, Chiamaka, Adeyemi, Mofetoluwa, Hacheme, Gilles Q., Abdulmumin, Idris, Ogundepo, Odunayo, Yousuf, Oreen, Ngoli, Tatiana Moteu, Klakow, Dietrich

arXiv.org Artificial IntelligenceNov-15-2022

African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically-diverse African languages.

computational linguistic, information retrieval, natural language, (19 more...)

arXiv.org Artificial Intelligence

2210.12391

Country:

Europe (1.00)
Asia (1.00)
Africa (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

MasakhaNER: Named Entity Recognition for African Languages

Adelani, David Ifeoluwa, Abbott, Jade, Neubig, Graham, D'souza, Daniel, Kreutzer, Julia, Lignos, Constantine, Palen-Michel, Chester, Buzaaba, Happy, Rijhwani, Shruti, Ruder, Sebastian, Mayhew, Stephen, Azime, Israel Abebe, Muhammad, Shamsuddeen, Emezue, Chris Chinenye, Nakatumba-Nabende, Joyce, Ogayo, Perez, Aremu, Anuoluwapo, Gitau, Catherine, Mbaye, Derguene, Alabi, Jesujoba, Yimam, Seid Muhie, Gwadabe, Tajuddeen, Ezeani, Ignatius, Niyongabo, Rubungo Andre, Mukiibi, Jonathan, Otiende, Verrah, Orife, Iroro, David, Davis, Ngom, Samba, Adewumi, Tosin, Rayson, Paul, Adeyemi, Mofetoluwa, Muriuki, Gerald, Anebi, Emmanuel, Chukwuneke, Chiamaka, Odu, Nkiruka, Wairagala, Eric Peter, Oyerinde, Samuel, Siro, Clemencia, Bateesa, Tobius Saul, Oloyede, Temilola, Wambui, Yvonne, Akinode, Victor, Nabagereka, Deborah, Katusiime, Maurice, Awokoya, Ayodele, MBOUP, Mouhamadane, Gebreyohannes, Dibora, Tilaye, Henok, Nwaike, Kelechi, Wolde, Degaga, Faye, Abdoulaye, Sibanda, Blessing, Ahia, Orevaoghene, Dossou, Bonaventure F. P., Ogueji, Kelechi, DIOP, Thierno Ibrahima, Diallo, Abdoulaye, Akinfaderin, Adewale, Marengereke, Tendai, Osei, Salomey

arXiv.org Artificial IntelligenceMar-22-2021

We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. We release the data, code, and models in order to inspire future research on African NLP.

dataset, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

2103.11811

Country:

Asia (1.00)
Africa > Nigeria (0.94)
Europe > Germany (0.68)
(2 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback