Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP
Bakagianni, Juli, Pouli, Kanella, Gavriilidou, Maria, Pavlopoulos, John
arXiv.org Artificial Intelligence
Natural Language Processing (NLP) research has traditionally focused predominantly on English, driven by the availability of resources, the size of the research community, and market demands. Recently, there has been a noticeable shift towards multilingualism in NLP, recognizing the need for inclusivity and effectiveness across diverse languages and cultures. Monolingual surveys have the potential to complement the broader trend towards multilingualism in NLP by providing foundational insights and resources necessary for effectively addressing the linguistic diversity of global communication. However, monolingual NLP surveys are extremely rare in the literature. This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys. Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks. We include a classification of Language Resources (LRs) according to their availability, and of datasets according to their annotation, to highlight publicly available and machine-actionable LRs. By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022, providing a comprehensive overview of the current state and challenges of Greek NLP research. We discuss the progress of Greek NLP and outline the Greek LRs we encountered, classified by availability and usability. As we show, our proposed method helps to avoid common pitfalls, such as data leakage and contamination, and to assess language support per NLP task. We consider this systematic literature review of Greek NLP an application of our method that showcases the benefits of a monolingual NLP survey. Similar applications could be considered for the myriad languages whose progress in NLP lags behind that of well-supported languages.
Jul-13-2024
- Country:
  - North America
    - Dominican Republic (0.04)
    - Cuba (0.04)
    - United States
      - Pennsylvania (0.04)
      - Washington > King County
        - Seattle (0.04)
      - New York > New York County
        - New York City (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - California > Los Angeles County
        - Los Angeles (0.04)
  - Europe
    - Slovenia (0.04)
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
      - Hertfordshire > Hatfield (0.04)
      - Cambridgeshire > Cambridge (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - France
    - Greece
      - Epirus > Ioannina (0.04)
      - Central Macedonia > Thessaloniki (0.04)
    - Germany > North Rhine-Westphalia
      - Cologne Region > Cologne (0.04)
    - Sweden > Vaestra Goetaland
      - Gothenburg (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - Singapore (0.04)
    - Japan (0.04)
    - Middle East > Yemen
      - Amanat Al Asimah > Sanaa (0.04)
    - China > Inner Mongolia
      - Hohhot (0.04)
- Genre:
  - Overview (1.00)
  - Research Report > New Finding (0.67)
- Industry:
  - Law (1.00)
  - Information Technology (1.00)
  - Government > Regional Government (0.92)
  - Media > News (0.67)
  - Law Enforcement & Public Safety > Terrorism (0.67)
  - Banking & Finance (0.67)
  - Health & Medicine > Therapeutic Area
    - Immunology (0.67)
  - Education
    - Curriculum > Subject-Specific Education (0.92)
    - Educational Setting (0.67)
    - Educational Technology (0.67)
- Technology:
  - Information Technology > Artificial Intelligence
    - Representation & Reasoning > Semantic Networks (0.67)
    - Natural Language
      - Text Processing (1.00)
      - Machine Translation (1.00)
      - Large Language Model (1.00)
      - Information Extraction (1.00)
      - Grammars & Parsing (1.00)
      - Discourse & Dialogue (1.00)
      - Chatbot (1.00)
      - Text Classification (0.67)
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Statistical Learning (0.92)