Lexical Complexity Prediction: An Overview
North, Kai, Zampieri, Marcos, Shardlow, Matthew
arXiv.org Artificial Intelligence
Understanding the meaning of words in context is fundamental for reading comprehension. The perceived difficulty, hereafter referred to as complexity, of a target word within a given text varies widely among readers. With increased demand for distance learning and educational technologies [107], automatically predicting which words are likely to cause comprehension problems is becoming a popular area of research [115, 147, 185]. Systems have been created to identify complex words that are difficult to acquire, reproduce, or understand for children [79], second-language learners [89], people with a reading disability such as dyslexia [131] or aphasia [35, 53], and, more generally, individuals with low literacy [59, 175]. In Computational Linguistics and Natural Language Processing (NLP), the task of automatically recognizing complex words is most often achieved by training machine learning (ML) models. These models assign a complexity value to each target word within an input extract, sentence, or text, which allows complex words to be identified. This information can then be used to improve downstream lexical and text simplification systems that provide simpler alternatives to aid reading comprehension. Take the extract shown in Table 1, for example.
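To make the idea of assigning a complexity value to each target word concrete, the following is a minimal sketch in Python. It is not a trained ML model and not a method from the paper: it swaps in a hand-written heuristic (word length plus membership in a tiny familiar-word list) purely for illustration. The names `FAMILIAR_WORDS`, `complexity_score`, `score_sentence`, and `complex_words`, as well as the 0.7 threshold, are all hypothetical.

```python
import re

# Hypothetical toy vocabulary of "familiar" words; a real system would
# learn from annotated data rather than rely on a fixed list.
FAMILIAR_WORDS = {"the", "cat", "sat", "on", "a", "mat", "and", "was"}


def complexity_score(word):
    """Return a toy complexity value in [0, 1] for a single word."""
    w = word.lower()
    # Longer words score higher; the length contribution is capped at 12 letters.
    length_part = min(len(w), 12) / 12
    # Words outside the familiar list receive the full unfamiliarity penalty.
    unfamiliar_part = 0.0 if w in FAMILIAR_WORDS else 1.0
    return 0.5 * length_part + 0.5 * unfamiliar_part


def score_sentence(sentence):
    """Pair every alphabetic token in the sentence with its complexity value."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [(w, complexity_score(w)) for w in words]


def complex_words(sentence, threshold=0.7):
    """Return the words whose complexity value meets the (assumed) threshold."""
    return [w for w, score in score_sentence(sentence) if score >= threshold]


if __name__ == "__main__":
    print(complex_words("The cat sat on a perplexing mat"))
```

The per-word scores are continuous, so a downstream simplification system could either threshold them, as `complex_words` does here, or use them directly to rank candidate words for replacement.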
Mar-8-2023
- Genre:
- Research Report > New Finding (1.00)
- Overview (1.00)
- Instructional Material (0.87)
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language
- Text Processing (1.00)
- Machine Translation (0.67)
- Machine Learning
- Statistical Learning (1.00)
- Neural Networks > Deep Learning (1.00)
- Ensemble Learning (0.93)
- Decision Tree Learning (0.68)
- Performance Analysis (0.67)