Lexical Complexity Prediction: An Overview

North, Kai, Zampieri, Marcos, Shardlow, Matthew

Mar-8-2023–arXiv.org Artificial Intelligence

Understanding the meaning of words in context is fundamental for reading comprehension. The perceived difficulty, hereafter referred to as complexity, of a target word within a given text varies widely among readers. With an increased demand for distance learning and educational technologies [107], automatically predicting which words are likely to cause comprehension problems is becoming a popular area of research [115, 147, 185]. Systems have been created to identify complex words that are difficult to acquire, reproduce, or understand for children [79], second-language learners [89], people with a reading disability such as dyslexia [131] or aphasia [35, 53], or, more generally, individuals with low literacy [59, 175]. In Computational Linguistics and Natural Language Processing (NLP), the task of automatically recognizing complex words is most often achieved by training machine learning (ML) models. These ML models assign a complexity value to each target word within an input extract, sentence, or text, which allows complex words to be identified. This information can then be used to improve downstream lexical and text simplification systems that provide simpler alternatives to aid reading comprehension. Take the extract shown in Table 1, for example.
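To make the idea of assigning a complexity value to each target word concrete, the following is a minimal sketch, not a method taken from the paper. It swaps the trained ML model for a hand-written heuristic that combines word rarity and word length. The frequency table `TOY_FREQ`, the 0.7/0.3 weights, the 12-character cap, and the function names `complexity` and `score_sentence` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy frequency counts (illustrative only, not from a real corpus).
TOY_FREQ = {"the": 1000, "cat": 120, "sat": 80, "on": 900, "mat": 40}


def complexity(word: str, freq: dict[str, int] = TOY_FREQ) -> float:
    """Return a heuristic complexity score in [0, 1]; higher means harder."""
    w = word.lower()
    # Rarity: 0 for the most frequent word, 1 for a word missing from the table.
    max_log = math.log(max(freq.values()) + 1)
    rarity = 1.0 - math.log(freq.get(w, 0) + 1) / max_log
    # Length: longer words score higher, capped at 12 characters.
    length = min(len(w), 12) / 12
    # Assumed weights; a trained model would learn these from annotated data.
    return 0.7 * rarity + 0.3 * length


def score_sentence(sentence: str) -> list[tuple[str, float]]:
    """Score every whitespace-separated token after stripping punctuation."""
    tokens = [t.strip(".,;:!?\"'()") for t in sentence.split()]
    return [(t, round(complexity(t), 3)) for t in tokens if t]


if __name__ == "__main__":
    for token, score in score_sentence("The cat sat on the ubiquitous mat."):
        print(f"{token:12s} {score}")
```

In this sketch a word absent from the frequency table, such as "ubiquitous", scores higher than a frequent short word such as "the". A downstream simplification system could then flag tokens whose score exceeds a chosen threshold.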

artificial intelligence, machine learning, natural language

Country:
- Oceania > Australia (0.04)
- South America
  - Ecuador > Guayas Province
    - Guayaquil (0.04)
  - Brazil > Ceará
    - Fortaleza (0.04)
- North America
  - United States
    - Maryland > Baltimore (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Colorado > Denver County
      - Denver (0.04)
    - Wisconsin
      - Milwaukee County > Milwaukee (0.04)
      - Dane County > Madison (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - California
      - Los Angeles County > Los Angeles (0.14)
      - San Diego County > San Diego (0.05)
    - New York > New York County
      - New York City (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Slovenia (0.04)
  - Bulgaria
    - Varna Province > Varna (0.04)
    - Sofia City Province > Sofia (0.04)
  - Iceland > Capital Region
    - Reykjavik (0.04)
  - Middle East > Malta
    - Port Region > Southern Harbour District > Valletta (0.04)
  - Spain
    - Valencian Community > Valencia Province
      - Valencia (0.04)
    - Catalonia > Barcelona Province
      - Barcelona (0.04)
    - Andalusia > Málaga Province
      - Málaga (0.04)
  - Sweden > Uppsala County
    - Uppsala (0.04)
  - France
    - Île-de-France > Paris
      - Paris (0.04)
    - Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
      - Marseille (0.04)
    - Occitanie > Haute-Garonne
      - Toulouse (0.04)
    - Brittany > Ille-et-Vilaine
      - Rennes (0.04)
  - Italy
    - Tuscany > Florence (0.04)
    - Liguria > Genoa (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - United Kingdom > Scotland
    - City of Edinburgh > Edinburgh (0.04)
- Asia
  - Vietnam > Long An Province (0.04)
  - Thailand
    - Bangkok > Bangkok (0.05)
    - Phuket > Phuket (0.04)
  - Taiwan
    - Takao Province > Kaohsiung (0.04)
    - Taiwan Province > Taipei (0.04)
  - Middle East > Qatar
    - Ad-Dawhah > Doha (0.04)
  - Japan > Honshū
    - Kantō > Ibaraki Prefecture
      - Tsukuba (0.04)
    - Kansai > Kyoto Prefecture
      - Kyoto (0.04)
  - India > Maharashtra
    - Mumbai (0.04)
  - China
    - Beijing > Beijing (0.04)
    - Hong Kong (0.04)
- Africa
  - South Africa > Western Cape
    - Cape Town (0.04)
  - Middle East > Algeria
    - Algiers Province > Algiers (0.04)
  - Ethiopia > Addis Ababa
    - Addis Ababa (0.04)

Genre:
- Research Report > New Finding (1.00)
- Overview (1.00)
- Instructional Material (0.87)

Industry:
- Health & Medicine (1.00)
- Education
  - Educational Setting > Online (0.66)
  - Educational Technology > Educational Software
    - Computer Based Training (0.92)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Processing (1.00)
    - Machine Translation (0.67)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (1.00)
    - Ensemble Learning (0.93)
    - Decision Tree Learning (0.68)
    - Performance Analysis (0.67)
