MultiLS-SP/CA: Lexical Complexity Prediction and Lexical Simplification Resources for Catalan and Spanish
Bott, Stefan; Saggion, Horacio; Rojas, Nelson Peréz; Salazar, Martin Solis; Ramirez, Saul Calderon
arXiv.org Artificial Intelligence
Automatic lexical simplification is the task of replacing lexical items that may be unfamiliar and difficult to understand with easier, more common words. This paper presents MultiLS-SP/CA, a novel dataset for lexical simplification in Spanish and Catalan. It is the first dataset of its kind for Catalan and a substantial addition to the sparse data available for automatic lexical simplification in Spanish. Specifically, MultiLS-SP is the first Spanish dataset to include scalar ratings of how difficult lexical items are to understand. In addition, we describe experiments with this dataset that can serve as a baseline for future work on the same data.
Apr-11-2024
- Country:
  - Asia
    - China > Beijing
      - Beijing (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Russia (0.04)
  - Europe
    - Bulgaria > Varna Province
      - Varna (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Russia > Northwestern Federal District
      - Leningrad Oblast > Saint Petersburg (0.04)
    - Spain
      - Catalonia > Barcelona Province
        - Barcelona (0.04)
      - Valencian Community > Valencia Province
        - Valencia (0.04)
    - Sweden > Östergötland County
      - Linköping (0.04)
  - North America
    - Costa Rica > Cartago Province
      - Cartago (0.04)
    - United States
      - Maryland (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
- Genre:
  - Research Report (0.40)
- Industry:
  - Education (0.46)
  - Government (0.67)
- Technology: