The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain
arXiv.org Artificial Intelligence
We are 600 million Spanish speakers. We launched the #Somos600M Project because the diversity of the languages of LATAM, the Caribbean, and Spain needs to be represented in Artificial Intelligence (AI) systems. Although Spanish speakers make up 7.5% of the world population, there is no open dataset for instruction-tuning large language models (LLMs), nor a leaderboard to evaluate and compare them. In this paper, we present how, as an international open-source community, we have created the first versions of the instruction and evaluation datasets, which are indispensable resources for the advancement of Natural Language Processing (NLP) in our languages.
Jul-1-2024
- Country:
  - Asia > Middle East
    - UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Europe
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Slovenia (0.04)
    - Spain > Galicia
      - Madrid (0.05)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - Central America (0.04)
    - Montserrat (0.04)
    - United States > New Mexico
      - Santa Fe County > Santa Fe (0.04)
  - South America
    - Chile > Santiago Metropolitan Region
      - Santiago Province > Santiago (0.04)
    - Peru (0.14)
- Genre:
  - Research Report (0.40)
- Industry:
  - Education (0.94)
  - Government (0.68)
  - Health & Medicine (0.93)
- Technology: