lenguajenaturalai
The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain
We are 600 million Spanish speakers. We launched the #Somos600M Project because the diversity of the languages from LATAM, the Caribbean, and Spain needs to be represented in Artificial Intelligence (AI) systems. Although we make up 7.5% of the world population, there is no open dataset for instruction-tuning large language models (LLMs) for our languages, nor a leaderboard to evaluate and compare them. In this paper, we present how we, as an international open-source community, have created the first versions of the instruction and evaluation datasets, indispensable resources for the advancement of Natural Language Processing (NLP) in our languages.
Country:
- South America > Peru (0.14)
- Europe > Spain > Madrid (0.05)
- North America > Canada > Ontario > Toronto (0.04)
Industry:
- Education (0.94)
- Health & Medicine (0.93)
- Government (0.68)
Technology: