Extreme Machine Learning. Prediction of Catalan native speakers by administrative level

#artificialintelligence 

This exercise is a deliberate worst-case scenario for machine learning: my aim was to test the results on a scarce dataset (n = 52). In the Catalan linguistic area in Spain there is no census data about native speakers. Indeed, there is no truly useful data about the Catalan language from a data science perspective. I chose this case because until 1950 virtually 100% of the population of the Catalan linguistic area were native Catalan speakers, but since then the population has doubled through immigration. This, together with demographic and political factors, has made Catalan a minority language in most of its former linguistic area. Currently, we do not have extensive quantitative data on the real situation or evolution of the Catalan language in its territory.