Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling

Rodchenko, Tanya, Noy, Natasha, Scherrer, Nino, Prendki, Jennifer

arXiv.org Artificial Intelligence 

For example, translation between languages exhibits regular and persistent patterns at different scales (across sentences, paragraphs, documents). In general, language patterns are stable over time. We know what type of data we need to expand to new languages. And while it may be challenging to acquire the data for rare or only spoken languages, it is easy to judge whether newly acquired data is what we need. In contrast, use cases where data lacks strong, persistent topological features or where the structure is highly fragmented or unstable over time, may not be as well-suited for data scaling approaches.