New study tests machine learning on detection of borrowed words in world languages
Lexical borrowing is widespread and can affect even words that play an important role in daily life. English 'mountain', for example, was borrowed from Old French, along with many other words. Researchers from the Pontificia Universidad Católica del Perú and the Max Planck Institute for the Science of Human History have investigated the ability of machine learning algorithms to identify lexical borrowings using word lists from a single language. Results published in the journal PLOS ONE show that current machine learning methods alone are insufficient for borrowing detection, confirming that additional data and expert knowledge are needed to tackle one of historical linguistics' most pressing challenges.
Lexical borrowing, or the direct transfer of words from one language to another, has interested scholars for millennia, as evidenced already in Plato's Kratylos dialogue, in which Socrates discusses the challenge that borrowed words pose for etymological studies. In historical linguistics, lexical borrowings help researchers trace the evolution of modern languages and indicate cultural contact between distinct linguistic groups - whether recent or ancient. However, the techniques for identifying borrowed words have resisted formalization, demanding that researchers rely on a variety of proxy information and the comparison of multiple languages.

"The automated detection of lexical borrowings is still one of the most difficult tasks we face in computational historical linguistics," says Johann-Mattis List, who led the study.

In the current study, researchers from PUCP and MPI-SHH employed different machine learning techniques to train language models that mimic the way in which linguists identify borrowings when considering only the evidence provided by a single language: if the sounds of a word, or the ways in which those sounds combine, are atypical compared with other words in the same language, this often hints at a recent borrowing.
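The single-language intuition described above can be illustrated with a very small sketch. It is not the method used in the study, whose models are not detailed here; it assumes a plain bigram model over sound symbols with add-one smoothing, and the function names and the toy word forms are invented purely for illustration.

```python
import math
from collections import Counter

def train_bigram_model(words):
    """Count sound bigrams, with word-boundary markers, over a word list."""
    bigrams, contexts, symbols = Counter(), Counter(), set()
    for word in words:
        padded = ["^"] + list(word) + ["$"]
        symbols.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    # One extra slot so that unseen symbols share a single "unknown" class.
    return bigrams, contexts, len(symbols) + 1

def surprisal(word, model):
    """Mean negative log-probability per sound transition (add-one smoothed)."""
    bigrams, contexts, vocab_size = model
    padded = ["^"] + list(word) + ["$"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total -= math.log(prob)
    return total / (len(padded) - 1)

# Invented toy forms standing in for one language's word list (an assumption).
native_like = ["pata", "taka", "kapa", "tapa", "paka", "kata", "napa", "tana"]
model = train_bigram_model(native_like)

# Rank unseen forms: higher surprisal means less typical sound patterns.
candidates = ["kana", "strub"]
ranked = sorted(candidates, key=lambda w: surprisal(w, model), reverse=True)
for form in ranked:
    print(form, round(surprisal(form, model), 2))
```

A form whose sound sequences rarely or never occur in the rest of the list receives a higher score and would be flagged for closer inspection. As the study's results suggest, such a score on its own is weak evidence and would need to be combined with additional data and expert judgment.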