AITopics | loanword

OLaPh: Optimal Language Phonemizer

Wirth, Johannes

arXiv.org Artificial Intelligence, Sep-25-2025

Phonemization, the conversion of text into phonemes, is a key step in text-to-speech. Traditional approaches use rule-based transformations and lexicon lookups, while more advanced methods apply preprocessing techniques or neural networks for improved accuracy on out-of-domain vocabulary. However, all systems struggle with names, loanwords, abbreviations, and homographs. This work presents OLaPh (Optimal Language Phonemizer), a framework that combines large lexica, multiple NLP techniques, and compound resolution with a probabilistic scoring function. Evaluations in German and English show improved accuracy over previous approaches, including on a challenging dataset. To further address unresolved cases, we train a large language model on OLaPh-generated data, which achieves even stronger generalization and performance. Together, the framework and LLM improve phonemization consistency and provide a freely available resource for future research.
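As a rough illustration of the idea described in this abstract (lexicon lookup, compound resolution, and a probabilistic score to choose among competing analyses), here is a minimal sketch. It is not the OLaPh implementation: the toy lexicon, the probabilities, the two-part split limit, and the function names `best_entry` and `phonemize` are all assumptions made for this example. Transcriptions use ARPAbet symbols without stress marks.

```python
from typing import Optional

# Hypothetical toy lexicon: word -> list of (phonemes, probability) candidates.
# The probabilities are invented for illustration only.
LEXICON: dict[str, list[tuple[str, float]]] = {
    "no": [("N OW", 0.6)],
    "where": [("W EH R", 0.5)],
    "now": [("N AW", 0.5)],
    "here": [("HH IY R", 0.4)],
}


def best_entry(word: str) -> Optional[tuple[str, float]]:
    """Return the most probable lexicon candidate for a word, if any."""
    candidates = LEXICON.get(word)
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])


def phonemize(word: str) -> Optional[tuple[str, float]]:
    """Look the word up directly; otherwise try every two-part split and
    keep the split whose parts have the highest joint probability."""
    word = word.lower()
    direct = best_entry(word)
    if direct is not None:
        return direct

    best: Optional[tuple[str, float]] = None
    for i in range(1, len(word)):
        left, right = best_entry(word[:i]), best_entry(word[i:])
        if left is None or right is None:
            continue
        # Assumption: parts are treated as independent, so scores multiply.
        score = left[1] * right[1]
        if best is None or score > best[1]:
            best = (left[0] + " " + right[0], score)
    return best


result = phonemize("nowhere")
print(result)
```

With these invented numbers the split "no" + "where" outscores "now" + "here", which shows why a scoring function is needed once more than one decomposition is possible. A word with no direct entry and no valid split returns `None`, the case the abstract assigns to the trained language model.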

large language model, machine learning, phonemization, (18 more...)

arXiv:2509.20086

Country:

Europe > Netherlands (0.14)
Europe > Czechia (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Feature-Refined Unsupervised Model for Loanword Detection

Kpoglu, Promise Dodzi

arXiv.org Artificial Intelligence, Aug-26-2025

We propose an unsupervised method for detecting loanwords, i.e., words borrowed from one language into another. While prior work has primarily relied on language-external information to identify loanwords, such approaches can introduce circularity and constraints into the historical linguistics workflow. In contrast, our model relies solely on language-internal information to process both native and borrowed words in monolingual and multilingual wordlists. By extracting pertinent linguistic features, scoring them, and mapping them probabilistically, we iteratively refine initial results by identifying and generalizing from emerging patterns until convergence. This hybrid approach leverages both linguistic and statistical cues to guide the discovery process. We evaluate our method on the task of isolating loanwords in datasets from six standard Indo-European languages: English, German, French, Italian, Spanish, and Portuguese. Experimental results demonstrate that our model outperforms baseline methods, with strong performance gains observed when scaling to cross-linguistic data.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv:2508.17923

Country:

Europe (0.28)
North America (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Statistical analysis of word flow among five Indo-European languages

Molina, Josué Ely, Flores, Jorge, Gershenson, Carlos, Pineda, Carlos

arXiv.org Artificial Intelligence, Jan-17-2023

A recent increase in data availability has made a range of statistical linguistic studies possible. Here we use the Google Books Ngram dataset to analyze word flow among English, French, German, Italian, and Spanish. We study what we define as "migrant words", a type of loanword that does not change its spelling. We quantify migrant words from one language to another for different decades, and notice that most migrant words can be aggregated in semantic fields and associated with historical events. We also study the statistical properties of accumulated migrant words and their rank dynamics. We propose a measure of use of migrant words that could be used as a proxy of cultural influence. Our methodology is not free of caveats, but our results are encouraging and should promote further studies in this direction.
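To make the notion of a migrant word concrete, the sketch below counts words that appear with identical spelling in two languages and were seen earlier in the source language than in the target. This is only one plausible reading of the abstract, not the authors' procedure. The first-seen decades are invented rather than taken from Google Books Ngram, and the names `FIRST_SEEN`, `migrant_words`, and `arrivals_per_decade` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: language -> {word: first decade in which it is observed}.
# These decades are invented for illustration only.
FIRST_SEEN: dict[str, dict[str, int]] = {
    "english": {"software": 1960, "internet": 1980, "weekend": 1880, "kindergarten": 1850},
    "german": {"software": 1970, "internet": 1990, "kindergarten": 1780},
}


def migrant_words(source: str, target: str,
                  first_seen: dict[str, dict[str, int]]) -> dict[str, int]:
    """Words spelled identically in both languages that show up in the
    source language before the target; maps word -> arrival decade in target."""
    src, tgt = first_seen[source], first_seen[target]
    shared = src.keys() & tgt.keys()
    return {w: tgt[w] for w in shared if src[w] < tgt[w]}


def arrivals_per_decade(migrants: dict[str, int]) -> Counter:
    """Count how many migrant words arrive in the target language per decade."""
    return Counter(migrants.values())


en_to_de = migrant_words("english", "german", FIRST_SEEN)
de_to_en = migrant_words("german", "english", FIRST_SEEN)
print(sorted(en_to_de), sorted(de_to_en))
print(arrivals_per_decade(en_to_de))
```

Treating "first seen" as the direction signal is an assumption of this sketch. Real corpus data would also need frequency thresholds to filter out noise and scanning errors.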

artificial intelligence, migrant word, source language, (18 more...)

arXiv:2301.06985

Country:

North America > Mexico > Mexico City > Mexico City (0.05)
North America > Panama (0.04)
South America > Peru (0.04)
(11 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Government > Regional Government (0.67)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.49)
Government > Immigration & Customs (0.49)
Government > Military (0.46)

Technology: Information Technology > Artificial Intelligence (0.46)

The Communities That Live Captioning Leaves Behind

Slate, Apr-14-2021, 13:00:00 GMT

During our synagogue's Zoom services last week, my family and I found ourselves giggling when we should have been serious. Auto-captions were turned on, and they kept botching the rabbi's Hebrew-laced English. "Mourner's Kaddish" (memorial prayer) was transcribed as "mourner Scottish," and "refua shlema" (wish for a "full recovery") became "with flu wash Emma." Some of the transcriptions bordered on offensive, like when "Torah" became "terrorism" and "yasher koach" (great job!) became "wish a cough." We weren't relying on the captions and could laugh at these mistakes.

live captioning leaves, loanword, software, (11 more...)

Country: North America > United States > Arizona (0.05)

Industry: Information Technology (0.31)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.73)

Cross-Lingual Bridges with Models of Lexical Borrowing

Tsvetkov, Yulia, Dyer, Chris

Journal of Artificial Intelligence Research, Jan-13-2016

Linguistic borrowing is the phenomenon of transferring linguistic constructions (lexical, phonological, morphological, and syntactic) from a donor language to a recipient language as a result of contacts between communities speaking different languages. Borrowed words are found in all languages, and, in contrast to cognate relationships, borrowing relationships may exist across unrelated languages (for example, about 40% of Swahili's vocabulary is borrowed from the unrelated language Arabic). In this work, we develop a model of morpho-phonological transformations across languages. Its features are based on universal constraints from Optimality Theory (OT), and we show that compared to several standard (but linguistically more naïve) baselines, our OT-inspired model obtains good performance at predicting donor forms from borrowed forms with only a few dozen training examples, making this a cost-effective strategy for sharing lexical information across languages. We demonstrate applications of the lexical borrowing model in machine translation, using a resource-rich donor language to obtain translations of out-of-vocabulary loanwords in a lower-resource language. Our framework obtains substantial improvements (up to 1.6 BLEU) over standard baselines.
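The task of predicting donor forms from borrowed forms can be pictured with a much simpler stand-in: rank candidate donor words by a weighted edit distance, where each edit type plays the role of a penalized constraint violation. This is explicitly not the paper's OT-inspired model. The weights are hand-picked rather than learned, the names `weighted_cost` and `rank_donors` are hypothetical, and the word forms are simplified ASCII romanizations used only for illustration.

```python
def weighted_cost(donor: str, loan: str,
                  w_del: float = 1.0, w_ins: float = 0.5, w_sub: float = 1.0) -> float:
    """Minimum total penalty to turn `donor` into `loan` using deletions,
    insertions, and substitutions with the given (assumed) weights."""
    rows, cols = len(donor) + 1, len(loan) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * w_del          # delete every donor segment
    for j in range(1, cols):
        cost[0][j] = j * w_ins          # insert every loan segment
    for i in range(1, rows):
        for j in range(1, cols):
            same = donor[i - 1] == loan[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + w_del,
                cost[i][j - 1] + w_ins,
                cost[i - 1][j - 1] + (0.0 if same else w_sub),
            )
    return cost[-1][-1]


def rank_donors(loan: str, candidates: list[str], **weights: float) -> list[str]:
    """Sort candidate donor forms from lowest to highest penalty."""
    return sorted(candidates, key=lambda d: weighted_cost(d, loan, **weights))


# Simplified romanizations: Swahili "kitabu" against three Arabic candidates.
ranking = rank_donors("kitabu", ["maktab", "katib", "kitab"])
print(ranking)
```

Making insertion cheaper than deletion reflects the assumption that a borrowed word is more likely to gain a final vowel than to lose donor segments. A learned model would estimate such preferences from training pairs instead of fixing them by hand.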

constraint, proc, translation, (16 more...)

doi: 10.1613/jair.4786

AI Access Foundation

10975

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
Indian Ocean (0.04)
(7 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)