AITopics | english and german

Collaborating Authors

english and german

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OLaPh: Optimal Language Phonemizer

Wirth, Johannes

arXiv.org Artificial IntelligenceSep-25-2025

Phonemization, the conversion of text into phonemes, is a key step in text-to-speech. Traditional approaches use rule-based transformations and lexicon lookups, while more advanced methods apply preprocessing techniques or neural networks for improved accuracy on out-of-domain vocabulary. However, all systems struggle with names, loanwords, abbreviations, and homographs. This work presents OLaPh (Optimal Language Phonemizer), a framework that combines large lexica, multiple NLP techniques, and compound resolution with a probabilistic scoring function. Evaluations in German and English show improved accuracy over previous approaches, including on a challenging dataset. To further address unresolved cases, we train a large language model on OLaPh-generated data, which achieves even stronger generalization and performance. Together, the framework and LLM improve phonemization consistency and provide a freely available resource for future research.

large language model, machine learning, phonemization, (18 more...)

arXiv.org Artificial Intelligence

2509.20086

Country:

Europe > Netherlands (0.14)
Europe > Czechia (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Add feedback

Sampling the Swadesh List to Identify Similar Languages with Tree Spaces

Ordway, Garett, Patrangenaru, Vic

arXiv.org Artificial IntelligenceMay-10-2024

Communication plays a vital role in human interaction. Studying language is a worthwhile task and more recently has become quantitative in nature with developments of fields like quantitative comparative linguistics and lexicostatistics. With respect to the authors own native languages, the ancestry of the English language and the Latin alphabet are of the primary interest. The Indo-European Tree traces many modern languages back to the Proto-Indo-European root. Swadesh's cognates played a large role in developing that historical perspective where some of the primary branches are Germanic, Celtic, Italic, and Balto-Slavic. This paper will use data analysis on open books where the simplest singular space is the 3-spider - a union T3 of three rays with their endpoints glued at a point 0 - which can represent these tree spaces for language clustering. These trees are built using a single linkage method for clustering based on distances between samples from languages which use the Latin Script. Taking three languages at a time, the barycenter is determined. Some initial results have found both non-sticky and sticky sample means. If the mean exhibits non-sticky properties, then one language may come from a different ancestor than the other two. If the mean is considered sticky, then the languages may share a common ancestor or all languages may have different ancestry.

open book, phylogenetic tree, swadesh, (15 more...)

arXiv.org Artificial Intelligence

2405.06549

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
Europe > Middle East (0.04)
Asia > Middle East (0.04)
(2 more...)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Health & Medicine > Therapeutic Area (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.68)

Add feedback

Syntactic Language Change in English and German: Metrics, Parsers, and Convergences

Chen, Yanran, Zhao, Wei, Breitbarth, Anne, Stoeckel, Manuel, Mehler, Alexander, Eger, Steffen

arXiv.org Artificial IntelligenceMar-28-2024

Many studies have shown that human languages tend to optimize for lower complexity and increased communication efficiency. Syntactic dependency distance, which measures the linear distance between dependent words, is often considered a key indicator of language processing difficulty and working memory load. The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as the tree height and degree variance. Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work. We also show that syntactic language change over the time period investigated is largely similar between English and German for the different metrics explored: only 4% of cases we examine yield opposite conclusions regarding upwards and downtrends of syntactic metrics across German and English. We also show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions. To our best knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology in recent corpora of English and German.

dependency distance, english and german, parser, (11 more...)

arXiv.org Artificial Intelligence

2402.11549

Country:

Europe > Sweden > Uppsala County > Uppsala (0.04)
Europe > Germany > Berlin (0.04)
North America > United States > Maryland > Baltimore (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Robustness of Multi-Source MT to Transcription Errors

Macháček, Dominik, Polák, Peter, Bojar, Ondřej, Dabre, Raj

arXiv.org Artificial IntelligenceMay-26-2023

Automatic speech translation is sensitive to speech recognition errors, but in a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling. In this paper, we hypothesize that leveraging multiple sources will improve translation quality if the sources complement one another in terms of correct information they contain. To this end, we first show that on a 10-hour ESIC corpus, the ASR errors in the original English speech and its simultaneous interpreting into German and Czech are mutually independent. We then use two sources, English and German, in a multi-source setting for translation into Czech to establish its robustness to ASR errors. Furthermore, we observe this robustness when translating both noisy sources together in a simultaneous translation setting. Our results show that multi-source neural machine translation has the potential to be useful in a real-time simultaneous translation setting, thereby motivating further investigation in this area.

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2305.16894

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(16 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Pragmatic information in translation: a corpus-based study of tense and mood in English and German

Ramm, Anita, Lapshinova-Koltunski, Ekaterina, Fraser, Alexander

arXiv.org Artificial IntelligenceJul-10-2020

Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research. We consider the correspondence between English and German tense and mood in translation. Human translators do not find this correspondence easy, and as we will show through careful analysis, there are no simplistic ways to map tense and mood from one language to another. Our observations about the challenges of human translation of tense and mood have important implications for multilingual NLP. Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.

artificial intelligence, machine translation, natural language, (16 more...)

arXiv.org Artificial Intelligence

2007.05234

Country:

Europe > Germany > Berlin (0.05)
Oceania > Australia (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(16 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback