AITopics | indo-european language


indo-european language


How Does Quantization Affect Multilingual LLMs?

Marchisio, Kelly; Dash, Saurabh; Chen, Hongyu; Aumiller, Dennis; Üstün, Ahmet; Hooker, Sara; Ruder, Sebastian

arXiv.org Artificial Intelligence, Jul-3-2024

Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantized LLMs on English tasks, none have examined the effect of quantization across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at varying scales. We use automatic benchmarks, LLM-as-a-Judge methods, and human evaluation, finding that (1) harmful effects of quantization are apparent in human evaluation, and automatic metrics severely underestimate the detriment: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators on realistic prompts; (2) languages are disparately affected by quantization, with non-Latin script languages impacted worst; and (3) challenging tasks such as mathematical reasoning degrade fastest. As the ability to serve low-compute models is critical for wide global adoption of NLP technologies, our results urge consideration of multilingual performance as a key evaluation criterion for efficient models.
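As a rough illustration of what weight quantization does, the sketch below applies symmetric per-tensor int8 rounding to a handful of made-up weights and measures the round-trip error. This is a generic textbook scheme, not necessarily one of the methods evaluated in the paper; the function names and the weight values are assumptions introduced here.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization; returns (integer codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -0.51, 0.33, 1.27, -0.08]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Round-to-nearest bounds the per-weight error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The per-weight error is small here, which is consistent with the abstract's point that damage from quantization may only become visible in end-task or human evaluation rather than in simple numerical checks.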

arxiv, evaluation, quantization, (14 more...)

arXiv.org Artificial Intelligence

2407.03211

Country:

North America > United States (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > Ontario > Toronto (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages

Bhat, Vineet; Jyothi, Preethi; Bhattacharyya, Pushpak

arXiv.org Artificial Intelligence, Oct-25-2023

Disfluency correction (DC) is the process of removing disfluent elements like fillers, repetitions and corrections from spoken utterances to create readable and interpretable text. DC is a vital post-processing step applied to Automatic Speech Recognition (ASR) outputs, before subsequent processing by downstream language understanding tasks. Existing DC research has primarily focused on English due to the unavailability of large-scale open-source datasets. Towards the goal of multilingual disfluency correction, we present a high-quality human-annotated DC corpus covering four important Indo-European languages: English, Hindi, German and French. We provide extensive analysis of results of state-of-the-art DC models across all four languages, obtaining F1 scores of 97.55 (English), 94.29 (Hindi), 95.89 (German) and 92.97 (French). To demonstrate the benefits of DC on downstream tasks, we show that DC leads to a 5.65-point increase in BLEU scores on average when used in conjunction with a state-of-the-art Machine Translation (MT) system. We release code to run our experiments along with our annotated dataset.

disfluency correction, indo-european language, scale human annotated corpus, (1 more...)

arXiv.org Artificial Intelligence

2310.16749

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.87)


Origin of Indo-European languages traced back to 8000 years ago

New Scientist, Jul-27-2023, 19:00:30 GMT

The common ancestor of Indo-European languages, which are now spoken by close to half the world's population, was spoken in the eastern Mediterranean around 8000 years ago, according to an analysis of related words. Indo-European languages, spanning from English to Sanskrit, have long been thought to share a common ancestor. The first linguist to make this link, William Jones, said in a lecture in 1786 that no linguist could examine Greek, Latin and Sanskrit together "without believing them to have sprung" from some common ancestor. But researchers have struggled to agree on the origin story of this so-called proto-Indo-European language, says Paul Heggarty, who is now at the Pontifical Catholic University of Peru. There are two main hypotheses, he says.

heggarty, hypothesis, indo-european language, (7 more...)

New Scientist

Country:

South America > Peru (0.25)
Asia > Middle East > Republic of Türkiye (0.07)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.36)


Statistical analysis of word flow among five Indo-European languages

Molina, Josué Ely; Flores, Jorge; Gershenson, Carlos; Pineda, Carlos

arXiv.org Artificial Intelligence, Jan-17-2023

A recent increase in data availability has made it possible to perform a range of statistical linguistic studies. Here we use the Google Books Ngram dataset to analyze word flow among English, French, German, Italian, and Spanish. We study what we define as "migrant words", a type of loanword that does not change its spelling. We quantify migrant words from one language to another for different decades, and notice that most migrant words can be aggregated in semantic fields and associated with historical events. We also study the statistical properties of accumulated migrant words and their rank dynamics. We propose a measure of use of migrant words that could be used as a proxy of cultural influence. Our methodology is not free of caveats, but our results are encouraging enough to promote further studies in this direction.
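The idea of a word that crosses languages with its spelling unchanged can be sketched as a simple set operation. The example below is a toy reading of the definition, not the authors' pipeline: it assumes each language is summarized as a mapping from word to the decade of first attestation, and it treats a word as migrating from the language where it is attested earlier. The function name and all data are hypothetical placeholders.

```python
def migrant_words(first_seen_source, first_seen_target):
    """Words spelled identically in both languages and attested strictly earlier in the source."""
    shared = first_seen_source.keys() & first_seen_target.keys()
    return sorted(
        w for w in shared if first_seen_source[w] < first_seen_target[w]
    )


# Placeholder tokens and decades, invented purely for illustration.
lang_a = {"alpha": 1900, "beta": 1950, "gamma": 1980}
lang_b = {"alpha": 1920, "beta": 1990, "gamma": 1940, "delta": 1900}

print(migrant_words(lang_a, lang_b))  # ['alpha', 'beta']
print(migrant_words(lang_b, lang_a))  # ['gamma']
```

Counting such words per decade and per language pair would give a directed flow table of the kind the abstract describes, although a real analysis would also need frequency thresholds and handling of coincidental spelling matches.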

artificial intelligence, migrant word, source language, (18 more...)

arXiv.org Artificial Intelligence

2301.06985

Country:

North America > Mexico > Mexico City > Mexico City (0.05)
North America > Panama (0.04)
South America > Peru (0.04)
(11 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Government > Regional Government (0.67)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.49)
Government > Immigration & Customs (0.49)
Government > Military (0.46)

Technology: Information Technology > Artificial Intelligence (0.46)


Syntactic structures and the general Markov models

Gakkhar, Sitanshu; Marcolli, Matilde

arXiv.org Artificial Intelligence, Oct-18-2022

The focus of the present paper is to investigate the following questions: to what extent syntactic features capture phylogenetic relationships, and to what extent Markov models are a viable assumption for phylogenetic reconstruction based on syntactic features. For the second, we also consider an alternative that we argue approximates the infinite site evolutionary model. These questions are motivated by the fact that at both the lexical and the syntactic level, Markov processes are commonly assumed to underlie computational models of language change; for instance, within the Principles and Parameters setting relevant here, Niyogi and Berwick (1997) developed models of language acquisition and language change based on a Markov process in a space of syntactic parameters. In this paper we focus only on language change processes, viewed through the lens of phylogenetic trees of language families. While the models we consider are not directly related to models of language acquisition and parameter setting, the historical changes of syntax within and across language families, through the modification of syntactic parameters, can be seen as an effect of such underlying dynamics.

machine learning, markov model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2104.08462

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Subdiffusive semantic evolution in Indo-European languages

Asztalos, Bogdán; Palla, Gergely; Czégel, Dániel

arXiv.org Artificial Intelligence, Sep-10-2022

How do words change their meaning? Although semantic evolution is driven by a variety of distinct factors, including linguistic, societal, and technological ones, we find that there is one law that holds universally across five major Indo-European languages: that semantic evolution is strongly subdiffusive. Using an automated pipeline of diachronic distributional semantic embedding that controls for underlying symmetries, we show that words follow stochastic trajectories in meaning space with an anomalous diffusion exponent $\alpha= 0.45\pm 0.05$ across languages, in contrast with diffusing particles that follow $\alpha=1$. Randomization methods indicate that preserving temporal correlations in semantic change directions is necessary to recover strongly subdiffusive behavior; however, correlations in change sizes play an important role too. We furthermore show that strong subdiffusion is a robust phenomenon under a wide variety of choices in data analysis and interpretation, such as the choice of fitting an ensemble average of displacements or averaging best-fit exponents of individual word trajectories.
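For readers unfamiliar with the exponent, anomalous diffusion is usually characterized by a mean squared displacement that grows as a power law, MSD(t) ∝ t^α, so α can be read off as the slope of a straight-line fit in log-log space. The sketch below shows only that basic relation on synthetic data; it is not the paper's embedding pipeline or its fitting procedure, and the function name and input values are assumptions made for illustration.

```python
import math


def diffusion_exponent(lags, msd):
    """Least-squares slope of log(MSD) against log(lag), i.e. the exponent alpha."""
    xs = [math.log(t) for t in lags]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic, noise-free curve generated with alpha = 0.45 (not real measurements).
lags = [1, 2, 4, 8, 16]
msd = [2.0 * t ** 0.45 for t in lags]

alpha = diffusion_exponent(lags, msd)
print(round(alpha, 3))
```

A fitted value below 1 indicates subdiffusion, while ordinary diffusion gives a value of 1. With real trajectories the result depends on choices such as fitting an ensemble average versus averaging per-word fits, which the abstract notes explicitly.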

machine learning, natural language, trajectory, (18 more...)

arXiv.org Artificial Intelligence

2209.04701

Country:

Europe > Hungary > Budapest > Budapest (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Data Science (0.88)
