For the First Time, AI Analyzes Language as Well as a Human Expert
If language is what makes us human, what does it mean now that large language models have gained "metalinguistic" abilities? Among the myriad abilities that humans possess, which ones are uniquely human? Language has been a top candidate at least since Aristotle, who wrote that humanity was "the animal that has language." Even as large language models such as ChatGPT superficially replicate ordinary speech, researchers want to know if there are specific aspects of human language that simply have no parallels in the communication systems of other animals or artificially intelligent devices. In particular, researchers have been exploring the extent to which language models can reason about language itself.
- North America > United States > California > Alameda County > Berkeley (0.05)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
- Asia > China (0.04)
From Binary to Bilingual: How the National Weather Service is Using Artificial Intelligence to Develop a Comprehensive Translation Program
Trujillo-Falcon, Joseph E., Bozeman, Monica L., Llewellyn, Liam E., Halvorson, Samuel T., Mizell, Meryl, Deshpande, Stuti, Manning, Bob, Fagin, Todd
To advance a Weather-Ready Nation, the National Weather Service (NWS) is developing a systematic translation program to better serve the 68.8 million people in the U.S. who do not speak English at home. This article outlines the foundation of an automated translation tool for NWS products, powered by artificial intelligence. The NWS has partnered with LILT, whose patented training process enables large language models (LLMs) to adapt neural machine translation (NMT) tools for weather terminology and messaging. Designed for scalability across Weather Forecast Offices (WFOs) and National Centers, the system is currently being developed in Spanish, Simplified Chinese, Vietnamese, and other widely spoken non-English languages. Rooted in best practices for multilingual risk communication, the system provides accurate, timely, and culturally relevant translations, significantly reducing manual translation time and easing operational workloads across the NWS. To guide the distribution of these products, GIS mapping was used to identify language needs across different NWS regions, helping prioritize resources for the communities that need them most. We also integrated ethical AI practices throughout the program's design, ensuring that transparency, fairness, and human oversight guide how automated translations are created, evaluated, and shared with the public. This work has culminated in a website featuring experimental multilingual NWS products, including translated warnings, 7-day forecasts, and educational campaigns, bringing the country one step closer to a national warning system that reaches all Americans.
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- North America > Canada (0.14)
- North America > United States > Oklahoma > Cleveland County > Norman (0.14)
- (13 more...)
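The GIS-driven prioritization step can be pictured in a few lines of code. This is a minimal sketch under invented numbers: the offices, languages, and speaker counts below are hypothetical stand-ins for the Census-derived GIS layers the program actually uses.

```python
# Hypothetical language-need counts per Weather Forecast Office (WFO);
# the real program derives these from GIS mapping of Census data.
needs = {
    "WFO Miami":   {"Spanish": 1_200_000, "Haitian Creole": 300_000},
    "WFO Houston": {"Spanish": 1_500_000, "Vietnamese": 140_000},
    "WFO Seattle": {"Spanish": 250_000, "Simplified Chinese": 90_000},
}

def prioritize(needs, top_n=3):
    """Rank (office, language) pairs by how many residents need translations."""
    pairs = [
        (office, lang, count)
        for office, langs in needs.items()
        for lang, count in langs.items()
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:top_n]

for office, lang, count in prioritize(needs):
    print(f"{office}: prioritize {lang} ({count:,} speakers)")
```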
Testing the Limits of Machine Translation from One Book
Shaw, Jonathan, Mee, Dillon, Khouw, Timothy, Leech, Zackary, Wilson, Daniel
Current state-of-the-art models demonstrate the capacity to leverage in-context learning to translate into previously unseen language contexts. Tanzer et al. [2024] utilize language materials (e.g. a grammar) to improve translation quality for Kalamang using large language models (LLMs). We focus on Kanuri, a language that, despite having a substantial speaker population, has minimal digital resources. We design two datasets for evaluation: one focused on health and humanitarian terms, and another containing generalized terminology, investigating how domain-specific tasks impact LLM translation quality. By providing different combinations of language resources (grammar, dictionary, and parallel sentences), we measure LLM translation effectiveness, comparing results to native speaker translations and human linguist performance. We evaluate using both automatic metrics and native speaker assessments of fluency and accuracy. Results demonstrate that parallel sentences remain the most effective data source, outperforming other methods in human evaluations and automatic metrics. While incorporating a grammar improves over zero-shot translation, it fails as an effective standalone data source. Human evaluations reveal that LLMs achieve accuracy (meaning) more effectively than fluency (grammaticality). These findings suggest that LLM translation evaluation benefits from multidimensional assessment beyond simple accuracy metrics, and that a grammar alone, without parallel sentences, does not provide sufficient context for effective domain-specific translation.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (8 more...)
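The paper's core manipulation, which resources appear in the prompt, is easy to sketch. A minimal, hypothetical version: `build_translation_prompt` is our own illustration, and the resource snippets are placeholders, not the study's actual Kanuri materials.

```python
def build_translation_prompt(sentence, grammar=None, dictionary=None, parallels=None):
    """Assemble an in-context translation prompt from whichever language
    resources are available: grammar notes, dictionary entries, parallel sentences."""
    parts = []
    if grammar:
        parts.append("Relevant grammar notes:\n" + grammar)
    if dictionary:
        entries = "\n".join(f"{src} = {tgt}" for src, tgt in dictionary.items())
        parts.append("Dictionary entries:\n" + entries)
    if parallels:
        examples = "\n".join(f"Kanuri: {s}\nEnglish: {t}" for s, t in parallels)
        parts.append("Example translations:\n" + examples)
    parts.append(f"Translate into English.\nKanuri: {sentence}\nEnglish:")
    return "\n\n".join(parts)

# Placeholder resources for illustration only.
prompt = build_translation_prompt(
    "…",
    dictionary={"…": "…"},
    parallels=[("…", "…")],
)
print(prompt)
```

Varying which arguments are passed reproduces the paper's resource combinations, from zero-shot (no resources) up to grammar plus dictionary plus parallel sentences.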
The Great Language Flattening
In at least one crucial way, AI has already won its campaign for global dominance. An unbelievable volume of synthetic prose is published every moment of every day--heaping piles of machine-written news articles, text messages, emails, search results, customer-service chats, even scientific research. Chatbots learned from human writing. Now the influence may run in the other direction. Some people have hypothesized that the proliferation of generative-AI tools such as ChatGPT will seep into human communication, that the terse language we use when prompting a chatbot may lead us to dispose of any niceties or writerly flourishes when corresponding with friends and colleagues.
Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection
Mahmudi, Aso, Herce, Borja, Amestica, Demian Inostroza, Scherbakov, Andreas, Hovy, Eduard, Vylomova, Ekaterina
Linguistic fieldwork is an important component in language documentation and preservation. However, it is a long, exhausting, and time-consuming process. This paper presents a novel model that guides a linguist during fieldwork and accounts for the dynamics of linguist-speaker interactions. We introduce a novel framework that evaluates the efficiency of various sampling strategies for obtaining morphological data and assesses the effectiveness of state-of-the-art neural models in generalising morphological structures. Our experiments highlight two key strategies for improving efficiency: (1) increasing the diversity of annotated data by sampling uniformly among the cells of the paradigm tables, and (2) using model confidence as a guide to enhance positive interaction by providing reliable predictions during annotation.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- (12 more...)
- Research Report > New Finding (0.93)
- Research Report > Promising Solution (0.66)
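The two strategies the abstract highlights can be combined into a toy elicitation loop: sample paradigm cells uniformly for diversity, and surface a model prediction to the speaker only when the model is confident. The stub model and threshold below are our own illustration, not the paper's setup.

```python
import random

random.seed(0)

# A paradigm table to fill: lexemes crossed with morphosyntactic cells.
lexemes = ["walk", "sing", "go"]
cells = ["1SG.PRS", "2SG.PRS", "3SG.PST", "1PL.FUT"]

def model_predict(lexeme, cell):
    """Stub for a neural inflection model: returns (form, confidence)."""
    return f"{lexeme}-{cell.lower()}", random.random()

CONFIDENCE_THRESHOLD = 0.8
paradigm = {}

# Uniform sampling across cells keeps the annotated data diverse.
for lexeme, cell in random.sample([(l, c) for l in lexemes for c in cells], k=6):
    form, confidence = model_predict(lexeme, cell)
    if confidence >= CONFIDENCE_THRESHOLD:
        paradigm[(lexeme, cell)] = (form, "model-suggested")   # speaker confirms
    else:
        paradigm[(lexeme, cell)] = ("<elicited>", "speaker-provided")

print(paradigm)
```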
Does AI Actually Understand Language?
This article was originally published by Quanta Magazine. A picture may be worth a thousand words, but how many numbers is a word worth? The question may sound silly, but it happens to be the foundation that underlies large language models, or LLMs--and through them, many modern applications of artificial intelligence. Every LLM has its own answer. In Meta's open-source Llama 3 model, words are split into tokens represented by 4,096 numbers; for one version of GPT-3, it's 12,288.
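To make the arithmetic concrete: a token ID simply indexes a row of the model's embedding matrix, and "how many numbers a word is worth" is that row's length. A minimal sketch, with the 4,096 dimension taken from the article and a deliberately tiny, made-up vocabulary:

```python
import numpy as np

vocab_size, embed_dim = 1_000, 4096  # dim per the article; vocab size illustrative
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embed_dim)).astype(np.float32)

token_id = 42  # a tokenizer would map a word or subword to an ID like this
vector = embedding_matrix[token_id]
print(vector.shape)  # (4096,) -- the numbers this token is "worth"
```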
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Lin, Ronghao, Zeng, Ying, Mai, Sijie, Hu, Haifeng
In the pathway toward Artificial General Intelligence (AGI), understanding human affect is essential to enhancing machines' cognitive abilities. To achieve more sensitive human-AI interaction, Multimodal Affective Computing (MAC) on human-spoken videos has attracted increasing attention. However, previous methods are mainly devoted to designing multimodal fusion algorithms and suffer from two issues: semantic imbalance caused by diverse pre-processing operations, and semantic mismatch arising when the affective content of individual modalities is inconsistent with the multimodal ground truth. Besides, the use of manually crafted feature extractors prevents these methods from forming end-to-end pipelines for multiple MAC downstream tasks. To address the above challenges, we propose a novel end-to-end framework named SemanticMAC that computes multimodal semantic-centric affect for human-spoken videos. We first employ a pre-trained Transformer model for multimodal data pre-processing and design an Affective Perceiver module to capture unimodal affective information. Moreover, we present a semantic-centric approach that unifies multimodal representation learning in three ways: gated feature interaction, multi-task pseudo-label generation, and intra-/inter-sample contrastive learning. Finally, SemanticMAC effectively learns specific and shared semantic representations under the guidance of semantic-centric labels. Extensive experimental results demonstrate that our approach surpasses state-of-the-art methods on 7 public datasets across four MAC downstream tasks.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Research Report > New Finding (0.34)
- Research Report > Promising Solution (0.34)
- Media (0.67)
- Leisure & Entertainment (0.46)
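Of the three semantic-centric components, gated feature interaction is the most self-contained. Here is a generic gated fusion of two modality streams in PyTorch; it illustrates the general idea, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic gated interaction between two modality features: a sigmoid
    gate decides, per dimension, how much of each stream to keep."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_feat, audio_feat):
        g = torch.sigmoid(self.gate(torch.cat([text_feat, audio_feat], dim=-1)))
        return g * text_feat + (1 - g) * audio_feat

fusion = GatedFusion(dim=256)
text = torch.randn(8, 256)   # a batch of text features
audio = torch.randn(8, 256)  # a batch of audio features
print(fusion(text, audio).shape)  # torch.Size([8, 256])
```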
QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval
Tan, Hongming, Zhan, Shaoxiong, Lin, Hai, Zheng, Hai-Tao, Chan, Wai Kin
In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching. Additionally, low-quality texts with excessive noise or sparse key information are unlikely to align well with relevant queries. Recent studies mainly focus on improving the sentence embedding model or the retrieval process. In this work, we introduce a novel text augmentation framework for dense retrieval. This framework transforms raw documents into information-dense text formats, which supplement the original texts to effectively address the aforementioned issues without modifying embedding or retrieval methodologies. Two text representations are generated via large language model (LLM) zero-shot prompting: question-answer pairs and element-driven events. We term this approach QAEA-DR: unifying question-answer generation and event extraction in a text augmentation framework for dense retrieval. To further enhance the quality of generated texts, a scoring-based evaluation and regeneration mechanism is introduced in LLM prompting. Our QAEA-DR model has a positive impact on dense retrieval, supported by both theoretical analysis and empirical experiments.
- North America > United States > Iowa (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Ohio > Franklin County PH (0.04)
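The augmentation idea reduces to indexing several views of each document and letting a query match whichever view it aligns with best. In this sketch the QA pair and event string are hand-written stand-ins for LLM output, and the bag-of-words similarity is a stand-in for a dense encoder.

```python
from collections import Counter
import math

def embed(text):
    """Stand-in embedder: bag-of-words counts instead of a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc = "The river flooded the valley in 1998, displacing thousands of residents."
# In QAEA-DR these views would be produced by LLM zero-shot prompting.
augmentations = [
    "Q: What happened in 1998? A: The river flooded the valley.",
    "Event: flood; agent: river; location: valley; time: 1998.",
]

query = "when did the valley flood"
views = [doc] + augmentations
# Score the document by its best-matching view.
best = max(cosine(embed(query), embed(view)) for view in views)
print(round(best, 3))
```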
Towards Massive Multilingual Holistic Bias
Tan, Xiaoqing Ellen, Hansanti, Prangthip, Wood, Carleigh, Yu, Bokai, Ropers, Christophe, Costa-jussà, Marta R.
In the current landscape of automatic language generation, there is a need to understand, evaluate, and mitigate demographic biases as existing models become increasingly multilingual. To address this, we present the initial eight languages from the MASSIVE MULTILINGUAL HOLISTICBIAS (MMHB) dataset and benchmark, consisting of approximately 6 million sentences representing 13 demographic axes. We propose an automatic construction methodology to further scale up MMHB sentences in terms of both language coverage and size, leveraging limited human annotation. Our approach utilizes placeholders in multilingual sentence construction and employs a systematic method to independently translate sentence patterns, nouns, and descriptors. Combined with human translation, this technique carefully designs placeholders to dynamically generate multiple sentence variations, significantly reducing the human translation workload. The translation process has been meticulously conducted to avoid an English-centric perspective and to include all necessary morphological variations for languages that require them, improving on the original English HOLISTICBIAS. Finally, we use MMHB to report results on gender bias and added toxicity in machine translation tasks. On the gender analysis, MMHB unveils: (1) a lack of gender robustness, with masculine semantic sentences scoring almost +4 chrF points on average compared to feminine ones, and (2) a preference to overgeneralize to masculine forms, with evaluations against masculine references scoring more than +12 chrF points on average compared to feminine references. MMHB triggers added toxicity of up to 2.3%.
- Asia > Singapore (0.04)
- Asia > China (0.04)
- North America > United States > Alaska (0.04)
- (15 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
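The placeholder technique itself fits in a few lines: patterns, nouns, and descriptors are translated independently and then recombined, so n patterns x m nouns x k descriptors yield n*m*k sentences from only n+m+k translated pieces. The English fillers below are illustrative, not MMHB data.

```python
from itertools import product

# Independently translated pieces (illustrative, not from the MMHB dataset).
patterns = ["I am {descriptor} and I am {noun}.", "I met {noun} who is {descriptor}."]
nouns = ["a doctor", "a teacher"]
descriptors = ["young", "left-handed"]

variations = [
    pattern.format(noun=noun, descriptor=descriptor)
    for pattern, noun, descriptor in product(patterns, nouns, descriptors)
]
print(len(variations))  # 2 patterns x 2 nouns x 2 descriptors = 8 sentences
for sentence in variations[:3]:
    print(sentence)
```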
Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople
Qiu, Zhuang, Duan, Xufeng, Cai, Zhenguang G.
Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena that linguists judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse, Schütze, & Almeida, 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in judging these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences based on a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. We attribute these results to the psychometric nature of the judgment tasks and the differences in language processing styles between humans and LLMs.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > China > Hong Kong (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
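The three experiments reduce to three prompt formats. A sketch in which the wording is ours, not the study's, and `ask_chatgpt` is a placeholder for a chat-completion API call:

```python
def ask_chatgpt(prompt):
    """Placeholder for a chat-completion API call."""
    raise NotImplementedError

def reference_rating(sentence, reference):      # Experiment 1
    return ask_chatgpt(
        f"Reference sentence: {reference}\n"
        f"Relative to the reference, rate this sentence: {sentence}"
    )

def likert_rating(sentence):                    # Experiment 2
    return ask_chatgpt(
        "On a scale from 1 (completely ungrammatical) to 7 (fully grammatical), "
        f"rate this sentence: {sentence}"
    )

def forced_choice(sentence_a, sentence_b):      # Experiment 3
    return ask_chatgpt(
        f"Which sentence is more grammatical?\nA: {sentence_a}\nB: {sentence_b}"
    )
```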