For the First Time, AI Analyzes Language as Well as a Human Expert
If language is what makes us human, what does it mean now that large language models have gained "metalinguistic" abilities? Among the myriad abilities that humans possess, which ones are uniquely human? Language has been a top candidate at least since Aristotle, who wrote that humanity was "the animal that has language." Even as large language models such as ChatGPT superficially replicate ordinary speech, researchers want to know if there are specific aspects of human language that simply have no parallels in the communication systems of other animals or artificially intelligent devices. In particular, researchers have been exploring the extent to which language models can reason about language itself.
- North America > United States > California > Alameda County > Berkeley (0.05)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
- Asia > China (0.04)
From Binary to Bilingual: How the National Weather Service is Using Artificial Intelligence to Develop a Comprehensive Translation Program
Trujillo-Falcon, Joseph E., Bozeman, Monica L., Llewellyn, Liam E., Halvorson, Samuel T., Mizell, Meryl, Deshpande, Stuti, Manning, Bob, Fagin, Todd
To advance a Weather-Ready Nation, the National Weather Service (NWS) is developing a systematic translation program to better serve the 68.8 million people in the U.S. who do not speak English at home. This article outlines the foundation of an automated translation tool for NWS products, powered by artificial intelligence. The NWS has partnered with LILT, whose patented training process enables large language models (LLMs) to adapt neural machine translation (NMT) tools for weather terminology and messaging. Designed for scalability across Weather Forecast Offices (WFOs) and National Centers, the system is currently being developed in Spanish, Simplified Chinese, Vietnamese, and other widely spoken non-English languages. Rooted in best practices for multilingual risk communication, the system provides accurate, timely, and culturally relevant translations, significantly reducing manual translation time and easing operational workloads across the NWS. To guide the distribution of these products, GIS mapping was used to identify language needs across different NWS regions, helping prioritize resources for the communities that need them most. We also integrated ethical AI practices throughout the program's design, ensuring that transparency, fairness, and human oversight guide how automated translations are created, evaluated, and shared with the public. This work has culminated in a website featuring experimental multilingual NWS products, including translated warnings, 7-day forecasts, and educational campaigns, bringing the country one step closer to a national warning system that reaches all Americans.
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- North America > Canada (0.14)
- North America > United States > Oklahoma > Cleveland County > Norman (0.14)
- (13 more...)
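The GIS-driven prioritization step can be pictured in a few lines of code. This is a minimal sketch under invented numbers: the offices, languages, and speaker counts below are hypothetical stand-ins for the Census-derived GIS layers the program actually uses.

```python
# Hypothetical language-need counts per Weather Forecast Office (WFO);
# the real program derives these from GIS mapping of Census data.
needs = {
    "WFO Miami":   {"Spanish": 1_200_000, "Haitian Creole": 300_000},
    "WFO Houston": {"Spanish": 1_500_000, "Vietnamese": 140_000},
    "WFO Seattle": {"Spanish": 250_000, "Simplified Chinese": 90_000},
}

def prioritize(needs, top_n=3):
    """Rank (office, language) pairs by how many residents need translations."""
    pairs = [
        (office, lang, count)
        for office, langs in needs.items()
        for lang, count in langs.items()
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:top_n]

for office, lang, count in prioritize(needs):
    print(f"{office}: prioritize {lang} ({count:,} speakers)")
```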
Testing the Limits of Machine Translation from One Book
Shaw, Jonathan, Mee, Dillon, Khouw, Timothy, Leech, Zackary, Wilson, Daniel
Current state-of-the-art models demonstrate the capacity to leverage in-context learning to translate into previously unseen language contexts. Tanzer et al. [2024] utilize language materials (e.g. a grammar) to improve translation quality for Kalamang using large language models (LLMs). We focus on Kanuri, a language that, despite having a substantial speaker population, has minimal digital resources. We design two datasets for evaluation: one focused on health and humanitarian terms, and another containing generalized terminology, investigating how domain-specific tasks impact LLM translation quality. By providing different combinations of language resources (grammar, dictionary, and parallel sentences), we measure LLM translation effectiveness, comparing results to native speaker translations and human linguist performance. We evaluate using both automatic metrics and native speaker assessments of fluency and accuracy. Results demonstrate that parallel sentences remain the most effective data source, outperforming other methods in human evaluations and automatic metrics. While incorporating a grammar improves over zero-shot translation, it fails as an effective standalone data source. Human evaluations reveal that LLMs achieve accuracy (meaning) more effectively than fluency (grammaticality). These findings suggest that LLM translation evaluation benefits from multidimensional assessment beyond simple accuracy metrics, and that a grammar alone, without parallel sentences, does not provide sufficient context for effective domain-specific translation.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (8 more...)
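The paper's core manipulation, which resources appear in the prompt, is easy to sketch. A minimal, hypothetical version: `build_translation_prompt` is our own illustration, and the resource snippets are placeholders, not the study's actual Kanuri materials.

```python
def build_translation_prompt(sentence, grammar=None, dictionary=None, parallels=None):
    """Assemble an in-context translation prompt from whichever language
    resources are available: grammar notes, dictionary entries, parallel sentences."""
    parts = []
    if grammar:
        parts.append("Relevant grammar notes:\n" + grammar)
    if dictionary:
        entries = "\n".join(f"{src} = {tgt}" for src, tgt in dictionary.items())
        parts.append("Dictionary entries:\n" + entries)
    if parallels:
        examples = "\n".join(f"Kanuri: {s}\nEnglish: {t}" for s, t in parallels)
        parts.append("Example translations:\n" + examples)
    parts.append(f"Translate into English.\nKanuri: {sentence}\nEnglish:")
    return "\n\n".join(parts)

# Placeholder resources for illustration only.
prompt = build_translation_prompt(
    "…",
    dictionary={"…": "…"},
    parallels=[("…", "…")],
)
print(prompt)
```

Varying which arguments are passed reproduces the paper's resource combinations, from zero-shot (no resources) up to grammar plus dictionary plus parallel sentences.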
The Great Language Flattening
In at least one crucial way, AI has already won its campaign for global dominance. An unbelievable volume of synthetic prose is published every moment of every day--heaping piles of machine-written news articles, text messages, emails, search results, customer-service chats, even scientific research. Chatbots learned from human writing. Now the influence may run in the other direction. Some people have hypothesized that the proliferation of generative-AI tools such as ChatGPT will seep into human communication, that the terse language we use when prompting a chatbot may lead us to dispose of any niceties or writerly flourishes when corresponding with friends and colleagues.
Can a Neural Model Guide Fieldwork? A Case Study on Morphological Data Collection
Mahmudi, Aso, Herce, Borja, Amestica, Demian Inostroza, Scherbakov, Andreas, Hovy, Eduard, Vylomova, Ekaterina
Linguistic fieldwork is an important component in language documentation and preservation. However, it is a long, exhausting, and time-consuming process. This paper presents a novel model that guides a linguist during fieldwork and accounts for the dynamics of linguist-speaker interactions. We introduce a novel framework that evaluates the efficiency of various sampling strategies for obtaining morphological data and assesses the effectiveness of state-of-the-art neural models in generalising morphological structures. Our experiments highlight two key strategies for improving efficiency: (1) increasing the diversity of annotated data by sampling uniformly among the cells of the paradigm tables, and (2) using model confidence as a guide to enhance positive interaction by providing reliable predictions during annotation.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- (12 more...)
- Research Report > New Finding (0.93)
- Research Report > Promising Solution (0.66)
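The two strategies the abstract highlights can be combined into a toy elicitation loop: sample paradigm cells uniformly for diversity, and surface a model prediction to the speaker only when the model is confident. The stub model and threshold below are our own illustration, not the paper's setup.

```python
import random

random.seed(0)

# A paradigm table to fill: lexemes crossed with morphosyntactic cells.
lexemes = ["walk", "sing", "go"]
cells = ["1SG.PRS", "2SG.PRS", "3SG.PST", "1PL.FUT"]

def model_predict(lexeme, cell):
    """Stub for a neural inflection model: returns (form, confidence)."""
    return f"{lexeme}-{cell.lower()}", random.random()

CONFIDENCE_THRESHOLD = 0.8
paradigm = {}

# Uniform sampling across cells keeps the annotated data diverse.
for lexeme, cell in random.sample([(l, c) for l in lexemes for c in cells], k=6):
    form, confidence = model_predict(lexeme, cell)
    if confidence >= CONFIDENCE_THRESHOLD:
        paradigm[(lexeme, cell)] = (form, "model-suggested")   # speaker confirms
    else:
        paradigm[(lexeme, cell)] = ("<elicited>", "speaker-provided")

print(paradigm)
```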
Does AI Actually Understand Language?
This article was originally published by Quanta Magazine. A picture may be worth a thousand words, but how many numbers is a word worth? The question may sound silly, but it happens to be the foundation that underlies large language models, or LLMs--and through them, many modern applications of artificial intelligence. Every LLM has its own answer. In Meta's open-source Llama 3 model, words are split into tokens represented by 4,096 numbers; for one version of GPT-3, it's 12,288.
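To make the arithmetic concrete: a token ID simply indexes a row of the model's embedding matrix, and "how many numbers a word is worth" is that row's length. A minimal sketch, with the 4,096 dimension taken from the article and a deliberately tiny, made-up vocabulary:

```python
import numpy as np

vocab_size, embed_dim = 1_000, 4096  # dim per the article; vocab size illustrative
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embed_dim)).astype(np.float32)

token_id = 42  # a tokenizer would map a word or subword to an ID like this
vector = embedding_matrix[token_id]
print(vector.shape)  # (4096,) -- the numbers this token is "worth"
```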
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Lin, Ronghao, Zeng, Ying, Mai, Sijie, Hu, Haifeng
In the pathway toward Artificial General Intelligence (AGI), understanding human affect is essential to enhancing machines' cognitive abilities. To achieve more sensitive human-AI interaction, Multimodal Affective Computing (MAC) on human-spoken videos has attracted increasing attention. However, previous methods are mainly devoted to designing multimodal fusion algorithms and suffer from two issues: semantic imbalance caused by diverse pre-processing operations, and semantic mismatch arising when the affective content of individual modalities is inconsistent with the multimodal ground truth. Besides, the use of manually crafted feature extractors prevents these methods from forming end-to-end pipelines for multiple MAC downstream tasks. To address the above challenges, we propose a novel end-to-end framework named SemanticMAC that computes multimodal semantic-centric affect for human-spoken videos. We first employ a pre-trained Transformer model for multimodal data pre-processing and design an Affective Perceiver module to capture unimodal affective information. Moreover, we present a semantic-centric approach that unifies multimodal representation learning in three ways: gated feature interaction, multi-task pseudo-label generation, and intra-/inter-sample contrastive learning. Finally, SemanticMAC effectively learns specific and shared semantic representations under the guidance of semantic-centric labels. Extensive experimental results demonstrate that our approach surpasses state-of-the-art methods on 7 public datasets across four MAC downstream tasks.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Research Report > New Finding (0.34)
- Research Report > Promising Solution (0.34)
- Media (0.67)
- Leisure & Entertainment (0.46)
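Of the three semantic-centric components, gated feature interaction is the most self-contained. Here is a generic gated fusion of two modality streams in PyTorch; it illustrates the general idea, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic gated interaction between two modality features: a sigmoid
    gate decides, per dimension, how much of each stream to keep."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_feat, audio_feat):
        g = torch.sigmoid(self.gate(torch.cat([text_feat, audio_feat], dim=-1)))
        return g * text_feat + (1 - g) * audio_feat

fusion = GatedFusion(dim=256)
text = torch.randn(8, 256)   # a batch of text features
audio = torch.randn(8, 256)  # a batch of audio features
print(fusion(text, audio).shape)  # torch.Size([8, 256])
```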
QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval
Tan, Hongming, Zhan, Shaoxiong, Lin, Hai, Zheng, Hai-Tao, Chan, Wai Kin
In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching. Additionally, low-quality texts with excessive noise or sparse key information are unlikely to align well with relevant queries. Recent studies mainly focus on improving the sentence embedding model or the retrieval process. In this work, we introduce a novel text augmentation framework for dense retrieval. This framework transforms raw documents into information-dense text formats, which supplement the original texts to effectively address the aforementioned issues without modifying embedding or retrieval methodologies. Two text representations are generated via large language model (LLM) zero-shot prompting: question-answer pairs and element-driven events. We term this approach QAEA-DR: unifying question-answer generation and event extraction in a text augmentation framework for dense retrieval. To further enhance the quality of generated texts, a scoring-based evaluation and regeneration mechanism is introduced in LLM prompting. Our QAEA-DR model has a positive impact on dense retrieval, supported by both theoretical analysis and empirical experiments.
- North America > United States > Iowa (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Ohio > Franklin County PH (0.04)
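The augmentation idea reduces to indexing several views of each document and letting a query match whichever view it aligns with best. In this sketch the QA pair and event string are hand-written stand-ins for LLM output, and the bag-of-words similarity is a stand-in for a dense encoder.

```python
from collections import Counter
import math

def embed(text):
    """Stand-in embedder: bag-of-words counts instead of a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc = "The river flooded the valley in 1998, displacing thousands of residents."
# In QAEA-DR these views would be produced by LLM zero-shot prompting.
augmentations = [
    "Q: What happened in 1998? A: The river flooded the valley.",
    "Event: flood; agent: river; location: valley; time: 1998.",
]

query = "when did the valley flood"
views = [doc] + augmentations
# Score the document by its best-matching view.
best = max(cosine(embed(query), embed(view)) for view in views)
print(round(best, 3))
```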
Towards Massive Multilingual Holistic Bias
Tan, Xiaoqing Ellen, Hansanti, Prangthip, Wood, Carleigh, Yu, Bokai, Ropers, Christophe, Costa-jussà, Marta R.
In the current landscape of automatic language generation, there is a need to understand, evaluate, and mitigate demographic biases as existing models become increasingly multilingual. To address this, we present the initial eight languages from the MASSIVE MULTILINGUAL HOLISTICBIAS (MMHB) dataset and benchmark, consisting of approximately 6 million sentences representing 13 demographic axes. We propose an automatic construction methodology to further scale up MMHB sentences in terms of both language coverage and size, leveraging limited human annotation. Our approach utilizes placeholders in multilingual sentence construction and employs a systematic method to independently translate sentence patterns, nouns, and descriptors. Combined with human translation, this technique carefully designs placeholders to dynamically generate multiple sentence variations, significantly reducing the human translation workload. The translation process has been meticulously conducted to avoid an English-centric perspective and to include all necessary morphological variations for languages that require them, improving on the original English HOLISTICBIAS. Finally, we use MMHB to report results on gender bias and added toxicity in machine translation tasks. On the gender analysis, MMHB unveils: (1) a lack of gender robustness, with masculine semantic sentences scoring almost +4 chrF points on average compared to feminine ones, and (2) a preference to overgeneralize to masculine forms, with evaluations against masculine references scoring more than +12 chrF points on average compared to feminine references. MMHB triggers added toxicity of up to 2.3%.
- Asia > Singapore (0.04)
- Asia > China (0.04)
- North America > United States > Alaska (0.04)
- (15 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
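The placeholder technique itself fits in a few lines: patterns, nouns, and descriptors are translated independently and then recombined, so n patterns x m nouns x k descriptors yield n*m*k sentences from only n+m+k translated pieces. The English fillers below are illustrative, not MMHB data.

```python
from itertools import product

# Independently translated pieces (illustrative, not from the MMHB dataset).
patterns = ["I am {descriptor} and I am {noun}.", "I met {noun} who is {descriptor}."]
nouns = ["a doctor", "a teacher"]
descriptors = ["young", "left-handed"]

variations = [
    pattern.format(noun=noun, descriptor=descriptor)
    for pattern, noun, descriptor in product(patterns, nouns, descriptors)
]
print(len(variations))  # 2 patterns x 2 nouns x 2 descriptors = 8 sentences
for sentence in variations[:3]:
    print(sentence)
```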
Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople
Qiu, Zhuang, Duan, Xufeng, Cai, Zhenguang G.
Large language models (LLMs) have demonstrated exceptional performance across various linguistic tasks. However, it remains uncertain whether LLMs have developed human-like fine-grained grammatical intuition. This preregistered study (https://osf.io/t5nes) presents the first large-scale investigation of ChatGPT's grammatical intuition, building upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena that linguists judged to be grammatical, ungrammatical, or marginally grammatical (Sprouse, Schütze, & Almeida, 2013). Our primary focus was to compare ChatGPT with both laypeople and linguists in judging these linguistic constructions. In Experiment 1, ChatGPT assigned ratings to sentences based on a given reference sentence. Experiment 2 involved rating sentences on a 7-point scale, and Experiment 3 asked ChatGPT to choose the more grammatical sentence from a pair. Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point estimate of 89%. Significant correlations were also found between ChatGPT and laypeople across all tasks, though the correlation strength varied by task. We attribute these results to the psychometric nature of the judgment tasks and the differences in language processing styles between humans and LLMs.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > China > Hong Kong (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
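The three experiments reduce to three prompt formats. A sketch in which the wording is ours, not the study's, and `ask_chatgpt` is a placeholder for a chat-completion API call:

```python
def ask_chatgpt(prompt):
    """Placeholder for a chat-completion API call."""
    raise NotImplementedError

def reference_rating(sentence, reference):      # Experiment 1
    return ask_chatgpt(
        f"Reference sentence: {reference}\n"
        f"Relative to the reference, rate this sentence: {sentence}"
    )

def likert_rating(sentence):                    # Experiment 2
    return ask_chatgpt(
        "On a scale from 1 (completely ungrammatical) to 7 (fully grammatical), "
        f"rate this sentence: {sentence}"
    )

def forced_choice(sentence_a, sentence_b):      # Experiment 3
    return ask_chatgpt(
        f"Which sentence is more grammatical?\nA: {sentence_a}\nB: {sentence_b}"
    )
```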