AITopics | aave

Collaborating Authors

aave

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reinforcing Stereotypes of Anger: Emotion AI on African American Vernacular English

Dorn, Rebecca, Chance, Christina, Rusti, Casandra, Bickham, Charles Jr., Chang, Kai-Wei, Morstatter, Fred, Lerman, Kristina

arXiv.org Artificial IntelligenceNov-17-2025

Automated emotion detection is widely used in applications ranging from well-being monitoring to high-stakes domains like mental health and hiring. However, models often rely on annotations that reflect dominant cultural norms, limiting model ability to recognize emotional expression in dialects often excluded from training data distributions, such as African American Vernacular English (AAVE). This study examines emotion recognition model performance on AAVE compared to General American English (GAE). We analyze 2.7 million tweets geo-tagged within Los Angeles. Texts are scored for strength of AAVE using computational approximations of dialect features. Annotations of emotion presence and intensity are collected on a dataset of 875 tweets with both high and low AAVE densities. To assess model accuracy on a task as subjective as emotion perception, we calculate community-informed "silver" labels where AAVE-dense tweets are labeled by African American, AAVE-fluent (ingroup) annotators. On our labeled sample, GPT and BERT-based models exhibit false positive prediction rates of anger on AAVE more than double than on GAE. SpanEmo, a popular text-based emotion model, increases false positive rates of anger from 25 percent on GAE to 60 percent on AAVE. Additionally, a series of linear regressions reveals that models and non-ingroup annotations are significantly more correlated with profanity-based AAVE features than ingroup annotations. Linking Census tract demographics, we observe that neighborhoods with higher proportions of African American residents are associated with higher predictions of anger (Pearson's correlation r = 0.27) and lower joy (r = -0.10). These results find an emergent safety issue of emotion AI reinforcing racial stereotypes through biased emotion classification. We emphasize the need for culturally and dialect-informed affective computing systems.

artificial intelligence, emotion, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.10846

Country: North America > United States > California > Los Angeles County > Los Angeles (0.49)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.48)
Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?

Abbas, Chaymaa, Awad, Mariette, Tajeddine, Razane

arXiv.org Artificial IntelligenceOct-10-2025

Style-conditioned data poisoning is identified as a covert vector for amplifying sociolinguistic bias in large language models. Using small poisoned budgets that pair dialectal prompts -- principally African American Vernacular English (AAVE) and a Southern dialect -- with toxic or stereotyped completions during instruction tuning, this work probes whether linguistic style can act as a latent trigger for harmful behavior. Across multiple model families and scales, poisoned exposure elevates toxicity and stereotype expression for dialectal inputs -- most consistently for AAVE -- while Standard American English remains comparatively lower yet not immune. A multi-metric audit combining classifier-based toxicity with an LLM-as-a-judge reveals stereotype-laden content even when lexical toxicity appears muted, indicating that conventional detectors under-estimate sociolinguistic harms. Additionally, poisoned models exhibit emergent jailbreaking despite the absence of explicit slurs in the poison, suggesting weakened alignment rather than memorization. These findings underscore the need for dialect-aware evaluation, content-level stereotype auditing, and training protocols that explicitly decouple style from toxicity to prevent bias amplification through seemingly minor, style-based contamination.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.19195

Country:

North America > United States (1.00)
Asia (0.68)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (0.47)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

From Rules to Rewards: Reinforcement Learning for Interest Rate Adjustment in DeFi Lending

Qu, Hanxiao, Gogol, Krzysztof, Groetschla, Florian, Tessone, Claudio

arXiv.org Artificial IntelligenceJun-3-2025

Decentralized Finance (DeFi) lending enables permission-less borrowing via smart contracts. However, it faces challenges in optimizing interest rates, mitigating bad debt, and improving capital efficiency. Rule-based interest-rate models struggle to adapt to dynamic market conditions, leading to inefficiencies. This work applies Offline Reinforcement Learning (RL) to optimize interest rate adjustments in DeFi lending protocols. Using historical data from Aave protocol, we evaluate three RL approaches: Conservative Q-Learning (CQL), Behavior Cloning (BC), and TD3 with Behavior Cloning (TD3-BC). TD3-BC demonstrates superior performance in balancing utilization, capital stability, and risk, outperforming existing models. It adapts effectively to historical stress events like the May 2021 crash and the March 2023 USDC depeg, showcasing potential for automated, real-time governance.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2506.00505

Genre: Research Report (0.50)

Industry:

Banking & Finance > Trading (1.00)
Banking & Finance > Economy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Finding A Voice: Evaluating African American Dialect Generation for Chatbot Technology

Finch, Sarah E., Paek, Ellie S., Kwon, Sejung, Choi, Ikseon, Wells, Jessica, Chandler, Rasheeta, Choi, Jinho D.

arXiv.org Artificial IntelligenceJan-6-2025

As chatbots become increasingly integrated into everyday tasks, designing systems that accommodate diverse user populations is crucial for fostering trust, engagement, and inclusivity. This study investigates the ability of contemporary Large Language Models (LLMs) to generate African American Vernacular English (AAVE) and evaluates the impact of AAVE usage on user experiences in chatbot applications. We analyze the performance of three LLM families (Llama, GPT, and Claude) in producing AAVE-like utterances at varying dialect intensities and assess user preferences across multiple domains, including healthcare and education. Despite LLMs' proficiency in generating AAVE-like language, findings indicate that AAVE-speaking users prefer Standard American English (SAE) chatbots, with higher levels of AAVE correlating with lower ratings for a variety of characteristics, including chatbot trustworthiness and role appropriateness. These results highlight the complexities of creating inclusive AI systems and underscore the need for further exploration of diversity to enhance human-computer interactions.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.03441

Country: North America > United States (0.47)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks

Lin, Fangru, Mao, Shaoguang, La Malfa, Emanuele, Hofmann, Valentin, de Wynter, Adrian, Yao, Jing, Chen, Si-Qing, Wooldridge, Michael, Wei, Furu

arXiv.org Artificial IntelligenceOct-14-2024

Language is not monolithic. While many benchmarks are used as proxies to systematically estimate Large Language Models' (LLM) performance in real-life tasks, they tend to ignore the nuances of within-language variation and thus fail to model the experience of speakers of minority dialects. Focusing on African American Vernacular English (AAVE), we present the first study on LLMs' fairness and robustness to a dialect in canonical reasoning tasks (algorithm, math, logic, and comprehensive reasoning). We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K. The result of this effort is ReDial, a dialectal benchmark comprising $1.2K+$ parallel query pairs in Standardized English and AAVE. We use ReDial to evaluate state-of-the-art LLMs, including GPT-4o/4/3.5-turbo, LLaMA-3.1/3, Mistral, and Phi-3. We find that, compared to Standardized English, almost all of these widely used models show significant brittleness and unfairness to queries in AAVE. Furthermore, AAVE queries can degrade performance more substantially than misspelled texts in Standardized English, even when LLMs are more familiar with the AAVE queries. Finally, asking models to rephrase questions in Standardized English does not close the performance gap but generally introduces higher costs. Overall, our findings indicate that LLMs provide unfair service to dialect users in complex reasoning tasks. Code can be found at https://github.com/fangru-lin/redial_dialect_robustness_fairness.git.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.11005

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)
Africa (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (1.00)
Law (0.92)
Leisure & Entertainment (0.67)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark

Gupta, Abhay, Meng, Philip, Yurtseven, Ece, O'Brien, Sean, Zhu, Kevin

arXiv.org Artificial IntelligenceAug-27-2024

Detecting biases in natural language understanding (NLU) for African American Vernacular English (AAVE) is crucial to developing inclusive natural language processing (NLP) systems. To address dialect-induced performance discrepancies, we introduce AAVENUE ({AAVE} {N}atural Language {U}nderstanding {E}valuation), a benchmark for evaluating large language model (LLM) performance on NLU tasks in AAVE and Standard American English (SAE). AAVENUE builds upon and extends existing benchmarks like VALUE, replacing deterministic syntactic and morphological transformations with a more flexible methodology leveraging LLM-based translation with few-shot prompting, improving performance across our evaluation metrics when translating key tasks from the GLUE and SuperGLUE benchmarks. We compare AAVENUE and VALUE translations using five popular LLMs and a comprehensive set of metrics including fluency, BARTScore, quality, coherence, and understandability. Additionally, we recruit fluent AAVE speakers to validate our translations for authenticity. Our evaluations reveal that LLMs consistently perform better on SAE tasks than AAVE-translated versions, underscoring inherent biases and highlighting the need for more inclusive NLP models. We have open-sourced our source code on GitHub and created a website to showcase our work at https://aavenue.live.

aave, benchmark, translation, (17 more...)

arXiv.org Artificial Intelligence

2408.14845

Country:

North America > United States > New York > Queens County > New York City (0.04)
North America > United States > New York > Bronx County > New York City (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Self-supervised Speech Representations Still Struggle with African American Vernacular English

Chang, Kalvin, Chou, Yi-Hui, Shi, Jiatong, Chen, Hsuan-Ming, Holliday, Nicole, Scharenborg, Odette, Mortensen, David R.

arXiv.org Artificial IntelligenceAug-26-2024

Underperformance of ASR systems for speakers of African American Vernacular English (AAVE) and other marginalized language varieties is a well-documented phenomenon, and one that reinforces the stigmatization of these varieties. We investigate whether or not the recent wave of Self-Supervised Learning (SSL) speech models can close the gap in ASR performance between AAVE and Mainstream American English (MAE). We evaluate four SSL models (wav2vec 2.0, HuBERT, WavLM, and XLS-R) on zero-shot Automatic Speech Recognition (ASR) for these two varieties and find that these models perpetuate the bias in performance against AAVE. Additionally, the models have higher word error rates on utterances with more phonological and morphosyntactic features of AAVE. Despite the success of SSL speech models in improving ASR for low resource varieties, SSL pre-training alone may not bridge the gap between AAVE and MAE. Our code is publicly available at https://github.com/cmu-llab/s3m-aave.

aave, speech recognition, utterance, (14 more...)

arXiv.org Artificial Intelligence

2408.14262

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Quebec > Montreal (0.04)
Oceania > Cook Islands (0.04)
(10 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

As AI tools get smarter, they're growing more covertly racist, experts find

The GuardianMar-16-2024, 14:00:07 GMT

Popular artificial intelligence tools are becoming more covertly racist as they advance, says an alarming new report. A team of technology and linguistics researchers revealed this week that large language models like OpenAI's ChatGPT and Google's Gemini hold racist stereotypes about speakers of African American Vernacular English, or AAVE, an English dialect created and spoken by Black Americans. "We know that these technologies are really commonly used by companies to do tasks like screening job applicants," said Valentin Hoffman, a researcher at the Allen Institute for Artificial Intelligence and co-author of the recent paper, published this week in arXiv, an open-access research archive from Cornell University. Hoffman explained that previously researchers "only really looked at what overt racial biases these technologies might hold" and never "examined how these AI systems react to less overt markers of race, like dialect differences". Black people who use AAVE in speech, the paper says, "are known to experience racial discrimination in a wide range of contexts, including education, employment, housing, and legal outcomes". Hoffman and his colleagues asked the AI models to assess the intelligence and employability of people who speak using AAVE compared to people who speak using what they dub "standard American English".

ai model, covertly racist, language model, (15 more...)

The Guardian

Country: North America > United States (0.49)

Genre: Research Report (0.70)

Industry: Law > Civil Rights & Constitutional Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

Add feedback

DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules

Liu, Yanchen, Held, William, Yang, Diyi

arXiv.org Artificial IntelligenceDec-5-2023

Existing large language models (LLMs) that mainly focus on Standard American English (SAE) often lead to significantly worse performance when being applied to other English dialects. While existing mitigations tackle discrepancies for individual target dialects, they assume access to high-accuracy dialect identification systems. The boundaries between dialects are inherently flexible, making it difficult to categorize language into discrete predefined categories. In this paper, we propose DADA (Dialect Adaptation via Dynamic Aggregation), a modular approach to imbue SAE-trained models with multi-dialectal robustness by composing adapters which handle specific linguistic features. The compositional architecture of DADA allows for both targeted adaptation to specific dialect variants and simultaneous adaptation to various dialects. We show that DADA is effective for both single task and instruction finetuned language models, offering an extensible and interpretable framework for adapting existing LLMs to different English dialects.

adapter, dialect, feature adapter, (12 more...)

arXiv.org Artificial Intelligence

2305.13406

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(17 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

VALUE: Understanding Dialect Disparity in NLU

Ziems, Caleb, Chen, Jiaao, Harris, Camille, Anderson, Jessica, Yang, Diyi

arXiv.org Artificial IntelligenceSep-13-2022

English Natural Language Understanding (NLU) systems have achieved great performances and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments in a participatory design manner. Experiments show that these new dialectal features can lead to a drop in model performance. To run the transformation code and download both synthetic and gold-standard dialectal GLUE benchmarks, see https://github.com/SALT-NLP/value

aave, computational linguistic, transformation, (16 more...)

arXiv.org Artificial Intelligence

2204.03031

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(22 more...)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment (0.93)
Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.66)

Add feedback