aave
Reinforcing Stereotypes of Anger: Emotion AI on African American Vernacular English
Dorn, Rebecca, Chance, Christina, Rusti, Casandra, Bickham, Charles Jr., Chang, Kai-Wei, Morstatter, Fred, Lerman, Kristina
Automated emotion detection is widely used in applications ranging from well-being monitoring to high-stakes domains like mental health and hiring. However, models often rely on annotations that reflect dominant cultural norms, limiting model ability to recognize emotional expression in dialects often excluded from training data distributions, such as African American Vernacular English (AAVE). This study examines emotion recognition model performance on AAVE compared to General American English (GAE). We analyze 2.7 million tweets geo-tagged within Los Angeles. Texts are scored for strength of AAVE using computational approximations of dialect features. Annotations of emotion presence and intensity are collected on a dataset of 875 tweets with both high and low AAVE densities. To assess model accuracy on a task as subjective as emotion perception, we calculate community-informed "silver" labels where AAVE-dense tweets are labeled by African American, AAVE-fluent (ingroup) annotators. On our labeled sample, GPT and BERT-based models exhibit false positive prediction rates of anger on AAVE more than double than on GAE. SpanEmo, a popular text-based emotion model, increases false positive rates of anger from 25 percent on GAE to 60 percent on AAVE. Additionally, a series of linear regressions reveals that models and non-ingroup annotations are significantly more correlated with profanity-based AAVE features than ingroup annotations. Linking Census tract demographics, we observe that neighborhoods with higher proportions of African American residents are associated with higher predictions of anger (Pearson's correlation r = 0.27) and lower joy (r = -0.10). These results find an emergent safety issue of emotion AI reinforcing racial stereotypes through biased emotion classification. We emphasize the need for culturally and dialect-informed affective computing systems.
- North America > United States > California > Los Angeles County > Los Angeles (0.49)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (10 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.48)
- Education > Educational Setting (0.46)
From Rules to Rewards: Reinforcement Learning for Interest Rate Adjustment in DeFi Lending
Qu, Hanxiao, Gogol, Krzysztof, Groetschla, Florian, Tessone, Claudio
Decentralized Finance (DeFi) lending enables permission-less borrowing via smart contracts. However, it faces challenges in optimizing interest rates, mitigating bad debt, and improving capital efficiency. Rule-based interest-rate models struggle to adapt to dynamic market conditions, leading to inefficiencies. This work applies Offline Reinforcement Learning (RL) to optimize interest rate adjustments in DeFi lending protocols. Using historical data from Aave protocol, we evaluate three RL approaches: Conservative Q-Learning (CQL), Behavior Cloning (BC), and TD3 with Behavior Cloning (TD3-BC). TD3-BC demonstrates superior performance in balancing utilization, capital stability, and risk, outperforming existing models. It adapts effectively to historical stress events like the May 2021 crash and the March 2023 USDC depeg, showcasing potential for automated, real-time governance.
- Banking & Finance > Trading (1.00)
- Banking & Finance > Economy (1.00)
Finding A Voice: Evaluating African American Dialect Generation for Chatbot Technology
Finch, Sarah E., Paek, Ellie S., Kwon, Sejung, Choi, Ikseon, Wells, Jessica, Chandler, Rasheeta, Choi, Jinho D.
As chatbots become increasingly integrated into everyday tasks, designing systems that accommodate diverse user populations is crucial for fostering trust, engagement, and inclusivity. This study investigates the ability of contemporary Large Language Models (LLMs) to generate African American Vernacular English (AAVE) and evaluates the impact of AAVE usage on user experiences in chatbot applications. We analyze the performance of three LLM families (Llama, GPT, and Claude) in producing AAVE-like utterances at varying dialect intensities and assess user preferences across multiple domains, including healthcare and education. Despite LLMs' proficiency in generating AAVE-like language, findings indicate that AAVE-speaking users prefer Standard American English (SAE) chatbots, with higher levels of AAVE correlating with lower ratings for a variety of characteristics, including chatbot trustworthiness and role appropriateness. These results highlight the complexities of creating inclusive AI systems and underscore the need for further exploration of diversity to enhance human-computer interactions.
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (5 more...)
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
Lin, Fangru, Mao, Shaoguang, La Malfa, Emanuele, Hofmann, Valentin, de Wynter, Adrian, Yao, Jing, Chen, Si-Qing, Wooldridge, Michael, Wei, Furu
Language is not monolithic. While many benchmarks are used as proxies to systematically estimate Large Language Models' (LLM) performance in real-life tasks, they tend to ignore the nuances of within-language variation and thus fail to model the experience of speakers of minority dialects. Focusing on African American Vernacular English (AAVE), we present the first study on LLMs' fairness and robustness to a dialect in canonical reasoning tasks (algorithm, math, logic, and comprehensive reasoning). We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K. The result of this effort is ReDial, a dialectal benchmark comprising $1.2K+$ parallel query pairs in Standardized English and AAVE. We use ReDial to evaluate state-of-the-art LLMs, including GPT-4o/4/3.5-turbo, LLaMA-3.1/3, Mistral, and Phi-3. We find that, compared to Standardized English, almost all of these widely used models show significant brittleness and unfairness to queries in AAVE. Furthermore, AAVE queries can degrade performance more substantially than misspelled texts in Standardized English, even when LLMs are more familiar with the AAVE queries. Finally, asking models to rephrase questions in Standardized English does not close the performance gap but generally introduces higher costs. Overall, our findings indicate that LLMs provide unfair service to dialect users in complex reasoning tasks. Code can be found at https://github.com/fangru-lin/redial_dialect_robustness_fairness.git.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore (0.04)
- Africa (0.04)
- (9 more...)
- Education (1.00)
- Law (0.92)
- Leisure & Entertainment (0.67)
- Information Technology (0.67)
AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark
Gupta, Abhay, Meng, Philip, Yurtseven, Ece, O'Brien, Sean, Zhu, Kevin
Detecting biases in natural language understanding (NLU) for African American Vernacular English (AAVE) is crucial to developing inclusive natural language processing (NLP) systems. To address dialect-induced performance discrepancies, we introduce AAVENUE ({AAVE} {N}atural Language {U}nderstanding {E}valuation), a benchmark for evaluating large language model (LLM) performance on NLU tasks in AAVE and Standard American English (SAE). AAVENUE builds upon and extends existing benchmarks like VALUE, replacing deterministic syntactic and morphological transformations with a more flexible methodology leveraging LLM-based translation with few-shot prompting, improving performance across our evaluation metrics when translating key tasks from the GLUE and SuperGLUE benchmarks. We compare AAVENUE and VALUE translations using five popular LLMs and a comprehensive set of metrics including fluency, BARTScore, quality, coherence, and understandability. Additionally, we recruit fluent AAVE speakers to validate our translations for authenticity. Our evaluations reveal that LLMs consistently perform better on SAE tasks than AAVE-translated versions, underscoring inherent biases and highlighting the need for more inclusive NLP models. We have open-sourced our source code on GitHub and created a website to showcase our work at https://aavenue.live.
- North America > United States > New York > Queens County > New York City (0.04)
- North America > United States > New York > Bronx County > New York City (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Singapore (0.04)
Self-supervised Speech Representations Still Struggle with African American Vernacular English
Chang, Kalvin, Chou, Yi-Hui, Shi, Jiatong, Chen, Hsuan-Ming, Holliday, Nicole, Scharenborg, Odette, Mortensen, David R.
Underperformance of ASR systems for speakers of African American Vernacular English (AAVE) and other marginalized language varieties is a well-documented phenomenon, and one that reinforces the stigmatization of these varieties. We investigate whether or not the recent wave of Self-Supervised Learning (SSL) speech models can close the gap in ASR performance between AAVE and Mainstream American English (MAE). We evaluate four SSL models (wav2vec 2.0, HuBERT, WavLM, and XLS-R) on zero-shot Automatic Speech Recognition (ASR) for these two varieties and find that these models perpetuate the bias in performance against AAVE. Additionally, the models have higher word error rates on utterances with more phonological and morphosyntactic features of AAVE. Despite the success of SSL speech models in improving ASR for low resource varieties, SSL pre-training alone may not bridge the gap between AAVE and MAE. Our code is publicly available at https://github.com/cmu-llab/s3m-aave.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Oceania > Cook Islands (0.04)
- (10 more...)
As AI tools get smarter, they're growing more covertly racist, experts find
Popular artificial intelligence tools are becoming more covertly racist as they advance, says an alarming new report. A team of technology and linguistics researchers revealed this week that large language models like OpenAI's ChatGPT and Google's Gemini hold racist stereotypes about speakers of African American Vernacular English, or AAVE, an English dialect created and spoken by Black Americans. "We know that these technologies are really commonly used by companies to do tasks like screening job applicants," said Valentin Hoffman, a researcher at the Allen Institute for Artificial Intelligence and co-author of the recent paper, published this week in arXiv, an open-access research archive from Cornell University. Hoffman explained that previously researchers "only really looked at what overt racial biases these technologies might hold" and never "examined how these AI systems react to less overt markers of race, like dialect differences". Black people who use AAVE in speech, the paper says, "are known to experience racial discrimination in a wide range of contexts, including education, employment, housing, and legal outcomes". Hoffman and his colleagues asked the AI models to assess the intelligence and employability of people who speak using AAVE compared to people who speak using what they dub "standard American English".
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)
DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules
Liu, Yanchen, Held, William, Yang, Diyi
Existing large language models (LLMs) that mainly focus on Standard American English (SAE) often lead to significantly worse performance when being applied to other English dialects. While existing mitigations tackle discrepancies for individual target dialects, they assume access to high-accuracy dialect identification systems. The boundaries between dialects are inherently flexible, making it difficult to categorize language into discrete predefined categories. In this paper, we propose DADA (Dialect Adaptation via Dynamic Aggregation), a modular approach to imbue SAE-trained models with multi-dialectal robustness by composing adapters which handle specific linguistic features. The compositional architecture of DADA allows for both targeted adaptation to specific dialect variants and simultaneous adaptation to various dialects. We show that DADA is effective for both single task and instruction finetuned language models, offering an extensible and interpretable framework for adapting existing LLMs to different English dialects.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (17 more...)
VALUE: Understanding Dialect Disparity in NLU
Ziems, Caleb, Chen, Jiaao, Harris, Camille, Anderson, Jessica, Yang, Diyi
English Natural Language Understanding (NLU) systems have achieved great performances and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments in a participatory design manner. Experiments show that these new dialectal features can lead to a drop in model performance. To run the transformation code and download both synthetic and gold-standard dialectal GLUE benchmarks, see https://github.com/SALT-NLP/value
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (22 more...)
- Leisure & Entertainment (0.93)
- Media (0.93)
Investigating African-American Vernacular English in Transformer-Based Text Generation
Groenwold, Sophie, Ou, Lily, Parekh, Aesha, Honnavalli, Samhita, Levy, Sharon, Mirza, Diba, Wang, William Yang
The growth of social media has encouraged the written use of African American Vernacular English (AAVE), which has traditionally been used only in oral contexts. However, NLP models have historically been developed using dominant English varieties, such as Standard American English (SAE), due to text corpora availability. We investigate the performance of GPT-2 on AAVE text by creating a dataset of intent-equivalent parallel AAVE/SAE tweet pairs, thereby isolating syntactic structure and AAVE- or SAE-specific language for each pair. We evaluate each sample and its GPT-2 generated text with pretrained sentiment classifiers and find that while AAVE text results in more classifications of negative sentiment than SAE, the use of GPT-2 generally increases occurrences of positive sentiment for both. Additionally, we conduct human evaluation of AAVE and SAE text generated with GPT-2 to compare contextual rigor and overall quality.
- Europe > Italy > Tuscany > Florence (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (6 more...)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.87)