AITopics | name2

Collaborating Authors

name2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework

Chataigner, Cléa, Ma, Rebecca, Ganesh, Prakhar, Chen, Yuhao, Taïk, Afaf, Creager, Elliot, Farnadi, Golnoosh

arXiv.org Artificial IntelligenceOct-10-2025

Large language models (LLMs) are highly sensitive to subtle changes in prompt phrasing, posing challenges for reliable auditing. Prior methods often apply unconstrained prompt paraphrasing, which risk missing linguistic and demographic factors that shape authentic user interactions. We introduce AUGMENT (Automated User-Grounded Modeling and Evaluation of Natural Language Transformations), a framework for generating controlled paraphrases, grounded in user behaviors. AUGMENT leverages linguistically informed rules and enforces quality through checks on instruction adherence, semantic similarity, and realism, ensuring paraphrases are both reliable and meaningful for auditing. Through case studies on the BBQ and MMLU datasets, we show that controlled paraphrases uncover systematic weaknesses that remain obscured under unconstrained variation. These results highlight the value of the AUGMENT framework for reliable auditing.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.03563

Country:

Europe (0.68)
Asia > Middle East > UAE (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts

Levy, Sharon, Adler, William D., Karver, Tahilin Sanchez, Dredze, Mark, Kaufman, Michelle R.

arXiv.org Artificial IntelligenceOct-14-2024

Large language models (LLMs) acquire beliefs about gender from training data and can therefore generate text with stereotypical gender attitudes. Prior studies have demonstrated model generations favor one gender or exhibit stereotypes about gender, but have not investigated the complex dynamics that can influence model reasoning and decision-making involving gender. We study gender equity within LLMs through a decision-making lens with a new dataset, DeMET Prompts, containing scenarios related to intimate, romantic relationships. We explore nine relationship configurations through name pairs across three name lists (men, women, neutral). We investigate equity in the context of gender roles through numerous lenses: typical and gender-neutral names, with and without model safety enhancements, same and mixed-gender relationships, and egalitarian versus traditional scenarios across various topics. While all models exhibit the same biases (women favored, then those with gender-neutral names, and lastly men), safety guardrails reduce bias. In addition, models tend to circumvent traditional male dominance stereotypes and side with 'traditionally female' individuals more often, suggesting relationships are viewed as a female domain by the models.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.11084

Country:

Asia > Singapore (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

Huang, Yufei, Xiong, Deyi

arXiv.org Artificial IntelligenceJun-28-2023

Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process contains 4 essential steps: bias identification via extensive literature review, ambiguous context generation, AI-assisted disambiguous context generation, snd manual review \& recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories. Additionally, we observe from our experiments that fine-tuned models could, to a certain extent, heed instructions and avoid generating outputs that are morally harmful in some types, in the way of "moral self-correction". Our dataset and results are publicly available at \href{https://github.com/YFHuangxxxx/CBBQ}{https://github.com/YFHuangxxxx/CBBQ}, offering debiasing research opportunities to a widened community.

large language model, machine learning, name1, (18 more...)

arXiv.org Artificial Intelligence

2306.16244

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Germany (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(36 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative

Weissweiler, Leonie, Hofmann, Valentin, Köksal, Abdullatif, Schütze, Hinrich

arXiv.org Artificial IntelligenceOct-24-2022

Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step towards assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining the classification accuracy of a syntactic probe on the one hand and the models' behaviour in a semantic application task on the other, with BERT, RoBERTa, and DeBERTa as the example PLMs. Our results show that all three investigated PLMs are able to recognise the structure of the CC but fail to use its meaning. While human-like performance of PLMs on many NLP tasks has been alleged, this indicates that PLMs still suffer from substantial shortcomings in central domains of linguistic knowledge.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2210.13181

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(17 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI

Tarunesh, Ishan, Aditya, Somak, Choudhury, Monojit

arXiv.org Artificial IntelligenceDec-4-2021

Natural Language Inference (NLI) is considered a representative task to test natural language understanding (NLU). In this work, we propose an extensible framework to collectively yet categorically test diverse Logical reasoning capabilities required for NLI (and by extension, NLU). Motivated by behavioral testing, we create a semi-synthetic large test-bench (363 templates, 363k examples) and an associated framework that offers following utilities: 1) individually test and analyze reasoning capabilities along 17 reasoning dimensions (including pragmatic reasoning), 2) design experiments to study cross-capability information content (leave one out or bring one in); and 3) the synthetic nature enable us to control for artifacts and biases. The inherited power of automated test case instantiation from free-form natural language templates (using CheckList), and a well-defined taxonomy of capabilities enable us to extend to (cognitively) harder test cases while varying the complexity of natural language. Through our analysis of state-of-the-art NLI systems, we observe that our benchmark is indeed hard (and non-trivial even with training on additional resources). Some capabilities stand out as harder. Further fine-grained analysis and fine-tuning experiments reveal more insights about these capabilities and the models -- supporting and extending previous observations. Towards the end we also perform an user-study, to investigate whether behavioral information can be utilised to generalize much better for some models compared to others.

machine learning, natural language, template, (20 more...)

arXiv.org Artificial Intelligence

2112.02333

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(17 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Trusting RoBERTa over BERT: Insights from CheckListing the Natural Language Inference Task

Tarunesh, Ishan, Aditya, Somak, Choudhury, Monojit

arXiv.org Artificial IntelligenceJul-15-2021

The recent state-of-the-art natural language understanding (NLU) systems often behave unpredictably, failing on simpler reasoning examples. Despite this, there has been limited focus on quantifying progress towards systems with more predictable behavior. We think that reasoning capability-wise behavioral summary is a step towards bridging this gap. We create a CheckList test-suite (184K examples) for the Natural Language Inference (NLI) task, a representative NLU task. We benchmark state-of-the-art NLI systems on this test-suite, which reveals fine-grained insights into the reasoning abilities of BERT and RoBERTa. Our analysis further reveals inconsistencies of the models on examples derived from the same template or distinct templates but pertaining to same reasoning capability, indicating that generalizing the models' behavior through observations made on a CheckList is non-trivial. Through an user-study, we find that users were able to utilize behavioral information to generalize much better for examples predicted from RoBERTa, compared to that of BERT.

name1, name2, template, (14 more...)

arXiv.org Artificial Intelligence

2107.07229

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia (0.05)
Asia > Russia (0.05)
(15 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)

Add feedback