AITopics

Hande, Adeep, Sundararajan, Kishorekumar, Hamidian, Sardar, Ture, Ferhan

Content Moderation in TV Search: Balancing Policy Compliance, Relevance, and User Experience

Millions of people rely on search functionality to find and explore content on entertainment platforms. Modern search systems use a combination of candidate generation and ranking approaches, with advanced methods leveraging deep learning and LLM-based techniques to retrieve, generate, and categorize search results. Despite these advancements, search algorithms can still surface inappropriate or irrelevant content due to factors like model unpredictability, metadata errors, or overlooked design flaws. Such issues can misalign with product goals and user expectations, potentially harming user trust and business outcomes. In this work, we introduce an additional monitoring layer using Large Language Models (LLMs) to enhance content moderation. This additional layer flags content if the user did not intend to search for it. This approach serves as a baseline for product quality assurance, with collected feedback used to refine the initial retrieval mechanisms of the search model, ensuring a safer and more reliable user experience.

large language model, machine learning, natural language, (16 more...)

doi: 10.1145/3726302.3731962

2505.17207

Country:

North America > United States > District of Columbia > Washington (0.05)
Europe > Italy (0.05)
North America > United States > Indiana > Monroe County > Bloomington (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment (0.69)
Law (0.48)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Arbabi, Alireza, Kerschbaum, Florian

Relative Bias: A Comparative Framework for Quantifying Bias in LLMs

arXiv.org Machine LearningMay-26-2025

The growing deployment of large language models (LLMs) has amplified concerns regarding their inherent biases, raising critical questions about their fairness, safety, and societal impact. However, quantifying LLM bias remains a fundamental challenge, complicated by the ambiguity of what "bias" entails. This challenge grows as new models emerge rapidly and gain widespread use, while introducing potential biases that have not been systematically assessed. In this paper, we propose the Relative Bias framework, a method designed to assess how an LLM's behavior deviates from other LLMs within a specified target domain. We introduce two complementary methodologies: (1) Embedding Transformation analysis, which captures relative bias patterns through sentence representations over the embedding space, and (2) LLM-as-a-Judge, which employs a language model to evaluate outputs comparatively. Applying our framework to several case studies on bias and alignment scenarios following by statistical tests for validation, we find strong alignment between the two scoring methods, offering a systematic, scalable, and statistically grounded approach for comparative bias analysis in LLMs.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Machine Learning

2505.17131

Country:

Asia > China (0.16)
North America > United States (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Media (0.68)
Law > Civil Rights & Constitutional Law (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AI Literacy for Legal AI Systems: A practical approach

Gultekin-Varkonyi, Gizem

Legal AI systems are increasingly being adopted by judicial and legal system deployers and providers worldwide to support a range of applications. While they offer potential benefits such as reducing bias, increasing efficiency, and improving accountability, they also pose significant risks, requiring a careful balance between opportunities, and legal and ethical development and deployment. AI literacy, as a legal requirement under the EU AI Act and a critical enabler of ethical AI for deployers and providers, could be a tool to achieve this. The article introduces the term "legal AI systems" and then analyzes the concept of AI literacy and the benefits and risks associated with these systems. This analysis is linked to a broader AI-L concept for organizations that deal with legal AI systems. The outcome of the article, a roadmap questionnaire as a practical tool for developers and providers to assess risks, benefits, and stakeholder concerns, could be useful in meeting societal and regulatory expectations for legal AI.

ai system, machine learning, natural language, (18 more...)

2505.18006

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (1.00)

Industry:

Law > Criminal Law (1.00)
Information Technology > Security & Privacy (1.00)
Education > Educational Setting (0.93)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Applied AI (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Sun, Jingyi, Warren, Greta, Shklovski, Irina, Augenstein, Isabelle

Explaining Sources of Uncertainty in Automated Fact-Checking

Understanding sources of a model's uncertainty regarding its predictions is crucial for effective human-AI collaboration. Prior work proposes using numerical uncertainty or hedges ("I'm not sure, but ..."), which do not explain uncertainty that arises from conflicting evidence, leaving users unable to resolve disagreements or rely on the output. We introduce CLUE (Conflict-and-Agreement-aware Language-model Uncertainty Explanations), the first framework to generate natural language explanations of model uncertainty by (i) identifying relationships between spans of text that expose claim-evidence or inter-evidence conflicts and agreements that drive the model's predictive uncertainty in an unsupervised way, and (ii) generating explanations via prompting and attention steering that verbalize these critical interactions. Across three language models and two fact-checking datasets, we show that CLUE produces explanations that are more faithful to the model's uncertainty and more consistent with fact-checking decisions than prompting for uncertainty explanations without span-interaction guidance. Human evaluators judge our explanations to be more helpful, more informative, less redundant, and more logically consistent with the input than this baseline. CLUE requires no fine-tuning or architectural changes, making it plug-and-play for any white-box language model. By explicitly linking uncertainty to evidence conflicts, it offers practical support for fact-checking and generalises readily to other tasks that require reasoning over complex information.

explanation, large language model, natural language, (18 more...)

2505.17855

Country:

Europe (1.00)
North America > United States (0.93)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry:

Information Technology > Security & Privacy (0.68)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Hutiri, Wiebke, Cimpoi, Mircea, Scheuerman, Morgan, Matthews, Victoria, Xiang, Alice

TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation

Dataset transparency is a key enabler of responsible AI, but insights into multimodal dataset attributes that impact trustworthy and ethical aspects of AI applications remain scarce and are difficult to compare across datasets. To address this challenge, we introduce Trustworthy and Ethical Dataset Indicators (TEDI) that facilitate the systematic, empirical analysis of dataset documentation. TEDI encompasses 143 fine-grained indicators that characterize trustworthy and ethical attributes of multimodal datasets and their collection processes. The indicators are framed to extract verifiable information from dataset documentation. Using TEDI, we manually annotated and analyzed over 100 multimodal datasets that include human voices. We further annotated data sourcing, size, and modality details to gain insights into the factors that shape trustworthy and ethical dimensions across datasets. We find that only a select few datasets have documented attributes and practices pertaining to consent, privacy, and harmful content indicators. The extent to which these and other ethical indicators are addressed varies based on the data collection method, with documentation of datasets collected via crowdsourced and direct collection approaches being more likely to mention them. Scraping dominates scale at the cost of ethical indicators, but is not the only viable collection method. Our approach and empirical insights contribute to increasing dataset transparency along trustworthy and ethical dimensions and pave the way for automating the tedious task of extracting information from dataset documentation in future.

artificial intelligence, machine learning, natural language, (21 more...)

2505.17841

Country:

North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(3 more...)

Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?

Lu, Chengda, Fan, Xiaoyu, Huang, Yu, Xu, Rongwu, Li, Jijie, Xu, Wei

Jailbreak attacks have been observed to largely fail against recent reasoning models enhanced by Chain-of-Thought (CoT) reasoning. However, the underlying mechanism remains underexplored, and relying solely on reasoning capacity may raise security concerns. In this paper, we try to answer the question: Does CoT reasoning really reduce harmfulness from jailbreaking? Through rigorous theoretical analysis, we demonstrate that CoT reasoning has dual effects on jailbreaking harmfulness. Based on the theoretical insights, we propose a novel jailbreak method, FicDetail, whose practical performance validates our theoretical findings.

large language model, machine learning, natural language, (16 more...)

2505.1765

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
(2 more...)

Ferreira, Juliett Suárez, Slavkovik, Marija, Casillas, Jorge

Transparency and Proportionality in Post-Processing Algorithmic Bias Correction

Algorithmic decision-making systems sometimes produce errors or skewed predictions toward a particular group, leading to unfair results. Debiasing practices, applied at different stages of the development of such systems, occasionally introduce new forms of unfairness or exacerbate existing inequalities. We focus on post-processing techniques that modify algorithmic predictions to achieve fairness in classification tasks, examining the unintended consequences of these interventions. To address this challenge, we develop a set of measures that quantify the disparity in the flips applied to the solution in the post-processing stage. The proposed measures will help practitioners: (1) assess the proportionality of the debiasing strategy used, (2) have transparency to explain the effects of the strategy in each group, and (3) based on those results, analyze the possibility of the use of some other approaches for bias mitigation or to solve the problem. We introduce a methodology for applying the proposed metrics during the post-processing stage and illustrate its practical application through an example. This example demonstrates how analyzing the proportionality of the debiasing strategy complements traditional fairness metrics, providing a deeper perspective to ensure fairer outcomes across all groups.

artificial intelligence, machine learning, proportionality, (18 more...)

2505.17525

Country:

North America > United States (0.68)
Europe (0.46)

Genre: Research Report (0.50)

Industry:

Law (0.68)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

PD$^3$: A Project Duplication Detection Framework via Adapted Multi-Agent Debate

Bao, Dezheng, Yang, Yueci, Chen, Xin, Jiang, Zhengxuan, Fei, Zeguo, Zhang, Daoze, Huang, Xuanwen, Chen, Junru, Yu, Chutian, Yuan, Xiang, Yang, Yang

Project duplication detection is critical for project quality assessment, as it improves resource utilization efficiency by preventing investing in newly proposed project that have already been studied. It requires the ability to understand high-level semantics and generate constructive and valuable feedback. Existing detection methods rely on basic word- or sentence-level comparison or solely apply large language models, lacking valuable insights for experts and in-depth comprehension of project content and review criteria. To tackle this issue, we propose PD$^3$, a Project Duplication Detection framework via adapted multi-agent Debate. Inspired by real-world expert debates, it employs a fair competition format to guide multi-agent debate to retrieve relevant projects. For feedback, it incorporates both qualitative and quantitative analysis to improve its practicality. Over 800 real-world power project data spanning more than 20 specialized fields are used to evaluate the framework, demonstrating that our method outperforms existing approaches by 7.43% and 8.00% in two downstream tasks. Furthermore, we establish an online platform, Review Dingdang, to assist power experts, saving 5.73 million USD in initial detection on more than 100 newly proposed projects.

artificial intelligence, detection, natural language, (15 more...)

2505.17492

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Energy (0.69)
Law (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions

Toubia, Olivier, Gui, George Z., Peng, Tianyi, Merlau, Daniel J., Li, Ang, Chen, Haozhe

LLM-based digital twin simulation, where large language models are used to emulate individual human behavior, holds great promise for research in AI, social science, and digital experimentation. However, progress in this area has been hindered by the scarcity of real, individual-level datasets that are both large and publicly available. This lack of high-quality ground truth limits both the development and validation of digital twin methodologies. To address this gap, we introduce a large-scale, public dataset designed to capture a rich and holistic view of individual human behavior. We survey a representative sample of $N = 2,058$ participants (average 2.42 hours per person) in the US across four waves with 500 questions in total, covering a comprehensive battery of demographic, psychological, economic, personality, and cognitive measures, as well as replications of behavioral economics experiments and a pricing survey. The final wave repeats tasks from earlier waves to establish a test-retest accuracy baseline. Initial analyses suggest the data are of high quality and show promise for constructing digital twins that predict human behavior well at the individual and aggregate levels. By making the full dataset publicly available, we aim to establish a valuable testbed for the development and benchmarking of LLM-based persona simulations. Beyond LLM applications, due to its unique breadth and scale the dataset also enables broad social science research, including studies of cross-construct correlations and heterogeneous treatment effects.

large language model, machine learning, natural language, (19 more...)

2505.17479

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.94)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Consumer Health (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)