Law
An Arbitration Control for an Ensemble of Diversified DQN variants in Continual Reinforcement Learning
Deep reinforcement learning (RL) models, despite their efficiency in learning an optimal policy in static environments, easily lose previously learned knowledge (i.e., catastrophic forgetting). This leads to poor performance in continual reinforcement learning (CRL) scenarios. To address this, we present an arbitration control mechanism over an ensemble of RL agents. It is motivated by, and closely aligned with, how humans make decisions in a CRL context using arbitration control over multiple RL agents running in parallel, as observed in the prefrontal cortex. We integrate two key ideas into our model: (1) an ensemble of RL agents (i.e., DQN variants) explicitly trained to have diverse value functions and (2) an arbitration control that prioritizes agents with higher reliability (i.e., less error) in recent trials. We propose a framework for CRL, an Arbitration Control for an Ensemble of Diversified DQN variants (ACED-DQN). We demonstrate significant performance improvements in both static and continual environments, supported by empirical evidence showing the effectiveness of arbitration control over diversified DQNs during training. In this work, we introduce a framework that enables RL agents to learn continuously, with inspiration from the human brain.
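The second idea above, prioritizing agents with less error in recent trials, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ReliabilityArbiter`, the sliding window, the softmax over negative mean error, and the weighted sum of Q-values are all assumptions chosen for illustration.

```python
import math
from collections import deque


class ReliabilityArbiter:
    """Hypothetical arbiter: agents with lower recent error get more weight."""

    def __init__(self, n_agents, window=20, temperature=1.0):
        # One sliding window of recent absolute errors per agent (assumed design).
        self.errors = [deque(maxlen=window) for _ in range(n_agents)]
        self.temperature = temperature

    def record(self, agent_idx, td_error):
        # Store the magnitude of the agent's latest prediction error.
        self.errors[agent_idx].append(abs(td_error))

    def weights(self):
        # Mean recent error per agent; agents with no history count as error 0.
        means = [sum(e) / len(e) if e else 0.0 for e in self.errors]
        # Softmax over negative error: lower error means higher reliability.
        logits = [-m / self.temperature for m in means]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [x / total for x in exps]

    def select_action(self, q_values_per_agent):
        # Combine each agent's Q-values with the reliability weights,
        # then act greedily on the combined estimate.
        w = self.weights()
        n_actions = len(q_values_per_agent[0])
        combined = [
            sum(w[i] * q[a] for i, q in enumerate(q_values_per_agent))
            for a in range(n_actions)
        ]
        return max(range(n_actions), key=combined.__getitem__)
```

In this sketch, an agent whose recent errors are small dominates the combined value estimate, so the ensemble's action follows whichever agent has been most accurate lately.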
Using LLMs to create analytical datasets: A case study of reconstructing the historical memory of Colombia
Anderson, David, Benitez, Galia, Bjarnadottir, Margret, Reyya, Shriyan
Colombia has been submerged in decades of armed conflict, yet until recently, the systematic documentation of violence was not a priority for the Colombian government. This has resulted in a lack of publicly available conflict information and, consequently, a lack of historical accounts. This study contributes to Colombia's historical memory by utilizing GPT, a large language model (LLM), to read and answer questions about over 200,000 violence-related newspaper articles in Spanish. We use the resulting dataset to conduct both descriptive analysis and a study of the relationship between violence and the eradication of coca crops, offering an example of policy analyses that such data can support. Our study demonstrates how LLMs have opened new research opportunities by enabling examinations of large text corpora at a previously infeasible depth.
Mitigation of Gender and Ethnicity Bias in AI-Generated Stories through Model Explanations
Dimgba, Martha O., Oba, Sharon, Agrawal, Ameeta, Giabbanelli, Philippe J.
Language models have been shown to propagate social bias through their output, particularly in the representation of gender and ethnicity. This paper investigates gender and ethnicity biases in AI-generated occupational stories. Representation biases are measured before and after applying our proposed mitigation strategy, Bias Analysis and Mitigation through Explanation (BAME), revealing improvements in demographic representation ranging from 2% to 20%. BAME leverages model-generated explanations to inform targeted prompt engineering, effectively reducing biases without modifying model parameters. By analyzing stories generated across 25 occupational groups, three large language models (Claude 3.5 Sonnet, Llama 3.1 70B Instruct, and GPT-4 Turbo), and multiple demographic dimensions, we identify persistent patterns of overrepresentation and underrepresentation linked to training data stereotypes. Our findings demonstrate that guiding models with their own internal reasoning mechanisms can significantly enhance demographic parity, thereby contributing to the development of more transparent generative AI systems.
Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts
Wang, Rushi, Liu, Jiateng, Qian, Cheng, Shen, Yifan, Pan, Yanzhou, Xu, Zhaozhuo, Abbasi, Ahmed, Ji, Heng, Zhang, Denghui
Incorporating external context can significantly enhance the response quality of Large Language Models (LLMs). However, real-world contexts often mix relevant information with disproportionate inappropriate content, posing reliability risks. How do LLMs process and prioritize mixed context? To study this, we introduce the Poisoned Context Testbed, pairing queries with real-world contexts containing relevant and inappropriate content. Inspired by associative learning in animals, we adapt the Rescorla-Wagner (RW) model from neuroscience to quantify how competing contextual signals influence LLM outputs. Our adapted model reveals a consistent behavioral pattern: LLMs exhibit a strong tendency to incorporate information that is less prevalent in the context. This susceptibility is harmful in real-world settings, where small amounts of inappropriate content can substantially degrade response quality. Empirical evaluations on our testbed further confirm this vulnerability. To tackle this, we introduce RW-Steering, a two-stage finetuning-based approach that enables the model to internally identify and ignore inappropriate signals. Unlike prior methods that rely on extensive supervision across diverse context mixtures, RW-Steering generalizes robustly across varying proportions of inappropriate content. Experiments show that our best fine-tuned model improves response quality by 39.8% and reverses the undesirable behavior curve, establishing RW-Steering as a robust, generalizable context engineering solution for improving LLM safety in real-world use.
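For readers unfamiliar with the Rescorla-Wagner model mentioned above, the sketch below shows the classical update rule from associative learning, in which every cue present on a trial is updated by a shared prediction error: ΔV_i = α_i · β · (λ − ΣV). It illustrates only the textbook rule, not the authors' adaptation to LLM contexts; the function name and the cue labels are hypothetical.

```python
def rescorla_wagner_step(strengths, present, alpha, beta, lam):
    """One trial of the classical Rescorla-Wagner rule.

    strengths: dict mapping cue name -> current associative strength V
    present:   cues present on this trial
    alpha:     dict mapping cue name -> salience (learning rate) of that cue
    beta:      learning rate associated with the outcome
    lam:       maximum associative strength the outcome supports
    """
    # Shared prediction error: outcome minus the summed prediction of present cues.
    error = lam - sum(strengths[c] for c in present)
    updated = dict(strengths)
    for c in present:
        updated[c] = strengths[c] + alpha[c] * beta * error
    return updated


# Two competing cues with different salience (illustrative values only).
strengths = {"cue_a": 0.0, "cue_b": 0.0}
alpha = {"cue_a": 0.5, "cue_b": 0.1}
for _ in range(50):
    strengths = rescorla_wagner_step(
        strengths, ["cue_a", "cue_b"], alpha, beta=1.0, lam=1.0
    )
```

Because the cues share one error term, they compete for a fixed amount of associative strength, which is what makes the rule a natural lens for competing signals in a mixed context.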
Emotionally-Aware Agents for Dispute Resolution
Rakshit, Sushrita, Hale, James, Chawla, Kushal, Brett, Jeanne M., Gratch, Jonathan
In conflict, people use emotional expressions to shape their counterparts' thoughts, feelings, and actions. This paper explores whether automatic text emotion recognition offers insight into this influence in the context of dispute resolution. Prior work has shown the promise of such methods in negotiations; however, disputes evoke stronger emotions and different social processes. We use a large corpus of buyer-seller dispute dialogues to investigate how emotional expressions shape subjective and objective outcomes. We further demonstrate that large language models yield considerably greater explanatory power than previous methods for emotion intensity annotation and better match the decisions of human annotators. Findings support existing theoretical models for how emotional expressions contribute to conflict escalation and resolution and suggest that agent-based systems could be useful in managing disputes by recognizing and potentially mitigating emotional escalation.
Emotional expressions serve essential social functions in human relationships. They convey one's beliefs, desires, and intentions, shaping the beliefs, desires, and intentions of interaction partners [1], [2]. People high in emotional intelligence achieve more success in navigating emotional relationships [3], and there exists growing interest in creating AI agents that understand and enact these social functions [4], [5]. Prior work suggests that emotionally-aware agents are suitable for a growing list of applications, including teaching people to convey emotions effectively [6], improving human-agent interaction [7], detecting and moderating toxic communication [8], and serving as methodological tools for studying human emotion [9]. This paper examines the capacity of agents to understand human emotional expressions in the context of text-based dispute resolution.
Disputes arise when one party in a relationship (an individual, group, or nation) levies a claim that another party refuses to accept, thus threatening the future of the relationship [10].
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Zeng, Zhiyuan, Liu, Jiashuo, Chen, Siyuan, He, Tianci, Liao, Yali, Tian, Yixiao, Wang, Jinpeng, Wang, Zaiyuan, Yang, Yang, Yin, Lingyue, Yin, Mingren, Zhu, Zhenwei, Cai, Tianle, Chen, Zehui, Chen, Jiecao, Du, Yantao, Gao, Xiang, Guo, Jiacheng, Hu, Liang, Jiao, Jianpeng, Li, Xiangsheng, Liu, Jingkai, Ni, Shuang, Wen, Zhoufutu, Zhang, Ge, Zhang, Kaiyuan, Zhou, Xin, Blanchet, Jose, Qiu, Xipeng, Wang, Mengdi, Huang, Wenhao
Future prediction is a complex task for LLM agents, requiring a high level of analytical thinking, information gathering, contextual understanding, and decision-making under uncertainty. Agents must not only gather and interpret vast amounts of dynamic information but also integrate diverse data sources, weigh uncertainties, and adapt predictions based on emerging trends, just as human experts do in fields like politics, economics, and finance. Despite its importance, no large-scale benchmark exists for evaluating agents on future prediction, largely due to challenges in handling real-time updates and retrieving timely, accurate answers. To address this, we introduce FutureX, a dynamic and live evaluation benchmark specifically designed for LLM agents performing future prediction tasks. FutureX is the largest and most diverse live benchmark for future prediction, supporting real-time daily updates and eliminating data contamination through an automated pipeline for question gathering and answer collection. We evaluate 25 LLM/agent models, including those with reasoning, search capabilities, and integration of external tools such as the open-source Deep Research Agent and closed-source Deep Research models. This comprehensive evaluation assesses agents' adaptive reasoning and performance in dynamic environments. Additionally, we provide in-depth analyses of agents' failure modes and performance pitfalls in future-oriented tasks, including vulnerability to fake web pages and issues of temporal validity. Our goal is to establish a dynamic, contamination-free evaluation standard that drives the development of LLM agents capable of performing at the level of professional human analysts in complex reasoning and predictive thinking.
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Chen, Runjin, Arditi, Andy, Sleight, Henry, Evans, Owain, Lindsey, Jack
Large language models interact with users through a simulated 'Assistant' persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals. In this paper, we identify directions in the model's activation space, called persona vectors, underlying several traits, such as evil, sycophancy, and propensity to hallucinate. We confirm that these vectors can be used to monitor fluctuations in the Assistant's personality at deployment time. We then apply persona vectors to predict and control personality shifts that occur during training. We find that both intended and unintended personality changes after finetuning are strongly correlated with shifts along the relevant persona vectors. These shifts can be mitigated through post-hoc intervention, or avoided in the first place with a new preventative steering method. Moreover, persona vectors can be used to flag training data that will produce undesirable personality changes, both at the dataset level and the individual sample level. Our method for extracting persona vectors is automated and can be applied to any personality trait of interest, given only a natural-language description.
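To make the idea of a trait direction in activation space concrete, here is a minimal sketch of one common way to obtain and use such a direction: take the difference between mean activations collected with and without the trait, project new activations onto it to monitor the trait, and subtract along it to steer. Whether this matches the authors' extraction pipeline is an assumption; all function names are hypothetical, and activations are represented as plain lists of floats.

```python
import math


def mean_vector(rows):
    # Element-wise mean over a list of equal-length activation vectors.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def trait_direction(with_trait, without_trait):
    # Difference of means, normalized to unit length (assumed extraction method).
    diff = [a - b for a, b in zip(mean_vector(with_trait), mean_vector(without_trait))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]


def trait_score(activation, direction):
    # Monitoring: projection of an activation onto the trait direction.
    return sum(a * d for a, d in zip(activation, direction))


def steer_away(activation, direction, strength):
    # Post-hoc intervention: move the activation against the trait direction.
    return [a - strength * d for a, d in zip(activation, direction)]
```

A rising `trait_score` over a conversation would, under these assumptions, signal drift toward the trait, and `steer_away` reduces that score by exactly `strength` because the direction has unit length.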
Social Bias in Multilingual Language Models: A Survey
Gamboa, Lance Calvin Lim, Feng, Yue, Lee, Mark
Pretrained multilingual models exhibit the same social bias as models processing English texts. This systematic review analyzes emerging research that extends bias evaluation and mitigation approaches into multilingual and non-English contexts. We examine these studies with respect to linguistic diversity, cultural awareness, and their choice of evaluation metrics and mitigation techniques. Our survey illuminates gaps in the field's dominant methodological design choices (e.g., preference for certain languages, scarcity of multilingual mitigation experiments) while cataloging common issues encountered and solutions implemented in adapting bias benchmarks across languages and cultures. Drawing from the implications of our findings, we chart directions for future research that can reinforce the multilingual bias literature's inclusivity, cross-cultural appropriateness, and alignment with state-of-the-art NLP advancements.
'Existential crisis': how Google's shift to AI has upended the online news model
When the chief executive of the Financial Times suggested at a media conference this summer that rival publishers might consider a "Nato for news" alliance to strengthen negotiations with artificial intelligence companies, there was a ripple of chuckles from attendees. Yet Jon Slade's revelation that his website had seen a "pretty sudden and sustained" decline of 25% to 30% in traffic to its articles from readers arriving via internet search engines quickly made clear the serious nature of the threat the AI revolution poses. Queries typed into sites such as Google, which accounts for more than 90% of the search market, have been central to online journalism since its inception, with news providers optimising headlines and content to ensure a top ranking and revenue-raising clicks. But now Google's AI Overviews, which sit at the top of the results page, summarise responses and often negate the need to follow links to content, as well as its recently launched AI Mode tab, which answers queries in a chatbot format, have prompted fears of a "Google zero" future where traffic referrals dry up. "This is the single biggest change to search I have seen in decades," says one senior editorial tech executive.
AI startup Anthropic agrees to pay $1.5bn to settle book piracy lawsuit
The artificial intelligence company Anthropic has agreed to pay $1.5bn to settle a class-action lawsuit by book authors who say the company took pirated copies of their works to train its chatbot. The company has agreed to pay authors about $3,000 for each of an estimated 500,000 books covered by the settlement. "It is the first of its kind in the AI era." A trio of authors – thriller novelist Andrea Bartz and nonfiction writers Charles Graeber and Kirk Wallace Johnson – sued last year and now represent a broader group of writers and publishers whose books Anthropic downloaded to train its chatbot Claude. If Anthropic had not settled, experts say losing the case after a scheduled December trial could have cost the San Francisco-based company even more money.