Toffee Crisp and Blue Riband can't be called chocolate any more

BBC News

Toffee Crisp and Blue Riband bars can no longer be called chocolate after maker Nestle changed their recipes. To be described as milk chocolate in the UK a product needs to have at least 20% cocoa solids and 20% milk solids, a level each product fell below once a higher amount of cheaper vegetable fat was used. Nestle said its reformulations were needed due to higher input costs but were carefully developed and sensory tested, and there were no plans to alter the recipes of other chocolate products. As many ingredient costs, such as cocoa and butter, increased, food companies have altered recipes to use less of the expensive ingredients, as well as shrinking serving sizes. Nestle now describes the treats as being encased in a smooth milk chocolate flavour coating rather than being covered in milk chocolate.


ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

Zhou, Siying, Wu, Yiquan, Chen, Hui, Hu, Xavier, Kuang, Kun, Jatowt, Adam, Hu, Ming, Zheng, Chunyan, Wu, Fei

arXiv.org Artificial Intelligence

Legal claims refer to the plaintiff's demands in a case and are essential to guiding judicial reasoning and case resolution. While many works have focused on improving the efficiency of legal professionals, research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of legal claim generation based on the given case's facts. First, we construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task, from various real-world legal disputes. Additionally, we design an evaluation metric tailored for assessing the generated claims, which encompasses two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of the current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.


AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI

Rana, Manik, Man, Calissa, Msiiwa, Anotida Expected, Paine, Jeffrey, Zhu, Kevin, Dev, Sunishchal, Sharma, Vasu, R, Ahan M

arXiv.org Artificial Intelligence

Goal changes are a defining feature of real-world multi-turn interactions, yet current agent benchmarks primarily evaluate static objectives or one-shot tool use. We introduce AgentChangeBench, a benchmark explicitly designed to measure how tool-augmented language model agents adapt to mid-dialogue goal shifts across three enterprise domains. Our framework formalizes evaluation through four complementary metrics: Task Success Rate (TSR) for effectiveness, Tool Use Efficiency (TUE) for reliability, Tool Call Redundancy Rate (TCRR) for wasted effort, and Goal-Shift Recovery Time (GSRT) for adaptation latency. AgentChangeBench comprises 2,835 task sequences and five user personas, each designed to trigger realistic shift points in ongoing workflows. Using this setup, we evaluate several frontier models and uncover sharp contrasts obscured by traditional $\text{pass}@k$ scores: for example, GPT-4o reaches $92.2\%$ recovery on airline booking shifts while Gemini collapses to $48.6\%$, and retail tasks show near-perfect parameter validity yet redundancy rates above $80\%$, revealing major inefficiencies. These findings demonstrate that high raw accuracy does not imply robustness under dynamic goals, and that explicit measurement of recovery time and redundancy is essential. AgentChangeBench establishes a reproducible testbed for diagnosing and improving agent resilience in realistic enterprise settings.
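A minimal sketch of how metrics like these might be computed from a dialogue log. The log schema, field names, and `Turn` structure below are illustrative assumptions, not the benchmark's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    index: int                                       # position in the dialogue
    tool_calls: list = field(default_factory=list)   # names of tools invoked
    task_done: bool = False                          # did this turn complete the active goal?
    goal_shift: bool = False                         # did the user change goals on this turn?

def task_success_rate(dialogues):
    """TSR: fraction of dialogues containing a completed task."""
    return sum(any(t.task_done for t in d) for d in dialogues) / len(dialogues)

def tool_call_redundancy_rate(turns):
    """TCRR: fraction of tool calls that repeat an earlier identical call."""
    seen, redundant, total = set(), 0, 0
    for t in turns:
        for call in t.tool_calls:
            total += 1
            if call in seen:
                redundant += 1
            seen.add(call)
    return redundant / total if total else 0.0

def goal_shift_recovery_time(turns):
    """GSRT: turns elapsed from a goal shift until the new goal is completed."""
    shift_at = next((t.index for t in turns if t.goal_shift), None)
    if shift_at is None:
        return None
    done_at = next((t.index for t in turns if t.index > shift_at and t.task_done), None)
    return None if done_at is None else done_at - shift_at
```

The key design point the abstract highlights is that these measure different things: an agent can score perfectly on TSR while wasting most of its tool calls (high TCRR) or recovering slowly from a goal shift (high GSRT).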


Emotionally-Aware Agents for Dispute Resolution

Rakshit, Sushrita, Hale, James, Chawla, Kushal, Brett, Jeanne M., Gratch, Jonathan

arXiv.org Artificial Intelligence

In conflict, people use emotional expressions to shape their counterparts' thoughts, feelings, and actions. This paper explores whether automatic text emotion recognition offers insight into this influence in the context of dispute resolution. Prior work has shown the promise of such methods in negotiations; however, disputes evoke stronger emotions and different social processes. We use a large corpus of buyer-seller dispute dialogues to investigate how emotional expressions shape subjective and objective outcomes. We further demonstrate that large language models yield considerably greater explanatory power than previous methods for emotion intensity annotation and better match the decisions of human annotators. Findings support existing theoretical models for how emotional expressions contribute to conflict escalation and resolution, and suggest that agent-based systems could be useful in managing disputes by recognizing and potentially mitigating emotional escalation.

Emotional expressions serve essential social functions in human relationships. They convey one's beliefs, desires, and intentions -- shaping the beliefs, desires, and intentions of interaction partners [1], [2]. People high in emotional intelligence achieve more success in navigating emotional relationships [3], and there exists growing interest in creating AI agents that understand and enact these social functions [4], [5]. Prior work suggests that emotionally-aware agents are suitable for a growing list of applications, including teaching people to convey emotions effectively [6], improving human-agent interaction [7], detecting and moderating toxic communication [8], and serving as methodological tools for studying human emotion [9]. This paper examines the capacity of agents to understand human emotional expressions in the context of text-based dispute resolution.
Disputes arise when one party in a relationship (an individual, group, or nation) levies a claim that another party refuses to accept, thus threatening the future of the relationship [10].


DRAssist: Dispute Resolution Assistance using Large Language Models

Pawar, Sachin, Apte, Manoj, Palshikar, Girish K., Ali, Basit, Ramrakhiyani, Nitin

arXiv.org Artificial Intelligence

Disputes between two parties occur in almost all domains such as taxation, insurance, banking, healthcare, etc. The disputes are generally resolved in a specific forum (e.g., consumer court) where facts are presented, points of disagreement are discussed, arguments as well as specific demands of the parties are heard, and finally a human judge resolves the dispute by often favouring one of the two parties. In this paper, we explore the use of large language models (LLMs) as assistants for the human judge to resolve such disputes, as part of our DRAssist system. We focus on disputes from two specific domains -- automobile insurance and domain name disputes. DRAssist identifies certain key structural elements (e.g., facts, aspects of disagreement, arguments) of the disputes and summarizes the unstructured dispute descriptions to produce a structured summary for each dispute. We then explore multiple prompting strategies with multiple LLMs for their ability to assist in resolving the disputes in these domains. In DRAssist, these LLMs are prompted to produce the resolution output at three different levels -- (i) identifying an overall stronger party in a dispute, (ii) deciding whether each specific demand of each contesting party can be accepted, and (iii) evaluating whether each argument by each contesting party is strong or weak. We evaluate the performance of LLMs on all these tasks by comparing them with relevant baselines using suitable evaluation metrics.
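A hedged sketch of how the three output levels could be turned into separate prompts from a structured summary. The prompt wording and the summary's field names are illustrative assumptions, not the paper's actual prompts:

```python
def build_prompts(summary: dict) -> dict:
    """Build one prompt (or list of prompts) per DRAssist-style output level.

    summary is assumed to hold the structured elements extracted from a
    dispute: 'facts', 'disagreements', 'demands', and 'arguments'.
    """
    base = (f"Facts: {summary['facts']}\n"
            f"Points of disagreement: {summary['disagreements']}\n")
    return {
        # Level (i): overall stronger party, one prompt per dispute
        "stronger_party": base + "Which party has the overall stronger case? Answer A or B.",
        # Level (ii): one prompt per specific demand
        "demands": [base + f"Should the following demand be accepted? Demand: {d}"
                    for d in summary["demands"]],
        # Level (iii): one prompt per argument
        "arguments": [base + f"Is the following argument strong or weak? Argument: {a}"
                      for a in summary["arguments"]],
    }
```

Prepending the same structured summary to every prompt keeps the three levels grounded in identical case context, so differences in the model's answers reflect the question asked rather than the evidence shown.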


Microsoft and OpenAI's AGI Fight Is Bigger Than a Contract

WIRED

I first learned about The Clause from Microsoft CEO Satya Nadella. During an interview with him in May 2023, I asked about the deal between Microsoft and OpenAI that granted his company exclusive access to the startup's groundbreaking AI technology. I knew the contract had set a cap on how much profit Microsoft could make from the arrangement, and I asked him what would happen if and when that point was reached. The answer was a bit puzzling. "Fundamentally, their long-term idea is we get to superintelligence," he told me.


A Dual-Layered Evaluation of Geopolitical and Cultural Bias in LLMs

Kim, Sean, Kim, Hyuhng Joon

arXiv.org Artificial Intelligence

As large language models (LLMs) are increasingly deployed across diverse linguistic and cultural contexts, understanding their behavior in both factual and disputable scenarios is essential, especially when their outputs may shape public opinion or reinforce dominant narratives. In this paper, we define two types of bias in LLMs: model bias (bias stemming from model training) and inference bias (bias induced by the language of the query), through a two-phase evaluation. Phase 1 evaluates LLMs on factual questions where a single verifiable answer exists, assessing whether models maintain consistency across different query languages. Phase 2 expands the scope by probing geopolitically sensitive disputes, where responses may reflect culturally embedded or ideologically aligned perspectives. We construct a manually curated dataset spanning both factual and disputable QA, across four languages and question types. The results show that Phase 1 exhibits query language induced alignment, while Phase 2 reflects an interplay between the model's training context and query language. This paper offers a structured framework for evaluating LLM behavior across neutral and sensitive topics, providing insights for future LLM deployment and culturally aware evaluation practices in multilingual contexts.
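A minimal sketch of a Phase-1 style consistency check: ask the same factual question in several languages and measure how much the model's answers agree. The normalization and the majority-agreement definition here are illustrative assumptions, not the paper's metric:

```python
from collections import Counter

def consistency_score(answers: dict) -> float:
    """Fraction of query languages agreeing with the majority answer.

    answers maps language code -> the model's answer to the same factual
    question asked in that language. 1.0 means fully consistent across
    languages; lower values signal inference bias induced by the query
    language.
    """
    if not answers:
        return 0.0
    counts = Counter(a.strip().lower() for a in answers.values())
    return counts.most_common(1)[0][1] / len(answers)
```

On disputable (Phase-2) questions, a score like this stops being a correctness measure and instead exposes how the answer shifts with the query language.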


Alabama paid a law firm millions to defend its prisons. It used AI and turned in fake citations

The Guardian

In less than a year-and-a-half, Frankie Johnson, a man incarcerated at the William E Donaldson prison outside Birmingham, Alabama, says he was stabbed around 20 times. In December of 2019, Johnson says, he was stabbed "at least nine times" in his housing unit. In March of 2020, an officer handcuffed him to a desk following a group therapy meeting, and left the unit, after which another prisoner came in and stabbed him five times. In November of the same year, Johnson says, he was handcuffed by an officer and brought to the prison yard, where another prisoner attacked him with an ice pick, stabbing him "five to six times", as two correctional officers looked on. According to Johnson, one of the officers had actually encouraged his attacker to carry out the assault in retaliation for a previous argument between Johnson and the officer.


Politico's Newsroom Is Starting a Legal Battle With Management Over AI

WIRED

Politico became one of the first newsrooms last year to win a union contract that included rules on how the media outlet can deploy artificial intelligence. The PEN Guild, which represents Politico and its sister publication, environment and energy site E&E News, is now gearing up for another first. The union's members allege that the AI provisions in their contract have been violated, and they're preparing for a groundbreaking legal dispute with management. The outcome could set a precedent for how much input journalists ultimately have over how AI is used in their newsrooms. Last year, Politico began publishing AI-generated live news summaries during big political events like the Democratic National Convention and the US vice presidential debates.


Labeling Case Similarity based on Co-Citation of Legal Articles in Judgment Documents with Empirical Dispute-Based Evaluation

Liu, Chao-Lin, Wu, Po-Hsien, Yu, Yi-Ting

arXiv.org Artificial Intelligence

This report addresses the challenge of limited labeled datasets for developing legal recommender systems, particularly in specialized domains like labor disputes. We propose a new approach leveraging the co-citation of legal articles within cases to establish similarity and enable algorithmic annotation. This method draws a parallel to the concept of case co-citation, utilizing cited articles as indicators of shared legal issues. To evaluate the labeled results, we employ a system that recommends similar cases based on plaintiffs' accusations, defendants' rebuttals, and points of dispute. The evaluation demonstrates that the recommender, with fine-tuned text embedding models and a BiLSTM module, can recommend labor cases whose similarity was measured by the co-citation of legal articles. This research contributes to the development of automated annotation techniques for legal documents, particularly in areas with limited access to comprehensive legal databases.
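A hedged sketch of co-citation-based labeling: two cases are treated as similar when the sets of legal articles they cite overlap enough. The Jaccard measure and the 0.5 threshold are illustrative assumptions; the paper may define co-citation similarity differently:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two citation sets, 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def label_similar_pairs(cases: dict, threshold: float = 0.5):
    """Algorithmically annotate similar case pairs via article co-citation.

    cases maps case_id -> set of cited legal article ids. Returns the
    list of case-id pairs whose cited articles overlap at or above the
    threshold, which can then serve as similarity labels for training.
    """
    ids = sorted(cases)
    return [(x, y)
            for i, x in enumerate(ids)
            for y in ids[i + 1:]
            if jaccard(cases[x], cases[y]) >= threshold]
```

The appeal of this scheme is that the labels come for free from the judgment documents themselves, sidestepping the need for manual similarity annotation.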