Audits Under Resource, Data, and Access Constraints: Scaling Laws For Less Discriminatory Alternatives
Cen, Sarah H., Goyal, Salil, Javed, Zaynah, Karthik, Ananya, Liang, Percy, Ho, Daniel E.
AI audits play a critical role in AI accountability and safety. One branch of the law for which AI audits are particularly salient is anti-discrimination law. Several areas of anti-discrimination law implicate the "less discriminatory alternative" (LDA) requirement, in which a protocol (e.g., model) is defensible if no less discriminatory protocol that achieves comparable performance can be found with a reasonable amount of effort. Notably, the burden of proving an LDA exists typically falls on the claimant (the party alleging discrimination). This creates a significant hurdle in AI cases, as the claimant would seemingly need to train a less discriminatory yet high-performing model, a task requiring resources and expertise beyond most litigants. Moreover, developers often shield information about and access to their model and training data as trade secrets, making it difficult to reproduce a similar model from scratch. In this work, we present a procedure enabling claimants to determine if an LDA exists, even when they have limited compute, data, information, and model access. We focus on the setting in which fairness is given by demographic parity and performance by binary cross-entropy loss. As our main result, we provide a novel closed-form upper bound for the loss-fairness Pareto frontier (PF). We show how the claimant can use it to fit a PF in the "low-resource regime," then extrapolate the PF that applies to the (large) model being contested, all without training a single large model. The expression thus serves as a scaling law for loss-fairness PFs. To use this scaling law, the claimant would require a small subsample of the train/test data. Then, the claimant can fit the context-specific PF by training as few as 7 (small) models. We stress test our main result in simulations, finding that our scaling law holds even when the exact conditions of our theory do not.
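The abstract does not reproduce the closed-form bound itself, so the sketch below only illustrates the shape of the procedure under stated assumptions: small logistic models trained with a demographic-parity penalty, swept over seven penalty weights (echoing the "as few as 7 models" claim), trace an empirical loss-fairness frontier, and a deliberately simple linear fit stands in for the paper's scaling law. All data, penalty weights, and the fitted form are synthetic stand-ins, not the authors' method.

```python
# Hedged sketch: tracing a loss-fairness Pareto frontier with small models.
# The paper's closed-form bound is NOT reproduced here; the linear fit at the
# end is a hypothetical stand-in used only to illustrate extrapolation.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: group g shifts the feature distribution, inducing disparity.
n, d = 2000, 5
g = rng.integers(0, 2, n)                      # protected attribute
X = rng.normal(0, 1, (n, d)) + 0.8 * g[:, None]
y = (X @ rng.normal(0, 1, d) + 0.5 * g + rng.normal(0, 0.5, n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lam, steps=2000, lr=0.1):
    """Logistic regression with a demographic-parity penalty of weight lam."""
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad_bce = X.T @ (p - y) / n
        # DP gap: difference in mean predicted score between groups.
        gap = p[g == 1].mean() - p[g == 0].mean()
        s = p * (1 - p)                         # d(p)/d(logit)
        dgap = (X[g == 1] * s[g == 1, None]).mean(0) \
             - (X[g == 0] * s[g == 0, None]).mean(0)
        # Objective: BCE + lam * gap^2, so the penalty gradient is 2*lam*gap*dgap.
        w -= lr * (grad_bce + 2 * lam * gap * dgap)
    p = sigmoid(X @ w)
    bce = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)).mean()
    return bce, abs(p[g == 1].mean() - p[g == 0].mean())

# Sweep seven penalty weights to trace the empirical frontier.
frontier = np.array([train(lam) for lam in [0, 0.5, 1, 2, 4, 8, 16]])

# Illustrative fit only: a line in (gap, bce) space, then extrapolation to
# the fully fair point gap = 0.
a, b = np.polyfit(frontier[:, 1], frontier[:, 0], 1)
print("frontier (bce, dp_gap):\n", frontier.round(3))
print(f"fitted bce ~ {b:.3f} + {a:.3f}*gap; predicted bce at gap=0: {b:.3f}")
```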
- North America > United States > California > Santa Clara County > Palo Alto (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Wisconsin (0.04)
- (4 more...)
- Law > Labor & Employment Law (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Law > Litigation (0.89)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)
High court tells UK lawyers to stop misuse of AI after fake case-law citations
The high court has told senior lawyers to take urgent action to prevent the misuse of artificial intelligence after dozens of case-law citations put before the courts turned out to be either completely fictitious or to contain made-up passages. Lawyers are increasingly using AI systems to help them build legal arguments, but two cases this year were blighted by made-up case-law citations that were either known or suspected to have been generated by AI. In an £89m damages case against the Qatar National Bank, the claimants made 45 case-law citations, 18 of which turned out to be fictitious, with quotes in many of the others also bogus. The claimant admitted using publicly available AI tools, and his solicitor accepted he had cited the sham authorities. When Haringey Law Centre challenged the London borough of Haringey over its alleged failure to provide its client with temporary accommodation, its lawyer cited phantom case law five times.
- Europe > United Kingdom > England > Greater London > London > Haringey (0.25)
- Asia > Middle East > Qatar (0.25)
- North America > United States > New York (0.05)
- Europe > United Kingdom > Wales (0.05)
When Fairness Isn't Statistical: The Limits of Machine Learning in Evaluating Legal Reasoning
Barale, Claire, Rovatsos, Michael, Bhuta, Nehal
Legal decisions are increasingly evaluated for fairness, consistency, and bias using machine learning (ML) techniques. In high-stakes domains like refugee adjudication, such methods are often applied to detect disparities in outcomes. Yet it remains unclear whether statistical methods can meaningfully assess fairness in legal contexts shaped by discretion, normative complexity, and limited ground truth. In this paper, we empirically evaluate three common ML approaches (feature-based analysis, semantic clustering, and predictive modeling) on a large, real-world dataset of 59,000+ Canadian refugee decisions (AsyLex). Our experiments show that these methods produce divergent and sometimes contradictory signals, that predictive modeling often depends on contextual and procedural features rather than legal features, and that semantic clustering fails to capture substantive legal reasoning. We show limitations of statistical fairness evaluation, challenge the assumption that statistical regularity equates to fairness, and argue that current computational approaches fall short of evaluating fairness in legally discretionary domains. We argue that evaluating fairness in law requires methods grounded not only in data, but in legal reasoning and institutional context.
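As a minimal sketch of one of the three approaches the paper evaluates, the snippet below runs semantic clustering (TF-IDF plus k-means) over decision texts and checks cluster-outcome agreement. The texts and labels are invented stand-ins for the AsyLex corpus; low agreement here would echo the paper's finding that surface-level semantic similarity need not track substantive legal reasoning.

```python
# Hedged sketch of the "semantic clustering" approach on toy stand-in texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

decisions = [
    "claimant fears persecution on political grounds; claim allowed",
    "credibility concerns; documentary evidence insufficient; claim denied",
    "well-founded fear established on religious grounds; claim allowed",
    "inconsistent testimony regarding travel route; claim denied",
]
outcomes = [1, 0, 1, 0]  # 1 = allowed, 0 = denied (invented labels)

X = TfidfVectorizer(stop_words="english").fit_transform(decisions)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Low agreement would illustrate the paper's point: clusters of similar
# wording need not correspond to outcomes or legal reasoning.
print("cluster labels:", clusters)
print("agreement with outcomes (ARI):", adjusted_rand_score(outcomes, clusters))
```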
- North America > United States (0.04)
- North America > Haiti (0.04)
- North America > Canada > Manitoba > Winnipeg Metropolitan Region > Winnipeg (0.04)
- (3 more...)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
- Law (1.00)
- Government > Immigration & Customs (1.00)
- Education > Curriculum > Subject-Specific Education (0.46)
A Reality Check on Context Utilisation for Retrieval-Augmented Generation
Hagström, Lovisa, Marjanović, Sara Vera, Yu, Haeun, Arora, Arnav, Lioma, Christina, Maistro, Maria, Atanasova, Pepa, Augenstein, Isabelle
Retrieval-augmented generation (RAG) helps address the limitations of the parametric knowledge embedded within a language model (LM). However, investigations of how LMs utilise retrieved information of varying complexity in real-world scenarios have been limited to synthetic contexts. We introduce DRUID (Dataset of Retrieved Unreliable, Insufficient and Difficult-to-understand contexts) with real-world queries and contexts manually annotated for stance. The dataset is based on the prototypical task of automated claim verification, for which automated retrieval of real-world evidence is crucial. We compare DRUID to synthetic datasets (CounterFact, ConflictQA) and find that artificial datasets often fail to represent the complex and diverse real-world context settings. We show that synthetic datasets exaggerate context characteristics rare in real retrieved data, which leads to inflated context utilisation results, as measured by our novel ACU score. Moreover, while previous work has mainly focused on singleton context characteristics to explain context utilisation, correlations between singleton context properties and ACU on DRUID are surprisingly small compared to other properties related to context source. Overall, our work underscores the need for real-world aligned context utilisation studies to represent and improve performance in real-world RAG settings.
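The ACU score itself is not defined in the abstract, so the following is a generic context-utilisation probe under assumed definitions: the share of claims where supplying the retrieved context flips the model's verdict to the context's annotated stance. `Example`, `toy_predict`, and the sample claims are all hypothetical placeholders for the DRUID annotations and a real LM call.

```python
# Hedged sketch of a context-utilisation probe; NOT the paper's ACU definition.
from dataclasses import dataclass

@dataclass
class Example:
    claim: str
    context: str
    context_stance: str  # "supports" or "refutes", annotated as in DRUID

def toy_predict(claim, context=None):
    """Keyword stand-in for an LM verdict; swap in a real model call."""
    text = (context if context is not None else claim).lower()
    return "refutes" if ("not" in text or "no evidence" in text) else "supports"

def context_utilisation(examples, predict_fn):
    """Fraction of examples where the context moves the verdict to its stance."""
    used = sum(
        1 for ex in examples
        if predict_fn(ex.claim, ex.context) == ex.context_stance
        and predict_fn(ex.claim) != ex.context_stance
    )
    return used / len(examples)

examples = [
    Example("The levy was repealed in 2019.",
            "Official records show the levy was not repealed.", "refutes"),
    Example("The port expansion is complete.",
            "The port authority confirms the expansion is complete.", "supports"),
]
print("context utilisation:", context_utilisation(examples, toy_predict))
```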
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Belize (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (31 more...)
- Media > Film (1.00)
- Leisure & Entertainment > Sports (1.00)
- Law (1.00)
- (3 more...)
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal
Xie, Huiyuan, Steffek, Felix, de Faria, Joana Ribeiro, Carter, Christine, Rutherford, Jonathan
This paper explores the intersection of technological innovation and access to justice by developing a benchmark for predicting case outcomes in the UK Employment Tribunal (UKET). To address the challenge of extensive manual annotation, the study employs a large language model (LLM) for automatic annotation, resulting in the creation of the CLC-UKET dataset. The dataset consists of approximately 19,000 UKET cases and their metadata. Comprehensive legal annotations cover facts, claims, precedent references, statutory references, case outcomes, reasons and jurisdiction codes. Facilitated by the CLC-UKET data, we examine a multi-class case outcome prediction task in the UKET. Human predictions are collected to establish a performance reference for model comparison. Empirical results from baseline models indicate that finetuned transformer models outperform zero-shot and few-shot LLMs on the UKET prediction task. The performance of zero-shot LLMs can be enhanced by integrating task-related information into few-shot examples. We hope that the CLC-UKET dataset, along with human annotations and empirical findings, can serve as a valuable benchmark for employment-related dispute resolution.
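A minimal sketch of the stronger baseline family the paper reports: fine-tuning a transformer classifier for multi-class outcome prediction. The label set, model choice (`bert-base-uncased`), and training text below are assumptions for illustration, not the paper's actual configuration or the CLC-UKET schema.

```python
# Hedged sketch: fine-tuning a transformer for multi-class outcome prediction.
# Labels, model, and example text are assumptions, not the benchmark's fields.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["claimant wins", "claimant loses", "partial success"]  # assumed set

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

class UKETDataset(torch.utils.data.Dataset):
    """Wraps (case text, outcome label) pairs, e.g. facts + claims as input."""
    def __init__(self, texts, labels):
        self.enc = tok(texts, truncation=True, padding=True, max_length=512)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

train_ds = UKETDataset(["The claimant alleged unfair dismissal ..."], [0])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="uket_out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_ds,
).train()
```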
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
- Law > Labor & Employment Law (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- Law > Litigation (0.88)
Judgment2vec: Apply Graph Analytics to Searching and Recommendation of Similar Judgments
In court practice, legal professionals rely on their training to provide opinions that resolve cases, and one of the most crucial skills is the ability to identify similar judgments from previous courts efficiently. However, finding a similar case is challenging and often depends on experience, legal domain knowledge, and extensive labor hours, making veteran lawyers or judges indispensable. This research aims to automate the analysis of judgment text similarity. We utilized a judgment dataset labeled as the "gold standard" by experts, which includes human-verified features that can be converted into an "expert similarity score." We then constructed a knowledge graph based on "case-article" relationships, ranking each case using natural language processing to derive a "Node2vec similarity score." By evaluating these two similarity scores, we identified their discrepancies and relationships. The results can significantly reduce the labor hours required for legal searches and recommendations, with potential applications extending to various fields of information retrieval.
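A minimal sketch of the pipeline the abstract describes, assuming the `node2vec` PyPI package and a toy case-article edge list; the real graph, human-verified features, and expert similarity scores come from the authors' labeled dataset.

```python
# Hedged sketch: a bipartite case-article graph, node2vec embeddings, then
# pairwise case similarity. Edges below are invented stand-ins.
import networkx as nx
from node2vec import Node2Vec

# "case-article" relationships: an edge means the judgment cites the article.
edges = [
    ("case_1", "art_184"), ("case_1", "art_277"),
    ("case_2", "art_184"), ("case_2", "art_277"),
    ("case_3", "art_320"),
]
G = nx.Graph(edges)

n2v = Node2Vec(G, dimensions=32, walk_length=10, num_walks=50, seed=42)
model = n2v.fit(window=5, min_count=1)

# Cases citing the same articles should land close in embedding space,
# giving a "Node2vec similarity score" to compare against expert scores.
print("sim(case_1, case_2):", model.wv.similarity("case_1", "case_2"))
print("sim(case_1, case_3):", model.wv.similarity("case_1", "case_3"))
```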
False consensus biases AI against vulnerable stakeholders
Dong, Mengchen, Bonnefon, Jean-François, Rahwan, Iyad
The deployment of AI systems for welfare benefit allocation allows for accelerated decision-making and faster provision of critical help, but has already led to an increase in unfair benefit denials and false fraud accusations. Collecting data in the US and the UK (N = 2449), we explore the public acceptability of such speed-accuracy trade-offs in populations of claimants and non-claimants. We observe a general willingness to trade off speed gains for modest accuracy losses, but this aggregate view masks notable divergences between claimants and non-claimants. Although welfare claimants comprise a relatively small proportion of the general population (e.g., 20% in the US representative sample), this vulnerable group is much less willing to accept AI deployed in welfare systems, raising concerns that solely using aggregate data for calibration could lead to policies misaligned with stakeholder preferences. Our study further uncovers asymmetric insights between claimants and non-claimants. The latter consistently overestimate claimant willingness to accept speed-accuracy trade-offs, even when financially incentivized for accurate perspective-taking. This suggests that policy decisions influenced by the dominant voice of non-claimants, however well-intentioned, may neglect the actual preferences of those directly affected by welfare AI systems. Our findings underline the need for stakeholder engagement and transparent communication in the design and deployment of these systems, particularly in contexts marked by power imbalances.
- North America > United States (1.00)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models
de Faria, Joana Ribeiro, Xie, Huiyuan, Steffek, Felix
Court transcripts and judgments are rich repositories of legal knowledge, detailing the intricacies of cases and the rationale behind judicial decisions. The extraction of key information from these documents provides a concise overview of a case, crucial for both legal experts and the public. With the advent of large language models (LLMs), automatic information extraction has become increasingly feasible and efficient. This paper presents a comprehensive study on the application of GPT-4, a large language model, for automatic information extraction from UK Employment Tribunal (UKET) cases. We meticulously evaluated GPT-4's performance in extracting critical information with a manual verification process to ensure the accuracy and relevance of the extracted data. Our research is structured around two primary extraction tasks: the first involves a general extraction of eight key aspects that hold significance for both legal specialists and the general public, including the facts of the case, the claims made, references to legal statutes, references to precedents, general case outcomes and corresponding labels, detailed orders and remedies, and reasons for the decision. The second task is more focused, aimed at analysing three of those extracted features, namely facts, claims and outcomes, in order to facilitate the development of a tool capable of predicting the outcome of employment law disputes. Through our analysis, we demonstrate that LLMs like GPT-4 can achieve high accuracy in legal information extraction, highlighting the potential of LLMs in revolutionising the way legal information is processed and utilised, with significant implications for legal research and practice.
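The paper's exact prompts are not given in the abstract, so the sketch below only illustrates the general shape of such an extraction call with an assumed prompt over the eight aspects named above; it uses the OpenAI chat completions client and requires an API key.

```python
# Hedged sketch of an LLM extraction step; the prompt wording is an assumption,
# not the paper's. Requires OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

ASPECTS = [
    "facts", "claims", "references to legal statutes",
    "references to precedents", "general case outcome", "outcome label",
    "detailed order and remedies", "reasons for the decision",
]

def extract(judgment_text: str) -> dict:
    client = OpenAI()
    prompt = (
        "From the UK Employment Tribunal judgment below, extract the "
        f"following aspects as a JSON object with keys {ASPECTS}.\n\n"
        f"{judgment_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Real use would validate and repair the JSON; models sometimes add prose.
    return json.loads(resp.choices[0].message.content)
```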
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > Wales (0.04)
- North America > United States > Ohio (0.04)
- (3 more...)
- Law > Labor & Employment Law (1.00)
- Law > Government & the Courts (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- Government > Regional Government (1.00)
How an algorithm denied food to thousands of poor in India's Telangana
This story was produced with support from the Pulitzer Center's AI Accountability Network. Hyderabad and New Delhi, India – Bismillah Bee can't conceive of owning a car. The 67-year-old widow and 12 members of her family live in a cramped three-room house in an urban slum in Hyderabad, the capital of the Indian state of Telangana. Since her rickshaw puller husband's death two years ago of mouth cancer, Bee makes a living by peeling garlic for a local business. But an algorithmic system, which the Telangana government deploys to digitally profile its more than 30 million residents, tagged Bee's husband as a car owner in 2021, when he was still alive.
Augmented Intelligence is a Second Set of Eyes on Casualty Claims
Claims adjusters make decisions every day--million-dollar decisions that have the potential to change a claimant's life. If anyone needs a second set of eyes--that helpful colleague with tons of experience and sharp attention to detail--it's claims adjusters. Here's the thing: even two of the best claims adjusters with 60 years of combined experience probably haven't seen everything (although they may be pretty close). Every day there are new cases and unseen factors that offer data about the best course for a particular claim. That's where augmented intelligence comes in.