

 Law


Anthropic Settles High-Profile AI Copyright Lawsuit Brought by Book Authors

WIRED

The deal will allow Anthropic to avoid what could have been a financially devastating outcome in court. The settlement agreement is expected to be finalized September 3, with more details to follow, according to a legal filing published on Tuesday. Lawyers for the plaintiffs did not immediately respond to requests for comment. In 2024, three authors, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, sued Anthropic, alleging that the startup illegally used their work to train its artificial intelligence models. In June, California district court judge William Alsup issued a summary judgment in Bartz v. Anthropic that largely sided with Anthropic, finding that the company's use of the books to train its models was "fair use" and thus legal.


Nikkei and Asahi Shimbun sue Perplexity AI over alleged copyright violations

The Japan Times

The newspapers are seeking an injunction and ¥2.2 billion ($15 million) each in damages from Perplexity, they said in a joint statement Tuesday. The suit was filed at the Tokyo District Court. The legal action by the Nikkei, which owns Japan's biggest financial newspaper, and the left-leaning Asahi underscores a widening rift between publishers and AI companies over who controls, and profits from, the distribution of news. The media industry argues that AI tools that use its work without licenses siphon away readership and ad revenue, threatening already fragile business models. "These actions amount to continuous and large-scale freeloading on journalists' time and effort," Nikkei and Asahi said in the statement.


Musk sues Apple and OpenAI, saying they hurt AI competition

The Japan Times

Elon Musk has accused Apple and OpenAI in a lawsuit of unfairly favoring OpenAI on iPhones and thwarting competition from other chatbot makers. Musk's X and xAI seek billions of dollars in damages in the suit, filed Monday in U.S. federal court in Fort Worth, Texas, arguing that Apple's decision to integrate OpenAI into the iPhone's operating system inhibits rivalry and innovation within the AI industry and harms consumers by depriving them of choice. The billionaire founder of xAI, which now houses the Grok AI team and the X social network, said Apple makes it impossible for anyone other than OpenAI's ChatGPT to reach the top of the App Store charts, a sought-after global spotlight for app developers.


The Statistical Fairness-Accuracy Frontier

arXiv.org Machine Learning

Machine learning models must balance accuracy and fairness, but these goals often conflict, particularly when data come from multiple demographic groups. A useful tool for understanding this trade-off is the fairness-accuracy (FA) frontier, which characterizes the set of models that cannot be simultaneously improved in both fairness and accuracy. Prior analyses of the FA frontier provide a full characterization under the assumption of complete knowledge of population distributions -- an unrealistic ideal. We study the FA frontier in the finite-sample regime, showing how it deviates from its population counterpart and quantifying the worst-case gap between them. In particular, we derive minimax-optimal estimators that depend on the designer's knowledge of the covariate distribution. For each estimator, we characterize how finite-sample effects asymmetrically impact each group's risk, and identify optimal sample allocation strategies. Our results transform the FA frontier from a theoretical construct into a practical tool for policymakers and practitioners who must often design algorithms with limited data.
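The frontier described above is the set of non-dominated (Pareto-optimal) models. As a minimal sketch, assuming each candidate model has already been scored with an estimated error rate and an estimated between-group error gap (both "lower is better", and both hypothetical metric choices), the empirical frontier can be extracted as follows. This computes only a plug-in frontier from sample estimates; it does not implement the paper's minimax-optimal estimators, and the gap between this empirical curve and the population frontier is exactly what the paper studies.

```python
from typing import List, Tuple

# Hypothetical summary of a candidate model: (name, error, gap), where
#   error = estimated overall misclassification rate (accuracy cost)
#   gap   = estimated absolute error difference between two groups (fairness cost)
# Both are assumptions for illustration; lower is better for each.
Candidate = Tuple[str, float, float]


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate beats on both error and gap."""
    frontier: List[Candidate] = []
    for name, err, gap in candidates:
        # A candidate is dominated if another is no worse on both axes
        # and strictly better on at least one.
        dominated = any(
            e2 <= err and g2 <= gap and (e2 < err or g2 < gap)
            for _, e2, g2 in candidates
        )
        if not dominated:
            frontier.append((name, err, gap))
    # Order from most accurate to fairest so the result reads as a curve.
    return sorted(frontier, key=lambda c: c[1])
```

For example, with models A (0.10, 0.08), B (0.12, 0.03), C (0.15, 0.01), and D (0.14, 0.05), model D is dominated by B, so the frontier is A, B, C.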


Jinx: Unlimited LLMs for Probing Alignment Failures

arXiv.org Artificial Intelligence

Unlimited, or so-called helpful-only, language models are trained without safety-alignment constraints and never refuse user queries. They are widely used by leading AI companies as internal tools for red teaming and alignment evaluation. For example, if a safety-aligned model produces harmful outputs similar to those of an unlimited model, this indicates alignment failures that require further attention. Despite their essential role in assessing alignment, such models are not available to the research community. We introduce Jinx, a helpful-only variant of popular open-weight LLMs. Jinx responds to all queries without refusals or safety filtering, while preserving the base model's capabilities in reasoning and instruction following. It provides researchers with an accessible tool for probing alignment failures, evaluating safety boundaries, and systematically studying failure modes in language model safety.


A Retail-Corpus for Aspect-Based Sentiment Analysis with Large Language Models

arXiv.org Artificial Intelligence

Aspect-based sentiment analysis enhances sentiment detection by linking sentiment to specific aspects, offering deeper insights than traditional sentiment analysis. This study introduces a manually annotated dataset of 10,814 multilingual customer reviews of brick-and-mortar retail stores, labeled with eight aspect categories and their sentiment. Using this dataset, the performance of GPT-4 and LLaMA-3 in aspect-based sentiment analysis is evaluated to establish a baseline for the new data. Both models achieve over 85% accuracy, with GPT-4 outperforming LLaMA-3 on all relevant metrics.
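The abstract does not name its "relevant metrics", so the following is only a sketch of one common way to score aspect-based predictions. It assumes each review's labels are stored as a mapping from aspect category to sentiment, and computes micro-averaged precision, recall, and F1 over (aspect, sentiment) pairs. The aspect names used in the example ("staff", "price", "cleanliness") are hypothetical and are not the dataset's eight categories.

```python
from typing import Dict, List

# Hypothetical label format: aspect category -> sentiment for one review.
Labels = Dict[str, str]


def evaluate(gold: List[Labels], pred: List[Labels]) -> Dict[str, float]:
    """Micro-averaged P/R/F1 over (aspect, sentiment) pairs across reviews."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_pairs = set(g.items())
        p_pairs = set(p.items())
        tp += len(g_pairs & p_pairs)  # right aspect with right sentiment
        fp += len(p_pairs - g_pairs)  # predicted pair not in the gold labels
        fn += len(g_pairs - p_pairs)  # gold pair the model missed or got wrong
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Treating a wrong sentiment on a correct aspect as both a false positive and a false negative is a strict design choice; a looser variant would score aspect detection and sentiment classification separately.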


A Feminist Account of Intersectional Algorithmic Fairness

arXiv.org Artificial Intelligence

Intersectionality has profoundly influenced research and political action by revealing how interconnected systems of privilege and oppression influence lived experiences, yet its integration into algorithmic fairness research remains limited. Existing approaches often rely on single-axis or formal subgroup frameworks that risk oversimplifying social realities and neglecting structural inequalities. We propose Substantive Intersectional Algorithmic Fairness, extending Green's (2022) notion of substantive algorithmic fairness with insights from intersectional feminist theory. Building on this foundation, we introduce ten desiderata within the ROOF methodology to guide the design, assessment, and deployment of algorithmic systems in ways that address systemic inequities while mitigating harms to intersectionally marginalized communities. Rather than prescribing fixed operationalizations, these desiderata encourage reflection on assumptions of neutrality, the use of protected attributes, the inclusion of multiply marginalized groups, and enhancing algorithmic systems' potential. Our approach emphasizes that fairness cannot be separated from social context, and that in some cases, principled non-deployment may be necessary. By bridging computational and social science perspectives, we provide actionable guidance for more equitable, inclusive, and context-sensitive intersectional algorithmic practices.


AMELIA: A Family of Multi-task End-to-end Language Models for Argumentation

arXiv.org Artificial Intelligence

Argument mining is a subfield of argumentation that aims to automatically extract argumentative structures and their relations from natural language texts. This paper investigates how a single large language model can be leveraged to perform one or several argument mining tasks. Our contributions are two-fold. First, we construct a multi-task dataset by surveying and converting 19 well-known argument mining datasets from the literature into a unified format. Second, we explore various training strategies using Meta AI's Llama-3.1-8B-Instruct model: (1) fine-tuning on individual tasks, (2) fine-tuning jointly on multiple tasks, and (3) merging models fine-tuned separately on individual tasks. Our experiments show that task-specific fine-tuning significantly improves individual performance across all tasks. Moreover, multi-task fine-tuning maintains strong performance without degradation, suggesting effective transfer learning across related tasks. Finally, we demonstrate that model merging offers a viable compromise: it yields competitive performance while mitigating the computational costs associated with full multi-task fine-tuning.
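The abstract does not say which merging method AMELIA uses. As a minimal sketch of the general idea, assuming the simplest option (a weighted linear average of parameters across models fine-tuned from the same base), merging can look like this. Real Llama-3.1-8B checkpoints hold tensors rather than Python lists; the list-based "state dict" and parameter names here are hypothetical simplifications.

```python
from typing import Dict, List, Optional

# Hypothetical simplified checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_linear(models: List[StateDict], weights: Optional[List[float]] = None) -> StateDict:
    """Weighted average of parameters from models that share one architecture."""
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # uniform average by default
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per model")
    keys = models[0].keys()
    merged: StateDict = {}
    for name in keys:
        size = len(models[0][name])
        for m in models:
            # Merging only makes sense when every parameter lines up exactly.
            if m.keys() != keys or len(m[name]) != size:
                raise ValueError(f"parameter mismatch at {name!r}")
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models)) for i in range(size)
        ]
    return merged
```

The appeal noted in the abstract is cost: each task model is fine-tuned once, and combining them afterwards is a cheap arithmetic pass rather than another round of multi-task training.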


UQ: Assessing Language Models on Unsolved Questions

arXiv.org Artificial Intelligence

Benchmarks shape progress in AI research. A useful benchmark should be both difficult and realistic: questions should challenge frontier models while also reflecting real-world usage. Yet, current paradigms face a difficulty-realism tension: exam-style benchmarks are often made artificially difficult with limited real-world value, while benchmarks based on real user interaction often skew toward easy, high-frequency problems. In this work, we explore a radically different paradigm: assessing models on unsolved questions. Rather than a static benchmark scored once, we curate unsolved questions and evaluate models asynchronously over time with validator-assisted screening and community verification. We introduce UQ, a testbed of 500 challenging, diverse questions sourced from Stack Exchange, spanning topics from CS theory and math to sci-fi and history, probing capabilities including reasoning, factuality, and browsing. UQ is difficult and realistic by construction: unsolved questions are often hard and naturally arise when humans seek answers, thus solving them yields direct real-world value. Our contributions are threefold: (1) UQ-Dataset and its collection pipeline combining rule-based filters, LLM judges, and human review to ensure question quality (e.g., well-defined and difficult); (2) UQ-Validators, compound validation strategies that leverage the generator-validator gap to provide evaluation signals and pre-screen candidate solutions for human review; and (3) UQ-Platform, an open platform where experts collectively verify questions and solutions. The top model passes UQ-validation on only 15% of questions, and preliminary human verification has already identified correct answers among those that passed. UQ charts a path for evaluating frontier models on real-world, open-ended challenges, where success pushes the frontier of human knowledge. We release UQ at https://uq.stanford.edu.
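The validator-assisted screening described above can be pictured as a filter that runs before human review. As a sketch under stated assumptions: the compound rule below is a strict AND (an answer passes only if every validator accepts it), and the two validators are hypothetical rule-based stubs standing in for the LLM-based UQ-Validators, whose actual strategies are not detailed in the abstract.

```python
from typing import Callable, Dict, List

# A validator judges one (question, answer) pair; True means "accept".
Validator = Callable[[str, str], bool]


def non_empty(question: str, answer: str) -> bool:
    # Hypothetical stub: reject blank answers.
    return bool(answer.strip())


def not_a_refusal(question: str, answer: str) -> bool:
    # Hypothetical stub: reject answers that merely decline to answer.
    return "i don't know" not in answer.lower()


def compound_validate(question: str, answer: str, validators: List[Validator]) -> bool:
    """Accept only if every validator accepts (strict AND aggregation)."""
    return all(v(question, answer) for v in validators)


def candidates_for_review(answers: Dict[str, str], validators: List[Validator]) -> List[str]:
    """Question IDs whose answers pass screening and go on to human review."""
    return [q for q, a in answers.items() if compound_validate(q, a, validators)]


def pass_rate(answers: Dict[str, str], validators: List[Validator]) -> float:
    """Fraction of answered questions that pass compound validation."""
    if not answers:
        return 0.0
    return len(candidates_for_review(answers, validators)) / len(answers)
```

With answers {"q1": "Proof: by induction on n ...", "q2": "", "q3": "I don't know."}, only q1 passes both checks, giving a pass rate of one third. A strict AND keeps precision high at the cost of recall, which suits a setting where passed answers consume scarce expert time.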


Chinese Court Simulation with LLM-Based Agent System

arXiv.org Artificial Intelligence

Mock trials have long served as an important platform for legal training and education. They not only help students learn realistic trial procedures but also provide practical value for case analysis and judgment prediction. Traditional mock trials are difficult for the public to access because they rely on professional tutors and human participants. Fortunately, the rise of large language models (LLMs) provides new opportunities for creating more accessible and scalable court simulations. While promising, existing research mainly focuses on agent construction and neglects the systematic design and evaluation of court simulations, which matter more for the credibility and practical use of court simulation. To this end, we present SimCourt, the first court simulation framework based on the real-world procedural structure of Chinese courts. Our framework replicates all 5 core stages of a Chinese trial and incorporates 5 courtroom roles, faithfully following China's procedural definitions. To simulate trial participants in these different roles, we design legal agents equipped with memory, planning, and reflection abilities. Experiments on legal judgment prediction show that our framework generates simulated trials that better guide the system in predicting the prison term, probation, and fine in each case. Further annotation by human experts shows that, in many scenarios, agents' responses under our framework even outperform those of the judges and lawyers in the real trials. These results further demonstrate the potential of LLM-based court simulation.
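To make the architecture concrete, here is a heavily simplified skeleton of a staged, multi-role simulation loop with agents that keep memory, plan, and reflect. Everything specific is an assumption: the stage names, which roles speak in each stage, and the five role names are illustrative placeholders rather than SimCourt's definitions, and the planning and reflection steps are string stubs where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Illustrative placeholders only, NOT SimCourt's actual stages or roles.
ROLES = ["clerk", "judge", "prosecutor", "defendant", "defense_attorney"]
STAGES: List[Tuple[str, List[str]]] = [
    ("opening", ["clerk", "judge"]),
    ("investigation", ["prosecutor", "defendant", "defense_attorney"]),
    ("debate", ["prosecutor", "defense_attorney"]),
    ("final_statement", ["defendant"]),
    ("judgment", ["judge"]),
]


@dataclass
class LegalAgent:
    role: str
    memory: List[str] = field(default_factory=list)

    def plan(self, stage: str) -> str:
        # Stub planner: a real agent would prompt an LLM with its memory.
        return f"{self.role} plans for {stage} using {len(self.memory)} memories"

    def act(self, stage: str) -> str:
        plan = self.plan(stage)
        utterance = f"[{stage}] {self.role}: statement based on '{plan}'"
        self.memory.append(utterance)
        return utterance

    def reflect(self) -> None:
        # Stub reflection: record a note summarising how much has happened.
        self.memory.append(f"{self.role} reflected on {len(self.memory)} entries")


def run_trial(agents: Dict[str, LegalAgent]) -> List[str]:
    """Run every stage in order and return the courtroom transcript."""
    transcript: List[str] = []
    for stage, speakers in STAGES:
        for role in speakers:
            line = agents[role].act(stage)
            transcript.append(line)
            # Shared courtroom record: every other agent hears the utterance.
            for other in agents.values():
                if other.role != role:
                    other.memory.append(line)
        # Each agent reflects at the end of every stage.
        for agent in agents.values():
            agent.reflect()
    return transcript
```

Usage: `run_trial({r: LegalAgent(r) for r in ROLES})` yields a nine-line transcript under these placeholder stages. Keeping the procedure in a data table (`STAGES`) rather than in control flow is one way to let a framework follow a fixed procedural definition while the agents stay generic.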