Law
Explainability by design: an experimental analysis of the legal coding process
Cristani, Matteo, Governatori, Guido, Olivieri, Francesco, Palmirani, Monica, Buriola, Gabriele
Behind a set of rules in Deontic Defeasible Logic, there is a mapping process of normative background fragments. This process goes from text to rules and implicitly encompasses an explanation of the coded fragments. In this paper we deliver a methodology for legal coding that starts with a fragment and proceeds to a set of Deontic Defeasible Logic rules, involving a set of scenarios to test the correctness of the coded fragments. The methodology is illustrated by the coding process of an example text. We then show the results of a series of experiments conducted with humans encoding a variety of normative backgrounds and corresponding cases, in which we measured the effort made in the coding process as related to some measurable features. To process these examples, a recently developed technology, Houdini, that allows reasoning in Deontic Defeasible Logic, has been employed. Finally, we provide a technique to forecast the time required for coding, which depends on factors such as knowledge of the legal domain, knowledge of the coding processes, length of the text, and a measure of depth that refers to the length of the paths of legal references.
Bye-bye, Bluebook? Automating Legal Procedure with Large Language Models
Legal practice requires careful adherence to procedural rules. In the United States, few are more complex than those found in The Bluebook: A Uniform System of Citation. Compliance with this system's 500+ pages of byzantine formatting instructions is the raison d'être of thousands of student law review editors and the bête noire of lawyers everywhere. To evaluate whether large language models (LLMs) are able to adhere to the procedures of such a complicated system, we construct an original dataset of 866 Bluebook tasks and test flagship LLMs from OpenAI, Anthropic, Google, Meta, and DeepSeek. We show (1) that these models produce fully compliant Bluebook citations only 69%-74% of the time and (2) that in-context learning on the Bluebook's underlying system of rules raises accuracy only to 77%. These results caution against using off-the-shelf LLMs to automate aspects of the law where fidelity to procedure is paramount.
Study of the influence of a biased database on the prediction of standard algorithms for selecting the best candidate for an interview
Wang, Shuyu, Saillet, Angélique, Gall, Philomène Le, Lacroux, Alain, Martin-Lacroux, Christelle, Brault, Vincent
Artificial Intelligence (AI) is extensively used across various stages of the recruitment process, from automated candidate sourcing on social media platforms to asynchronous video recruitment methods. A study of Human Resources (HR) professionals representing 500 mid-sized organisations from diverse industries across five countries revealed that 24% of businesses have already implemented AI for recruitment purposes, while 56% of hiring managers plan to adopt it within the next year [Sage, 2020]. AI is employed to augment human decision-making regarding job candidates (such as determining who should receive a job offer) and to support the actions of human decision-makers throughout the process (such as data collection and analysis; Gonzalez, Liu, Shirase, Tomczak, Lobbe, Justenhoven, and Martin [2022]). Some applications incorporating AI algorithms are widely accepted and relatively uncontroversial.
What Is AI Safety? What Do We Want It to Be?
Harding, Jacqueline, Kirk-Giannini, Cameron Domenico
The field of AI safety seeks to prevent or reduce the harms caused by AI systems. A simple and appealing account of what is distinctive of AI safety as a field holds that this feature is constitutive: a research project falls within the purview of AI safety just in case it aims to prevent or reduce the harms caused by AI systems. Call this appealingly simple account The Safety Conception of AI safety. Despite its simplicity and appeal, we argue that The Safety Conception is in tension with at least two trends in the ways AI safety researchers and organizations think and talk about AI safety: first, a tendency to characterize the goal of AI safety research in terms of catastrophic risks from future systems; second, the increasingly popular idea that AI safety can be thought of as a branch of safety engineering. Adopting the methodology of conceptual engineering, we argue that these trends are unfortunate: when we consider what concept of AI safety it would be best to have, there are compelling reasons to think that The Safety Conception is the answer. Descriptively, The Safety Conception allows us to see how work on topics that have historically been treated as central to the field of AI safety is continuous with work on topics that have historically been treated as more marginal, like bias, misinformation, and privacy. Normatively, taking The Safety Conception seriously means approaching all efforts to prevent or mitigate harms from AI systems based on their merits rather than drawing arbitrary distinctions between them.
Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use
Ho, Justin, Colby, Alexandra, Fisher, William
This paper presents a domain-specific implementation of Retrieval-Augmented Generation (RAG) tailored to the Fair Use Doctrine in U.S. copyright law. Motivated by the increasing prevalence of DMCA takedowns and the lack of accessible legal support for content creators, we propose a structured approach that combines semantic search with legal knowledge graphs and court citation networks to improve retrieval quality and reasoning reliability. Our prototype models legal precedents at the statutory factor level (e.g., purpose, nature, amount, market effect) and incorporates citation-weighted graph representations to prioritize doctrinally authoritative sources. We use Chain-of-Thought reasoning and interleaved retrieval steps to better emulate legal reasoning. Preliminary testing suggests this method improves doctrinal relevance in the retrieval process, laying groundwork for future evaluation and deployment of LLM-based legal assistance tools.
LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load
Guidroz, Theo, Ardila, Diego, Li, Jimmy, Mansour, Adam, Jhun, Paul, Gonzalez, Nina, Ji, Xiang, Sanchez, Mike, Kakarmath, Sujay, Bellaiche, Mathias MJ, Garrido, Miguel Ángel, Ahmed, Faruk, Choudhary, Divyansh, Hartford, Jay, Xu, Chenwei, Echeverria, Henry Javier Serrano, Wang, Yifan, Shaffer, Jeff, Eric, null, Cao, null, Matias, Yossi, Hassidim, Avinatan, Webster, Dale R, Liu, Yun, Fujiwara, Sho, Bui, Peggy, Duong, Quang
Information on the web, such as scientific publications and Wikipedia, often surpasses users' reading level. To help address this, we used a self-refinement approach to develop an LLM capability for minimally lossy text simplification. To validate our approach, we conducted a randomized study involving 4563 participants and 31 texts spanning 6 broad subject areas: PubMed (biomedical scientific articles), biology, law, finance, literature/philosophy, and aerospace/computer science. Participants were randomized to viewing original or simplified texts in a subject area, and answered multiple-choice questions (MCQs) that tested their comprehension of the text. The participants were also asked to provide qualitative feedback such as task difficulty. Our results indicate that participants who read the simplified text answered more MCQs correctly than their counterparts who read the original text (3.9% absolute increase, p<0.05). This gain was most striking with PubMed (14.6%), while more moderate gains were observed for the finance (5.5%), aerospace/computer science (3.8%), and legal (3.5%) domains. Notably, the results were robust to whether participants could refer back to the text while answering MCQs. The absolute accuracy decreased by up to ~9% for both original and simplified setups where participants could not refer back to the text, but the ~4% overall improvement persisted. Finally, participants' self-reported perceived ease based on a simplified NASA Task Load Index was greater for those who read the simplified text (absolute change on a 5-point scale 0.33, p<0.05). This randomized study, involving an order of magnitude more participants than prior works, demonstrates the potential of LLMs to make complex information easier to understand. Our work aims to enable a broader audience to better learn and make use of expert knowledge available on the web, improving information accessibility.
Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models
Steging, Cor, Renooij, Silja, Verheij, Bart
Generative large language models as tools in the legal domain have the potential to improve the justice system. However, the reasoning behavior of current generative models is brittle and poorly understood, hence cannot be responsibly applied in the domains of law and evidence. In this paper, we introduce an approach for creating benchmarks that can be used to evaluate the reasoning capabilities of generative language models. These benchmarks are dynamically varied, scalable in their complexity, and have formally unambiguous interpretations. In this study, we illustrate the approach on the basis of witness testimony, focusing on the underlying argument attack structure. We dynamically generate both linear and non-linear argument attack graphs of varying complexity and translate these into reasoning puzzles about witness testimony expressed in natural language. We show that state-of-the-art large language models often fail in these reasoning puzzles, already at low complexity. Obvious mistakes are made by the models, and their inconsistent performance indicates that their reasoning capabilities are brittle. Furthermore, at higher complexity, even state-of-the-art models specifically presented for reasoning capabilities make mistakes. We show the viability of using a parametrized benchmark with varying complexity to evaluate the reasoning capabilities of generative language models. As such, the findings contribute to a better understanding of the limitations of the reasoning capabilities of generative models, which is essential when designing responsible AI systems in the legal domain.
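To make the notion of an argument attack graph concrete, the following is a minimal sketch and not the authors' benchmark generator. It assumes Dung-style abstract argumentation evaluated under grounded semantics, which the abstract does not specify; the names `linear_chain`, `grounded_extension`, and the `w0`, `w1`, ... witness labels are hypothetical.

```python
def linear_chain(n):
    """Build a linear attack graph: witness w(i+1) attacks witness w(i)."""
    arguments = [f"w{i}" for i in range(n)]
    attacks = {(arguments[i + 1], arguments[i]) for i in range(n - 1)}
    return arguments, attacks


def grounded_extension(arguments, attacks):
    """Return the arguments accepted under grounded semantics (an assumption).

    attacks is a set of (attacker, target) pairs.
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a in accepted or a in rejected:
                continue
            if attackers[a] <= rejected:
                # Every attacker is already rejected, so a is accepted.
                accepted.add(a)
                changed = True
            elif attackers[a] & accepted:
                # At least one accepted attacker, so a is rejected.
                rejected.add(a)
                changed = True
    return accepted


arguments, attacks = linear_chain(4)
accepted = grounded_extension(arguments, attacks)
```

In this four-witness chain, the unattacked testimony `w3` is accepted, which defeats `w2` and thereby reinstates `w1`. Lengthening the chain is one simple way to scale the complexity of such a puzzle.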
Securing the Future of IVR: AI-Driven Innovation with Agile Security, Data Regulation, and Ethical AI Integration
Shaikh, Khushbu Mehboob, Giannakopoulos, Georgios
Khushbu Mehboob Shaikh (Technical Lead, Principal Technical Account Manager, Twilio Inc., Irving, Texas, United States; ORCID: 0009-0000-8681-5830) and Georgios Giannakopoulos (Principal Engineer, Independent Researcher, The Hague, The Netherlands; ORCID: 0000-0002-3707-3276). Abstract: The rapid digitalization of communication systems has elevated Interactive Voice Response (IVR) technologies to become critical interfaces for customer engagement. With Artificial Intelligence (AI) now driving these platforms, ensuring secure, compliant, and ethically designed development practices is more imperative than ever. AI-powered IVRs leverage Natural Language Processing (NLP) and Machine Learning (ML) to personalize interactions, automate service delivery, and optimize user experiences. However, these innovations expose systems to heightened risks, including data privacy breaches, AI decision opacity, and model security vulnerabilities. We propose a practical governance framework that embeds agile security principles, compliance with global data legislation, and user-centric ethics. Emphasizing privacy-by-design, adaptive risk modeling, and transparency, the paper argues that ethical AI integration is not a feature but a strategic imperative. Through this multidimensional lens, we highlight how modern IVRs can transition from communication tools to intelligent, secure, and accountable digital frontlines, resilient against emerging threats and aligned with societal expectations. Introduction: Interactive Voice Response (IVR) systems have long served as essential digital entry points in customer service operations, enabling organizations to automate call handling, reduce wait times, and streamline user interactions [1].
One Search Fits All: Pareto-Optimal Eco-Friendly Model Selection
Betello, Filippo, Purificato, Antonio, Vineis, Vittoria, Tolomei, Gabriele, Silvestri, Fabrizio
The environmental impact of Artificial Intelligence (AI) is emerging as a significant global concern, particularly regarding model training. In this paper, we introduce GREEN (Guided Recommendations of Energy-Efficient Networks), a novel, inference-time approach for recommending Pareto-optimal AI model configurations that optimize validation performance and energy consumption across diverse AI domains and tasks. Our approach directly addresses the limitations of current eco-efficient neural architecture search methods, which are often restricted to specific architectures or tasks. Central to this work is EcoTaskSet, a dataset comprising training dynamics from over 1767 experiments across computer vision, natural language processing, and recommendation systems using both widely used and cutting-edge architectures. Leveraging this dataset and a prediction model, our approach demonstrates effectiveness in selecting the best model configuration based on user preferences. Experimental results show that our method successfully identifies energy-efficient configurations while ensuring competitive performance.
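As an illustration of what Pareto-optimal means when trading validation performance against energy consumption, here is a minimal sketch. It is not the GREEN method, which relies on a prediction model trained on EcoTaskSet; it is a brute-force dominance filter, and the `Config` type, its field names, and all numeric values are hypothetical.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    accuracy: float    # validation performance, higher is better
    energy_kwh: float  # energy consumption, lower is better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep every configuration that no other configuration dominates."""
    front = []
    for c in configs:
        dominated = any(
            o.accuracy >= c.accuracy
            and o.energy_kwh <= c.energy_kwh
            and (o.accuracy > c.accuracy or o.energy_kwh < c.energy_kwh)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front


# Hypothetical candidates, invented for illustration only.
candidates = [
    Config("small", 0.80, 1.0),
    Config("medium", 0.85, 2.5),
    Config("large", 0.86, 9.0),
    Config("wasteful", 0.84, 6.0),
]
front_names = [c.name for c in pareto_front(candidates)]
```

Here `wasteful` is dropped because `medium` is both more accurate and cheaper in energy, while the other three each represent a different trade-off. A user preference, such as a weight on energy versus accuracy, would then pick one point from the remaining front.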
OpenAI Backs Down on Restructuring Amid Pushback
OpenAI on Monday announced a proposed restructuring that would give its nonprofit arm ongoing control of ChatGPT and the rest of the startup's AI products. The move is a reversal of an earlier announcement, which called for the nonprofit to relinquish its authority to a newly created public-benefit corporation. The proposed company structure has to be approved by the attorney general offices in California and Delaware by early next year. Up to $30 billion in funding from SoftBank and other investors is contingent on this approval. That money is crucial for OpenAI to maintain its position as a leader in generative AI and give higher returns to investors.