Law
This man was killed four years ago. His AI clone just spoke in court.
People just can't stop using generative AI tools in legal proceedings, despite repeated pushback from frustrated judges. While AI initially appeared in courtrooms through bogus "hallucinated" cases, the trend has taken a turn, driven by increasingly sophisticated AI video and audio tools. In some instances, AI is even being used to seemingly bring victims back from the dead. This week, a crime victim's family presented a brief video in an Arizona courtroom depicting an AI version of 37-year-old Chris Pelkey. Pelkey was shot and killed in 2021 in a road rage incident. Now, four years later, the AI-generated "clone" appeared to address his alleged killer in court.
Elevating Semantic Exploration: A Novel Approach Utilizing Distributed Repositories
Centralized and distributed systems are two main approaches to organizing ICT infrastructure, each with its pros and cons. Centralized systems concentrate resources in one location, making management easier but creating single points of failure. Distributed systems, on the other hand, spread resources across multiple nodes, offering better scalability and fault tolerance, but requiring more complex management. The choice between them depends on factors like application needs, scalability, and data sensitivity. Centralized systems suit applications with limited scalability and centralized control, while distributed systems excel in large-scale environments requiring high availability and performance. This paper explores a distributed document repository system developed for the Italian Ministry of Justice, using edge repositories to analyze textual data and metadata, enhancing semantic exploration capabilities.
Pushing the boundary on Natural Language Inference
Miralles-González, Pablo, Huertas-Tato, Javier, Martín, Alejandro, Camacho, David
Natural Language Inference (NLI) is a central task in natural language understanding with applications in fact-checking, question answering, and information retrieval. Despite its importance, current NLI systems heavily rely on supervised learning with datasets that often contain annotation artifacts and biases, limiting generalization and real-world applicability. In this work, we apply a reinforcement learning-based approach using Group Relative Policy Optimization (GRPO) for Chain-of-Thought (CoT) learning in NLI, eliminating the need for labeled rationales and enabling this type of training on more challenging datasets such as ANLI. We fine-tune 7B, 14B, and 32B language models using parameter-efficient techniques (LoRA and QLoRA), demonstrating strong performance across standard and adversarial NLI benchmarks. Our 32B AWQ-quantized model surpasses state-of-the-art results on 7 out of 11 adversarial sets (or on all of them considering our replication) within a 22GB memory footprint, showing that robust reasoning can be retained under aggressive quantization. This work provides a scalable and practical framework for building robust NLI systems without sacrificing inference quality.
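To make the group-relative idea in GRPO concrete, here is a minimal sketch, not the authors' training code. It assumes a hypothetical binary reward (1.0 when a sampled completion's final NLI label matches the gold label, 0.0 otherwise) and shows only how each reward is normalized against the other completions sampled for the same prompt. The function names and the example labels are illustrative assumptions.

```python
from statistics import mean, pstdev


def nli_reward(predicted_label, gold_label):
    # Hypothetical reward: only the final label is checked, so no labeled
    # rationale is needed to score a chain-of-thought completion.
    return 1.0 if predicted_label == gold_label else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    # Normalize each reward by the mean and standard deviation of its own
    # sampling group; eps avoids division by zero when all rewards are equal.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for one premise/hypothesis pair (made-up labels).
group_predictions = ["entailment", "neutral", "entailment", "contradiction"]
rewards = [nli_reward(p, "entailment") for p in group_predictions]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest a negative one; when every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.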
UnifyFL: Enabling Decentralized Cross-Silo Federated Learning
S, Sarang, Dhakshinamoorthy, Druva, Sharma, Aditya Shiva, Bhadauria, Yuvraj Singh, Vivek, Siddharth Chaitra, Bansal, Arihant, Paul, Arnab K.
Federated Learning (FL) is a decentralized machine learning (ML) paradigm in which models are trained on private data across several devices called clients and combined at a single node called an aggregator rather than aggregating the data itself. Many organizations employ FL to have better privacy-aware ML-driven decision-making capabilities. However, organizations often operate independently rather than collaborate to enhance their FL capabilities due to the lack of an effective mechanism for collaboration. The challenge lies in balancing trust and resource efficiency. One approach relies on trusting a third-party aggregator to consolidate models from all organizations (multilevel FL), but this requires trusting an entity that may be biased or unreliable. Alternatively, organizations can bypass a third party by sharing their local models directly, which requires significant computational resources for validation. Both approaches reflect a fundamental trade-off between trust and resource constraints, with neither offering an ideal solution. In this work, we develop a trust-based cross-silo FL framework called UnifyFL, which uses decentralized orchestration and distributed storage. UnifyFL provides flexibility to the participating organizations and presents synchronous and asynchronous modes to handle stragglers. Our evaluation on a diverse testbed shows that UnifyFL achieves a performance comparable to the ideal multilevel centralized FL while allowing trust and optimal use of resources.
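As a rough illustration of what "combining models at an aggregator" can mean, the sketch below uses sample-size-weighted averaging of client parameters (the common FedAvg rule). This is an assumption chosen for illustration; the abstract does not say which aggregation rule UnifyFL uses, and the function name and flat-list parameter representation are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    # client_weights: one flat list of parameters per client.
    # client_sizes: number of local training samples per client.
    # Only parameters are shared with the aggregator, never the raw data.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two clients; the second holds three times as much data as the first.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a multilevel setup, the same averaging step could be applied again across organizations, which is where the question of who runs that second aggregator arises.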
Memorization or Interpolation ? Detecting LLM Memorization through Input Perturbation Analysis
Djiré, Albérick Euraste, Kaboré, Abdoul Kader, Barr, Earl T., Klein, Jacques, Bissyandé, Tegawendé F.
While Large Language Models (LLMs) achieve remarkable performance through training on massive datasets, they can exhibit concerning behaviors such as verbatim reproduction of training data rather than true generalization. This memorization phenomenon raises significant concerns about data privacy, intellectual property rights, and the reliability of model evaluations. This paper introduces PEARL, a novel approach for detecting memorization in LLMs. PEARL assesses how sensitive an LLM's performance is to input perturbations, enabling memorization detection without requiring access to the model's internals. We investigate how input perturbations affect the consistency of outputs, enabling us to distinguish between true generalization and memorization. Our findings, following extensive experiments on the Pythia open model, provide a robust framework for identifying when the model simply regurgitates learned information. Applied to the GPT-4o models, the PEARL framework not only identified cases of memorization of classic texts from the Bible or common code from HumanEval but also demonstrated that it can provide supporting evidence that some data, such as New York Times news articles, were likely part of the training data of a given model.
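The perturbation-sensitivity idea can be sketched with a toy black-box setup. This is not PEARL itself: the perturbation (swapping one pair of adjacent words), the scoring functions, and the two stand-in "models" are all hypothetical, chosen only to show how a large score drop under small input changes can hint at memorization without access to model internals.

```python
import random

PROMPT = "colorless green ideas sleep furiously every single night"


def perturb(text, rng):
    # Hypothetical perturbation: swap one randomly chosen pair of adjacent words.
    words = text.split()
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def sensitivity(score_fn, prompt, n_variants=20, seed=0):
    # Score on the original input minus the mean score on perturbed inputs.
    rng = random.Random(seed)
    base = score_fn(prompt)
    perturbed = [score_fn(perturb(prompt, rng)) for _ in range(n_variants)]
    return base - sum(perturbed) / n_variants


def memorizer_score(prompt):
    # Stand-in for a model that succeeds only on the exact text it has seen.
    return 1.0 if prompt == PROMPT else 0.0


def generalizer_score(prompt):
    # Stand-in for a model whose success does not depend on exact word order.
    return 1.0 if set(prompt.split()) == set(PROMPT.split()) else 0.0
```

In this toy setting, `sensitivity(memorizer_score, PROMPT)` is high while `sensitivity(generalizer_score, PROMPT)` is zero. With a real model, the score function would wrap an API call and a task metric, and any threshold for "high" would need to be calibrated.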
Real-World Gaps in AI Governance Research
Strauss, Ilan, Moure, Isobel, O'Reilly, Tim, Rosenblat, Sruly
Drawing on 1,178 safety and reliability papers from 9,439 generative AI papers (January 2020 - March 2025), we compare research outputs of leading AI companies (Anthropic, Google DeepMind, Meta, Microsoft, and OpenAI) and AI universities (CMU, MIT, NYU, Stanford, UC Berkeley, and University of Washington). We find that corporate AI research increasingly concentrates on pre-deployment areas -- model alignment and testing & evaluation -- while attention to deployment-stage issues such as model bias has waned. Significant research gaps exist in high-risk deployment domains, including healthcare, finance, misinformation, persuasive and addictive features, hallucinations, and copyright. Without improved observability into deployed AI, growing corporate concentration could deepen knowledge deficits. We recommend expanding external researcher access to deployment data and systematic observability of in-market AI behaviors.
OpenAI's new for-profit plan leaves many unanswered questions
OpenAI has abandoned its controversial restructuring plan. In a dramatic reversal, the company said Monday it would no longer try to separate control of its for-profit arm from the non-profit board that currently oversees operations. "We made the decision for the nonprofit to retain control of OpenAI after hearing from civic leaders and engaging in constructive dialogue with the offices of the Attorney General of Delaware and the Attorney General of California," said Bret Taylor, the chairman of OpenAI. OpenAI had originally argued its existing structure would not allow its nonprofit to "easily do more than control the for-profit." It also said it needed more money, a mere two months after securing $6.6 billion in new investment.
Why the humanoid workforce is running late
But Rus and many others I spoke with at the expo suggest that this hype just doesn't add up. Humanoids "are mostly not intelligent," she said. Rus showed a video of herself speaking to an advanced humanoid that smoothly followed her instruction to pick up a watering can and water a nearby plant. But when she asked it to "water" her friend, the robot did not consider that humans don't need watering like plants and moved to douse the person. "These robots lack common sense," she said.
OpenAI dials back conversion plan, with nonprofit to retain control
OpenAI has dialed back a significant restructuring plan, with its nonprofit parent retaining control in a move that is likely to limit CEO Sam Altman's power over the pioneering maker of ChatGPT. The announcement follows a storm of criticism and legal challenges, including a high-profile lawsuit filed by rival and co-founder Elon Musk, who has accused OpenAI of straying from its founding mission to develop artificial intelligence for the benefit of humanity. "OpenAI was founded as a non-profit, is today a non-profit that oversees and controls the for-profit, and going forward will remain a non-profit that oversees and controls the for-profit. That will not change," Altman said in a blog post Monday.
Explainability by design: an experimental analysis of the legal coding process
Cristani, Matteo, Governatori, Guido, Olivieri, Francesco, Palmirani, Monica, Buriola, Gabriele
Behind a set of rules in Deontic Defeasible Logic, there is a mapping process of normative background fragments. This process goes from text to rules and implicitly encompasses an explanation of the coded fragments. In this paper we deliver a methodology for legal coding that starts with a fragment and goes on to a set of Deontic Defeasible Logic rules, involving a set of scenarios to test the correctness of the coded fragments. The methodology is illustrated by the coding process of an example text. We then show the results of a series of experiments conducted with humans encoding a variety of normative backgrounds and corresponding cases, in which we have measured the effort made in the coding process as related to some measurable features. To process these examples, a recently developed technology, Houdini, that allows reasoning in Deontic Defeasible Logic, has been employed. Finally, we provide a technique to forecast the time required for coding, which depends on factors such as knowledge of the legal domain, knowledge of the coding processes, length of the text, and a measure of depth that refers to the length of the paths of legal references.