Law
A Rapid Test for Accuracy and Bias of Face Recognition Technology
Knott, Manuel, Serna, Ignacio, Mann, Ethan, Perona, Pietro
Measuring the accuracy of face recognition (FR) systems is essential for improving performance and ensuring responsible use. Accuracy is typically estimated using large annotated datasets, which are costly and difficult to obtain. We propose a novel method for 1:1 face verification that benchmarks FR systems quickly and without manual annotation, starting from approximate labels (e.g., from web search results). Unlike previous methods for training set label cleaning, ours leverages the embedding representation of the models being evaluated, achieving high accuracy in smaller-sized test datasets. Our approach reliably estimates FR accuracy and ranking, significantly reducing the time and cost of manual labeling. We also introduce the first public benchmark of five FR cloud services, revealing demographic biases, particularly lower accuracy for Asian women. Our rapid test method can democratize FR testing, promoting scrutiny and responsible use of the technology.
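For reference, the quantity such a rapid test estimates is ordinary 1:1 verification accuracy. The sketch below computes it from face embeddings, assuming clean same/different labels are already available. It does not reproduce the authors' method for working from approximate labels. The input format (`pairs` of embedding, embedding, same-identity flag), the cosine-similarity decision rule, and the threshold search are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verification_accuracy(pairs, threshold):
    """Fraction of (emb_a, emb_b, same_identity) pairs decided correctly.

    A pair is predicted 'same person' when cosine similarity >= threshold.
    (Hypothetical input format, for illustration only.)
    """
    correct = 0
    for emb_a, emb_b, same in pairs:
        predicted_same = cosine(emb_a, emb_b) >= threshold
        correct += predicted_same == same
    return correct / len(pairs)

def best_threshold(pairs):
    """Choose, among observed similarities, the threshold maximizing accuracy."""
    candidates = sorted({cosine(a, b) for a, b, _ in pairs})
    return max(candidates, key=lambda t: verification_accuracy(pairs, t))

# Toy 2-D "embeddings" (invented): two genuine pairs, two impostor pairs.
PAIRS = [
    ([1.0, 0.0], [0.9, 0.1], True),
    ([0.0, 1.0], [0.1, 0.9], True),
    ([1.0, 0.0], [0.0, 1.0], False),
    ([1.0, 0.0], [-1.0, 0.2], False),
]
```

Here `verification_accuracy(PAIRS, best_threshold(PAIRS))` evaluates to 1.0. Tuning the threshold on the same pairs that are scored, as done here, would overstate accuracy on a real benchmark.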
Fundamental Limitations in Defending LLM Finetuning APIs
Davies, Xander, Winsor, Eric, Korbak, Tomek, Souly, Alexandra, Kirk, Robert, de Witt, Christian Schroeder, Gal, Yarin
LLM developers have imposed technical interventions to prevent fine-tuning misuse attacks, attacks where adversaries evade safeguards by fine-tuning the model using a public API. Previous work has established several successful attacks against specific fine-tuning API defences. In this work, we show that defences of fine-tuning APIs that seek to detect individual harmful training or inference samples ('pointwise' detection) are fundamentally limited in their ability to prevent fine-tuning attacks. We construct 'pointwise-undetectable' attacks that repurpose entropy in benign model outputs (e.g. semantic or syntactic variations) to covertly transmit dangerous knowledge. Our attacks are composed solely of unsuspicious benign samples that can be collected from the model before fine-tuning, meaning training and inference samples are all individually benign and low-perplexity. We test our attacks against the OpenAI fine-tuning API, finding they succeed in eliciting answers to harmful multiple-choice questions, and that they evade an enhanced monitoring system we design that successfully detects other fine-tuning attacks. We encourage the community to develop defences that tackle the fundamental limitations we uncover in pointwise fine-tuning API defences.
Entity Framing and Role Portrayal in the News
Mahmoud, Tarek, Xie, Zhuohan, Dimitrov, Dimitar, Nikolaidis, Nikolaos, Silvano, Purificação, Yangarber, Roman, Sharma, Shivam, Sartori, Elisa, Stefanovitch, Nicolas, Martino, Giovanni Da San, Piskorski, Jakub, Nakov, Preslav
We introduce a novel multilingual hierarchical corpus annotated for entity framing and role portrayal in news articles. The dataset uses a unique taxonomy inspired by storytelling elements, comprising 22 fine-grained roles, or archetypes, nested within three main categories: protagonist, antagonist, and innocent. Each archetype is carefully defined, capturing nuanced portrayals of entities such as guardian, martyr, and underdog for protagonists; tyrant, deceiver, and bigot for antagonists; and victim, scapegoat, and exploited for innocents. The dataset includes 1,378 recent news articles in five languages (Bulgarian, English, Hindi, European Portuguese, and Russian) focusing on two critical domains of global significance: the Ukraine-Russia War and Climate Change. Over 5,800 entity mentions have been annotated with role labels. This dataset serves as a valuable resource for research into role portrayal and has broader implications for news analysis. We describe the characteristics of the dataset and the annotation process, and we report evaluation results on fine-tuned state-of-the-art multilingual transformers and hierarchical zero-shot learning using LLMs at the level of a document, a paragraph, and a sentence.
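The two-level taxonomy can be represented as a mapping from main category to fine-grained archetypes. The sketch below lists only the nine archetypes named above. The full set of 22 roles is not given here, and the function names are hypothetical.

```python
# Only the nine archetypes named in the abstract; the full taxonomy has
# 22 fine-grained roles, which are not reproduced here.
TAXONOMY = {
    "protagonist": {"guardian", "martyr", "underdog"},
    "antagonist": {"tyrant", "deceiver", "bigot"},
    "innocent": {"victim", "scapegoat", "exploited"},
}

def main_category(archetype):
    """Return the top-level category that contains a fine-grained role."""
    for category, roles in TAXONOMY.items():
        if archetype in roles:
            return category
    raise KeyError(f"unknown archetype: {archetype!r}")

def hierarchical_label(archetype):
    """Expand a fine-grained role into its (main category, archetype) path."""
    return (main_category(archetype), archetype)
```

For example, `hierarchical_label("scapegoat")` returns `("innocent", "scapegoat")`. Storing the hierarchy explicitly lets a fine-grained prediction always be mapped back to its main category.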
A Mobile Robotic Approach to Autonomous Surface Scanning in Legal Medicine
Grube, Sarah, Latus, Sarah, Fischer, Martin, Raudonis, Vidas, Heinemann, Axel, Ondruschka, Benjamin, Schlaefer, Alexander
Purpose: Comprehensive legal medicine documentation includes both an internal and an external examination of the corpse. Typically, this documentation is conducted manually during conventional autopsy. A systematic digital documentation would be desirable, especially for the external examination of wounds, which is becoming more relevant for legal medicine analysis. For this purpose, RGB surface scanning has been introduced. While a manual full surface scan using a handheld camera is time-consuming and operator-dependent, floor- or ceiling-mounted robotic systems require substantial space and a dedicated room. Hence, we consider whether a mobile robotic system can be used for external documentation. Methods: We develop a mobile robotic system that enables full-body RGB-D surface scanning. Our work includes a detailed configuration space analysis to identify the environmental parameters that need to be considered to successfully perform a surface scan. We validate our findings through an experimental study in the lab and demonstrate the system's application in a legal medicine environment. Results: Our configuration space analysis shows that a good trade-off between coverage and time is reached with three robot base positions, leading to a coverage of 94.96 %. Experiments validate the effectiveness of the system in accurately capturing body surface geometry with an average surface coverage of 96.90 ± 3.16 % and 92.45 ± 1.43 % for a body phantom and actual corpses, respectively. Conclusion: This work demonstrates the potential of a mobile robotic system to automate RGB-D surface scanning in legal medicine, complementing the use of post-mortem CT scans for inner documentation. Our results indicate that the proposed system can contribute to more efficient and autonomous legal medicine documentation, reducing the need for manual intervention.
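To illustrate the coverage/time trade-off, the sketch below treats the body surface as a set of numbered points and assumes each candidate base position sees a known subset of them. Coverage is then the percentage of points seen from at least one position. The greedy selection is a standard set-cover heuristic swapped in for illustration. It is not the configuration space analysis the authors describe, and the position names and visibility sets are invented.

```python
def coverage_percent(visible_sets, n_points):
    """Percentage of surface points seen from at least one base position."""
    seen = set().union(*visible_sets)
    return 100.0 * len(seen) / n_points

def greedy_positions(candidates, n_points, k):
    """Greedily choose k base positions, each adding the most unseen points.

    `candidates` maps a (hypothetical) position name to the set of point ids
    visible from it; assumes k <= len(candidates).
    """
    chosen, seen = [], set()
    for _ in range(k):
        best = max(
            (name for name in candidates if name not in chosen),
            key=lambda name: len(candidates[name] - seen),
        )
        chosen.append(best)
        seen |= candidates[best]
    return chosen, 100.0 * len(seen) / n_points

# Invented visibility sets over a 100-point surface.
CANDIDATES = {
    "head": set(range(0, 40)),
    "left": set(range(30, 70)),
    "right": set(range(60, 95)),
    "foot": set(range(90, 100)),
}
```

With these toy sets, `greedy_positions(CANDIDATES, 100, 3)` picks `["head", "right", "left"]` for 95.0 % coverage. A fourth position would add only the last 5 points, which mirrors the kind of diminishing return that makes a small number of positions a reasonable trade-off.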
Enhancing Portuguese Variety Identification with Cross-Domain Approaches
Sousa, Hugo, Almeida, Rúben, Silvano, Purificação, Cantante, Inês, Campos, Ricardo, Jorge, Alípio
Recent advances in natural language processing have raised expectations for generative models to produce coherent text across diverse language varieties. In the particular case of the Portuguese language, the predominance of Brazilian Portuguese corpora online introduces linguistic biases in these models, limiting their applicability outside of Brazil. To address this gap and promote the creation of European Portuguese resources, we developed a cross-domain language variety identifier (LVI) to discriminate between European and Brazilian Portuguese. Motivated by the findings of our literature review, we compiled the PtBrVarId corpus, a cross-domain LVI dataset, and studied the effectiveness of transformer-based LVI classifiers in cross-domain scenarios. Although this research focuses on two Portuguese varieties, our contribution can be extended to other varieties and languages. We open source the code, corpus, and models to foster further research in this task.
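As a point of comparison, the sketch below is a deliberately simple word-unigram Naive Bayes variety identifier. It is not the transformer-based classifiers the authors study. The six training sentences are invented and rely on well-known lexical differences between the varieties (autocarro/ônibus, comboio/trem, telemóvel/celular, "o meu"/"meu").

```python
import math
from collections import Counter

def train_nb(examples):
    """Fit a multinomial Naive Bayes model over lower-cased word unigrams."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training texts
    for label, text in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training data.
TRAIN = [
    ("pt-PT", "apanhei o autocarro"),
    ("pt-PT", "o comboio chegou"),
    ("pt-PT", "o meu telemóvel"),
    ("pt-BR", "peguei o ônibus"),
    ("pt-BR", "o trem chegou"),
    ("pt-BR", "meu celular"),
]
```

With `model = train_nb(TRAIN)`, `classify(model, "o comboio chegou")` returns `"pt-PT"` and `classify(model, "o trem chegou")` returns `"pt-BR"`. A lexical model like this can latch onto topic words, which is the kind of weakness cross-domain evaluation is meant to expose.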
PredictaBoard: Benchmarking LLM Score Predictability
Pacchiardi, Lorenzo, Voudouris, Konstantinos, Slater, Ben, Martínez-Plumed, Fernando, Hernández-Orallo, José, Zhou, Lexin, Schellaert, Wout
Despite possessing impressive skills, Large Language Models (LLMs) often fail unpredictably, demonstrating inconsistent success in even basic common sense reasoning tasks. This unpredictability poses a significant challenge to ensuring their safe deployment, as identifying and operating within a reliable "safe zone" is essential for mitigating risks. To address this, we present PredictaBoard, a novel collaborative benchmarking framework designed to evaluate the ability of score predictors (referred to as assessors) to anticipate LLM errors on specific task instances (i.e., prompts) from existing datasets. PredictaBoard evaluates pairs of LLMs and assessors by considering the rejection rate at different error tolerances. As such, PredictaBoard stimulates research into developing better assessors and into making LLMs more predictable, not merely better on average. We conduct illustrative experiments using baseline assessors and state-of-the-art LLMs. PredictaBoard highlights the critical need to evaluate predictability alongside performance, paving the way for safer AI systems where errors are not only minimised but also anticipated and effectively mitigated. Code for our benchmark can be found at https://github.com/Kinds-of-Intelligence-CFI/PredictaBoard
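One plausible reading of "rejection rate at a given error tolerance" is sketched below; PredictaBoard's exact metric may differ. The assessor assigns each prompt a predicted probability that the LLM succeeds. Prompts are rejected starting from the lowest prediction, and we look for the smallest rejection rate at which the LLM's error rate on the accepted prompts is at or below the tolerance. The function name and inputs are hypothetical.

```python
def min_rejection_rate(pred_success, correct, tolerance):
    """Smallest fraction of prompts to reject so that the LLM's error rate
    on the accepted prompts is at most `tolerance`.

    pred_success: assessor's predicted probability of success per prompt.
    correct: whether the LLM actually answered each prompt correctly.
    Prompts are rejected in order of lowest predicted success. Returns 1.0
    if no non-empty accepted set meets the tolerance.
    """
    n = len(correct)
    # Prompts ordered from most to least confidently predicted success.
    order = sorted(range(n), key=lambda i: pred_success[i], reverse=True)
    # errors_in_prefix[k] = number of LLM errors among the top-k prompts.
    errors_in_prefix = [0]
    for i in order:
        errors_in_prefix.append(errors_in_prefix[-1] + (not correct[i]))
    # Accept as many prompts as possible (largest k first).
    for keep in range(n, 0, -1):
        if errors_in_prefix[keep] / keep <= tolerance:
            return (n - keep) / n
    return 1.0

# Invented example: five prompts, two of which the LLM gets wrong.
PRED = [0.9, 0.8, 0.7, 0.2, 0.1]
CORRECT = [True, True, False, False, True]
```

Here a zero-error tolerance requires rejecting 60 % of prompts, while a 0.5 tolerance requires no rejection. A better assessor ranks the failures lower, so it meets the same tolerance with fewer rejections.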
Jack the Ripper and the case of the missing DNA evidence
Feedback is New Scientist's popular sideways look at the latest science and technology news. You can submit items you believe may amuse readers to Feedback by emailing feedback@newscientist.com. Feedback is as fond of true crime as the next morbidly curious ghoul, so we have occasionally dipped our toes into the never-ending well of speculation about the Whitechapel murders of 1888-91 and the near-mythical Jack the Ripper. Although frankly, we didn't get much further than Alan Moore and Eddie Campbell's From Hell, which (spoiler!) ties the killings to the British establishment and the Freemasons, who supposedly arranged the murders to create an evil psychic force that would perpetuate the patriarchy. But the field of "Ripperology" extends far beyond one eccentric graphic novel.
EU accused of leaving 'devastating' copyright loophole in AI Act
"What I do not understand is that we are supporting big tech instead of protecting European creative ideas and content." The EU's AI Act, which came into force last year, was already in the works when ChatGPT, an AI chatbot that can generate essays, jokes and job applications, burst into public consciousness in late 2022, becoming the fastest-growing consumer application in history. ChatGPT was developed by OpenAI, which is also behind the AI image generator Dall-E. He would like legislation to fill that gap, but said it would take years, after the European Commission's decision last week to withdraw the proposed AI Liability Act. "It might be getting very difficult.
Trump can rein in Biden's out-of-control antitrust operation
The Senate Judiciary Committee soon will hold confirmation hearings for Gail Slater for assistant attorney general, antitrust division. Slater's antitrust understanding is broad and deep; she previously worked in the Trump 45 administration, the Federal Trade Commission (FTC) and the private sector. She already has support from several senators and Attorney General Pam Bondi; she ought to be confirmed easily. Slater, once confirmed, FTC Chairman Andrew Ferguson, and their respective agencies should return to following the Consumer Welfare Standard ("CWS"), which has been the law of the land since the Supreme Court's 1979 Reiter v. Sonotone opinion. Reiter adopted CWS from Professor Robert Bork's seminal 1978 book, "The Antitrust Paradox," which explained that competition leads companies to benefit consumers through, for example, lowering prices, growing output, improving customer service, expanding research and development, and increasing innovation.
Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs
Kudva, Akshay, Tang, Wei-Ting, Paulson, Joel A.
Designing modern industrial systems requires balancing several competing objectives, such as profitability, resilience, and sustainability, while accounting for complex interactions between technological, economic, and environmental factors. Multi-objective optimization (MOO) methods are commonly used to navigate these tradeoffs, but which algorithm is appropriate for a given problem is often unclear, particularly when system representations vary from fully equation-based (white-box) to entirely data-driven (black-box) models. While grey-box MOO methods attempt to bridge this gap, they typically impose rigid assumptions on system structure, requiring models to conform to the underlying structural assumptions of the solver rather than the solver adapting to the natural representation of the system of interest. In this chapter, we introduce a unifying approach to grey-box MOO by leveraging network representations, which provide a general and flexible framework for modeling interconnected systems as a series of function nodes that share various inputs and outputs. Specifically, we propose MOBONS, a novel Bayesian optimization-inspired algorithm that can efficiently optimize general function networks, including those with cyclic dependencies, enabling the modeling of feedback loops, recycle streams, and multi-scale simulations, features that existing methods fail to capture. Furthermore, MOBONS incorporates constraints, supports parallel evaluations, and preserves the sample efficiency of Bayesian optimization while leveraging network structure for improved scalability. We demonstrate the effectiveness of MOBONS through two case studies, including one related to sustainable process design. By enabling efficient MOO under general graph representations, MOBONS has the potential to significantly enhance the design of more profitable, resilient, and sustainable engineering systems.
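The "tradeoffs" any MOO method navigates are usually summarized by the Pareto front, the set of designs that no other design beats in every objective. The sketch below shows Pareto dominance and a two-objective hypervolume, a common measure of how much of the objective space a front dominates. It illustrates general MOO concepts only and is not MOBONS. It assumes every objective is minimized, so a quantity such as profit would be entered as its negative. The objective values and reference point are invented.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def hypervolume_2d(front, ref):
    """Area dominated by a two-objective front (minimization), bounded by ref.

    Assumes `front` is non-dominated and every point is strictly better
    than `ref` in both objectives.
    """
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(front):  # ascending in the first objective
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

# Invented (cost, emissions) pairs for five candidate designs.
POINTS = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
```

Here `pareto_front(POINTS)` keeps `[(1, 5), (2, 3), (4, 1)]`, and with reference point `(5, 6)` its hypervolume is 12.0. Multi-objective Bayesian optimization methods commonly choose new evaluations that are expected to grow this kind of indicator.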