Goto

Collaborating Authors

 Law


'Facial recognition tech mistook me for wanted man'

BBC News

A man who is bringing a High Court challenge against the Metropolitan Police after live facial recognition technology wrongly identified him as a suspect has described it as "stop and search on steroids". Shaun Thompson, 39, was stopped by police in February last year outside London Bridge Tube station. Privacy campaign group Big Brother Watch said the judicial review, due to be heard in January, was the first legal case of its kind against the "intrusive technology". The Met, which announced last week that it would double its live facial recognition technology (LFR) deployments, said it was removing hundreds of dangerous offenders and remained confident its use is lawful.


China's cyber-abuse scandal: is the government unwilling to crack down on exploitation of women online?

The Guardian

When Ming* found a hidden camera in her bedroom, she prayed for a reasonable explanation, wondering whether her boyfriend had placed it there to record memories of their "happy life" together. But hope quickly turned to horror. Ming's boyfriend had been secretly taking sexually exploitative photos of not just Ming and her female friends, but also of other women in other locations, then using AI technology to generate pornographic images of them. After Ming confronted him, he "begged for mercy" but became angry when she refused to forgive him, Ming reportedly told Chinese news outlet Jimu News. Ming is just one of many women in China who have been covertly photographed or filmed โ€“ both in private and public spaces, including toilets โ€“ by voyeurs who have then circulated or sold the images online without consent.


On Conformal Machine Unlearning

arXiv.org Machine Learning

The increasing demand for data privacy, driven by regulations such as GDPR and CCPA, has made Machine Unlearning (MU) essential for removing the influence of specific training samples from machine learning models while preserving performance on retained data. However, most existing MU methods lack rigorous statistical guarantees, rely on heuristic metrics, and often require computationally expensive retraining baselines. To overcome these limitations, we introduce a new definition for MU based on Conformal Prediction (CP), providing statistically sound, uncertainty-aware guarantees without the need for the concept of naive retraining. We formalize conformal criteria that quantify how often forgotten samples are excluded from CP sets, and propose empirical metrics,the Efficiently Covered Frequency (ECF at c) and its complement, the Efficiently Uncovered Frequency (EuCF at d), to measure the effectiveness of unlearning. We further present a practical unlearning method designed to optimize these conformal metrics. Extensive experiments across diverse forgetting scenarios, datasets and models demonstrate the efficacy of our approach in removing targeted data.


Evaluating LLMs on Real-World Forecasting Against Expert Forecasters

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but their ability to forecast future events remains understudied. A year ago, large language models struggle to come close to the accuracy of a human crowd. I evaluate state-of-the-art LLMs on 464 forecasting questions from Metaculus, comparing their performance against top forecasters. Frontier models achieve Brier scores that ostensibly surpass the human crowd but still significantly underperform a group of experts.


Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs

arXiv.org Artificial Intelligence

The increasing adoption of web crawling opt-outs by copyright holders of online content raises critical questions about the impact of data compliance on large language model (LLM) performance. However, little is known about how these restrictions (and the resultant filtering of pretraining datasets) affect the capabilities of models trained using these corpora. In this work, we conceptualize this effect as the $\textit{data compliance gap}$ (DCG), which quantifies the performance difference between models trained on datasets that comply with web crawling opt-outs, and those that do not. We measure the data compliance gap in two settings: pretraining models from scratch and continual pretraining from existing compliant models (simulating a setting where copyrighted data could be integrated later in pretraining). Our experiments with 1.5B models show that, as of January 2025, compliance with web data opt-outs does not degrade general knowledge acquisition (close to 0\% DCG). However, in specialized domains such as biomedical research, excluding major publishers leads to performance declines. These findings suggest that while general-purpose LLMs can be trained to perform equally well using fully open data, performance in specialized domains may benefit from access to high-quality copyrighted sources later in training. Our study provides empirical insights into the long-debated trade-off between data compliance and downstream model performance, informing future discussions on AI training practices and policy decisions. Our website is available at https://data-compliance.github.io/.


Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective

arXiv.org Artificial Intelligence

Agentic systems powered by large language models (LLMs) are becoming progressively more complex and capable. Their increasing agency and expanding deployment settings attract growing attention to effective governance policies, monitoring, and control protocols. Based on the emerging landscape of the agentic market, we analyze potential liability issues arising from the delegated use of LLM agents and their extended systems through a principal-agent perspective. Our analysis complements existing risk-based studies on artificial agency and covers the spectrum of important aspects of the principal-agent relationship and their potential consequences at deployment. Furthermore, we motivate method developments for technical governance along the directions of interpretability and behavior evaluations, reward and conflict management, and the mitigation of misalignment and misconduct through principled engineering of detection and fail-safe mechanisms. By illustrating the outstanding issues in AI liability for LLM-based agentic systems, we aim to inform the system design, auditing, and tracing to enhance transparency and liability attribution.


CF-RAG: A Dataset and Method for Carbon Footprint QA Using Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

Product sustainability reports provide valuable insights into the environmental impacts of a product and are often distributed in PDF format. These reports often include a combination of tables and text, which complicates their analysis. The lack of standardization and the variability in reporting formats further exacerbate the difficulty of extracting and interpreting relevant information from large volumes of documents. In this paper, we tackle the challenge of answering questions related to carbon footprints within sustainability reports available in PDF format. Unlike previous approaches, our focus is on addressing the difficulties posed by the unstructured and inconsistent nature of text extracted from PDF parsing. To facilitate this analysis, we introduce CarbonPDF-QA, an open-source dataset containing question-answer pairs for 1735 product report documents, along with human-annotated answers. Our analysis shows that GPT-4o struggles to answer questions with data inconsistencies. To address this limitation, we propose CarbonPDF, an LLM-based technique specifically designed to answer carbon footprint questions on such datasets. We develop CarbonPDF by fine-tuning Llama 3 with our training data. Our results show that our technique outperforms current state-of-the-art techniques, including question-answering (QA) systems finetuned on table and text data.


Software Fairness Dilemma: Is Bias Mitigation a Zero-Sum Game?

arXiv.org Artificial Intelligence

Fairness is a critical requirement for Machine Learning (ML) software, driving the development of numerous bias mitigation methods. Previous research has identified a leveling-down effect in bias mitigation for computer vision and natural language processing tasks, where fairness is achieved by lowering performance for all groups without benefiting the unprivileged group. However, it remains unclear whether this effect applies to bias mitigation for tabular data tasks, a key area in fairness research with significant real-world applications. This study evaluates eight bias mitigation methods for tabular data, including both widely used and cutting-edge approaches, across 44 tasks using five real-world datasets and four common ML models. Contrary to earlier findings, our results show that these methods operate in a zero-sum fashion, where improvements for unprivileged groups are related to reduced benefits for traditionally privileged groups. However, previous research indicates that the perception of a zero-sum trade-off might complicate the broader adoption of fairness policies. To explore alternatives, we investigate an approach that applies the state-of-the-art bias mitigation method solely to unprivileged groups, showing potential to enhance benefits of unprivileged groups without negatively affecting privileged groups or overall ML performance. Our study highlights potential pathways for achieving fairness improvements without zero-sum trade-offs, which could help advance the adoption of bias mitigation methods.


Strategic Hypothesis Testing

arXiv.org Artificial Intelligence

We examine hypothesis testing within a principal-agent framework, where a strategic agent, holding private beliefs about the effectiveness of a product, submits data to a principal who decides on approval. The principal employs a hypothesis testing rule, aiming to pick a p-value threshold that balances false positives and false negatives while anticipating the agent's incentive to maximize expected profitability. Building on prior work, we develop a game-theoretic model that captures how the agent's participation and reporting behavior respond to the principal's statistical decision rule. Despite the complexity of the interaction, we show that the principal's errors exhibit clear monotonic behavior when segmented by an efficiently computable critical p-value threshold, leading to an interpretable characterization of their optimal p-value threshold. We empirically validate our model and these insights using publicly available data on drug approvals. Overall, our work offers a comprehensive perspective on strategic interactions within the hypothesis testing framework, providing technical and regulatory insights.


Can Large Language Models Bridge the Gap in Environmental Knowledge?

arXiv.org Artificial Intelligence

The investigation employs a standardized tool, the Environmental Knowledge Test (EKT - 19), supple mented by targeted questions, to evaluate the environmental knowledge of university students in comparison to the responses generated by the AI models. The results of this study suggest that while AI models possess a vast, readily accessible, and valid kno wledge base with the potential to empower both students and academic staff, a human discipline specialist in environmental sciences may still be necessary to validate the accuracy of the information provided. Keywords: En vironmental Education; AI Models; EKT - 19 1. Introduction Extreme weather events, increasing global temperatures, rising sea - levels, and changes to ecosystems and biodiversity are all consequences of climate change, which is mostly caused by anthropogenic greenhouse gas emissions ( Masson - Delmotte et al., 2018). Meanwhile, the loss of biodiversity due to habitat degradation, pollution, overexploitation, and invasive species threatens the resilience of society's ecosystems (Nature, 2021). These consequences pose questions regarding food security, public he alth, and socioeconomic stability. Thus, effective access to accurate environmental knowledge is crucial for developing sustainable solutions and informed environmental policies.