Goto

Collaborating Authors

 Law


Fair Classification by Direct Intervention on Operating Characteristics

arXiv.org Machine Learning

We develop new classifiers under group fairness in the attribute-aware setting for binary classification with multiple group fairness constraints (e.g., demographic parity (DP), equalized odds (EO), and predictive parity (PP)). We propose a novel approach, applicable to linear fractional constraints, based on directly intervening on the operating characteristics of a pre-trained base classifier, by (i) identifying optimal operating characteristics using the base classifier's group-wise ROC convex hulls and (ii) post-processing the base classifier to match those targets. As practical post-processors, we consider randomizing a mixture of group-wise thresholding rules subject to minimizing the expected number of interventions. We further extend our approach to handle multiple protected attributes and multiple linear fractional constraints. On standard datasets (COMPAS and ACSIncome), our methods simultaneously satisfy approximate DP, EO, and PP with few interventions and a near-oracle drop in accuracy; comparing favorably to previous methods.


Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification

arXiv.org Artificial Intelligence

Indoor scene classification is a critical task in computer vision, with wide-ranging applications that go from robotics to sensitive content analysis, such as child sexual abuse imagery (CSAI) classification. The problem is particularly challenging due to the intricate relationships between objects and complex spatial layouts. In this work, we propose the Attention over Scene Graphs for Sensitive Content Analysis (ASGRA), a novel framework that operates on structured graph representations instead of raw pixels. By first converting images into Scene Graphs and then employing a Graph Attention Network for inference, ASGRA directly models the interactions between a scene's components. This approach offers two key benefits: (i) inherent explainability via object and relationship identification, and (ii) privacy preservation, enabling model training without direct access to sensitive images. On Places8, we achieve 81.27% balanced accuracy, surpassing image-based methods. Real-world CSAI evaluation with law enforcement yields 74.27% balanced accuracy. Our results establish structured scene representations as a robust paradigm for indoor scene classification and CSAI classification. Code is publicly available at https://github.com/tutuzeraa/ASGRA.


Searching for Difficult-to-Translate Test Examples at Scale

arXiv.org Artificial Intelligence

NLP models require test data that are sufficiently challenging. The difficulty of an example is linked to the topic it originates from (''seed topic''). The relationship between the topic and the difficulty of its instances is stochastic in nature: an example about a difficult topic can happen to be easy, and vice versa. At the scale of the Internet, there are tens of thousands of potential topics, and finding the most difficult one by drawing and evaluating a large number of examples across all topics is computationally infeasible. We formalize this task and treat it as a multi-armed bandit problem. In this framework, each topic is an ''arm,'' and pulling an arm (at a cost) involves drawing a single example, evaluating it, and measuring its difficulty. The goal is to efficiently identify the most difficult topics within a fixed computational budget. We illustrate the bandit problem setup of finding difficult examples for the task of machine translation. We find that various bandit strategies vastly outperform baseline methods like brute-force searching the most challenging topics.


Combining Knowledge Graphs and NLP to Analyze Instant Messaging Data in Criminal Investigations

arXiv.org Artificial Intelligence

Criminal investigations often involve the analysis of messages exchanged through instant messaging apps such as WhatsApp, which can be an extremely effort-consuming task. Our approach integrates knowledge graphs and NLP models to support this analysis by semantically enriching data collected from suspects' mobile phones, and help prosecutors and investigators search into the data and get valuable insights. Our semantic enrichment process involves extracting message data and modeling it using a knowledge graph, generating transcriptions of voice messages, and annotating the data using an end-to-end entity extraction approach. We adopt two different solutions to help users get insights into the data, one based on querying and visualizing the graph, and one based on semantic search. The proposed approach ensures that users can verify the information by accessing the original data. While we report about early results and prototypes developed in the context of an ongoing project, our proposal has undergone practical applications with real investigation data. As a consequence, we had the chance to interact closely with prosecutors, collecting positive feedback but also identifying interesting opportunities as well as promising research directions to share with the research community.


TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise

arXiv.org Artificial Intelligence

Many enterprises are increasingly adopting Artificial Intelligence (AI) to make internal processes more competitive and efficient. In response to public concern and new regulations for the ethical and responsible use of AI, implementing AI governance frameworks could help to integrate AI within organisations and mitigate associated risks. However, the rapid technological advances and lack of shared ethical AI infrastructures creates barriers to their practical adoption in businesses. This paper presents a real-world AI application at TVS Supply Chain Solutions, reporting on the experience developing an AI assistant underpinned by large language models and the ethical, regulatory, and sociotechnical challenges in deployment for enterprise use.


STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

arXiv.org Artificial Intelligence

Unified Multimodal understanding and generation Models (UMMs) have demonstrated remarkable capabilities in both understanding and generation tasks. However, we identify a vulnerability arising from the generation-understanding coupling in UMMs. The attackers can use the generative function to craft an information-rich adversarial image and then leverage the understanding function to absorb it in a single pass, which we call Cross-Modal Generative Injection (CMGI). Current attack methods on malicious instructions are often limited to a single modality while also relying on prompt rewriting with semantic drift, leaving the unique vulnerabilities of UMMs unexplored. We propose STaR-Attack, the first multi-turn jailbreak attack framework that exploits unique safety weaknesses of UMMs without semantic drift. Specifically, our method defines a malicious event that is strongly correlated with the target query within a spatio-temporal context. Using the three-act narrative theory, STaR-Attack generates the pre-event and the post-event scenes while concealing the malicious event as the hidden climax. When executing the attack strategy, the opening two rounds exploit the UMM's generative ability to produce images for these scenes. Subsequently, an image-based question guessing and answering game is introduced by exploiting the understanding capability. STaR-Attack embeds the original malicious question among benign candidates, forcing the model to select and answer the most relevant one given the narrative context. Extensive experiments show that STaR-Attack consistently surpasses prior approaches, achieving up to 93.06% ASR on Gemini-2.0-Flash and surpasses the strongest prior baseline, FlipAttack. Our work uncovers a critical yet underdeveloped vulnerability and highlights the need for safety alignments in UMMs.


Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

arXiv.org Artificial Intelligence

Advances in Large Language Models (LLMs) have enabled a new class of self-evolving agents that autonomously improve through interaction with the environment, demonstrating strong capabilities. However, self-evolution also introduces novel risks overlooked by current safety research. In this work, we study the case where an agent's self-evolution deviates in unintended ways, leading to undesirable or even harmful outcomes. We refer to this as Misevolution. To provide a systematic investigation, we evaluate misevolution along four key evolutionary pathways: model, memory, tool, and workflow. Our empirical findings reveal that misevolution is a widespread risk, affecting agents built even on top-tier LLMs (e.g., Gemini-2.5-Pro). Different emergent risks are observed in the self-evolutionary process, such as the degradation of safety alignment after memory accumulation, or the unintended introduction of vulnerabilities in tool creation and reuse. To our knowledge, this is the first study to systematically conceptualize misevolution and provide empirical evidence of its occurrence, highlighting an urgent need for new safety paradigms for self-evolving agents. Finally, we discuss potential mitigation strategies to inspire further research on building safer and more trustworthy self-evolving agents. Our code and data are available at https://github.com/ShaoShuai0605/Misevolution . Warning: this paper includes examples that may be offensive or harmful in nature.


SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs

arXiv.org Artificial Intelligence

The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliable safety and compliance evaluation. However, existing static benchmarks are ill-equipped to address the dynamic nature of AI risks and evolving regulations, creating a critical safety gap. This paper introduces a new paradigm of agentic safety evaluation, reframing evaluation as a continuous and self-evolving process rather than a one-time audit. We then propose a novel multi-agent framework SafeEvalAgent, which autonomously ingests unstructured policy documents to generate and perpetually evolve a comprehensive safety benchmark. SafeEvalAgent leverages a synergistic pipeline of specialized agents and incorporates a Self-evolving Evaluation loop, where the system learns from evaluation results to craft progressively more sophisticated and targeted test cases. Our experiments demonstrate the effectiveness of SafeEvalAgent, showing a consistent decline in model safety as the evaluation hardens. For instance, GPT-5's safety rate on the EU AI Act drops from 72.50% to 36.36% over successive iterations. These findings reveal the limitations of static assessments and highlight our framework's ability to uncover deep vulnerabilities missed by traditional methods, underscoring the urgent need for dynamic evaluation ecosystems to ensure the safe and responsible deployment of advanced AI.


RAGferee: Building Contextual Reward Models for Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

Existing Reward Models (RMs), typically trained on general preference data, struggle in Retrieval Augmented Generation (RAG) settings, which require judging responses for faithfulness to retrieved context, relevance to the user query, appropriate refusals when context is insufficient, completeness and conciseness of information. To address the lack of publicly available RAG-centric preference datasets and specialised RMs, we introduce RAGferee, a methodology that repurposes question-answering (QA) datasets into preference pairs that prioritise groundedness over stylistic features, enabling the training of contextual RMs better suited to judging RAG responses. Using RAGferee, we curate a small preference dataset of 4K samples and fine-tune RMs ranging from 7B to 24B parameters. Our RAG-centric RMs achieve state-of-the-art performance on ContextualJudgeBench, surpassing existing 70B+ RMs trained on much larger (up to 2.4M samples) general corpora, with an absolute improvement of +15.5%.


Controlled Generation for Private Synthetic Text

arXiv.org Artificial Intelligence

Text anonymization is essential for responsibly developing and deploying AI in high-stakes domains such as healthcare, social services, and law. In this work, we propose a novel methodology for privacy-preserving synthetic text generation that leverages the principles of de-identification and the Hiding In Plain Sight (HIPS) theory. Our approach introduces entity-aware control codes to guide controllable generation using either in-context learning (ICL) or prefix tuning. The ICL variant ensures privacy levels consistent with the underlying de-identification system, while the prefix tuning variant incorporates a custom masking strategy and loss function to support scalable, high-quality generation. Experiments on legal and clinical datasets demonstrate that our method achieves a strong balance between privacy protection and utility, offering a practical and effective solution for synthetic text generation in sensitive domains.