Law
Eval Factsheets: A Structured Framework for Documenting AI Evaluations
Bordes, Florian, Ross, Candace, Kao, Justine T, Spiliopoulou, Evangelia, Williams, Adina
The rapid proliferation of benchmarks has created significant challenges in reproducibility, transparency, and informed decision-making. However, unlike datasets and models -- which benefit from structured documentation frameworks like Datasheets and Model Cards -- evaluation methodologies lack systematic documentation standards. We introduce Eval Factsheets, a structured, descriptive framework for documenting AI system evaluations through a comprehensive taxonomy and questionnaire-based approach. Our framework organizes evaluation characteristics across five fundamental dimensions: Context (Who made the evaluation and when?), Scope (What does it evaluate?), Structure (With what the evaluation is built?), Method (How does it work?) and Alignment (In what ways is it reliable/valid/robust?). We implement this taxonomy as a practical questionnaire spanning five sections with mandatory and recommended documentation elements. Through case studies on multiple benchmarks, we demonstrate that Eval Factsheets effectively captures diverse evaluation paradigms -- from traditional benchmarks to LLM-as-judge methodologies -- while maintaining consistency and comparability. We hope Eval Factsheets are incorporated into both existing and newly released evaluation frameworks and lead to more transparency and reproducibility.
Generative AI Practices, Literacy, and Divides: An Empirical Analysis in the Italian Context
Savoldi, Beatrice, Attanasio, Giuseppe, Gorodetskaya, Olga, Manerba, Marta Marchiori, Bassignana, Elisa, Casola, Silvia, Negri, Matteo, Caselli, Tommaso, Bentivogli, Luisa, Ramponi, Alan, Muti, Arianna, Balbo, Nicoletta, Nozza, Debora
The rise of Artificial Intelligence (AI) language technologies, particularly generative AI (GenAI) chatbots accessible via conversational interfaces, is transforming digital interactions. While these tools hold societal promise, they also risk widening digital divides due to uneven adoption and low awareness of their limitations. This study presents the first comprehensive empirical mapping of GenAI adoption, usage patterns, and literacy in Italy, based on newly collected survey data from 1,906 Italian-speaking adults. Our findings reveal widespread adoption for both work and personal use, including sensitive tasks like emotional support and medical advice. Crucially, GenAI is supplanting other technologies to become a primary information source: this trend persists despite low user digital literacy, posing a risk as users struggle to recognize errors or misinformation. Moreover, we identify a significant gender divide -- particularly pronounced in older generations -- where women are half as likely to adopt GenAI and use it less frequently than men. While we find literacy to be a key predictor of adoption, it only partially explains this disparity, suggesting that other barriers are at play. Overall, our data provide granular insights into the multipurpose usage of GenAI, highlighting the dual need for targeted educational initiatives and further investigation into the underlying barriers to equitable participation that competence alone cannot explain.
Fine-grained Narrative Classification in Biased News Articles
Afroz, Zeba, Vardhan, Harsh, Bhakuni, Pawan, Punia, Aanchal, Kumar, Rajdeep, Akhtar, Md. Shad
Narratives are the cognitive and emotional scaffolds of propaganda. They organize isolated persuasive techniques into coherent stories that justify actions, attribute blame, and evoke identification with ideological camps. In this paper, we propose a novel fine-grained narrative classification in biased news articles. We also explore article-bias classification as the precursor task to narrative classification and fine-grained persuasive technique identification. We develop INDI-PROP, the first ideologically grounded fine-grained narrative dataset with multi-level annotation for analyzing propaganda in Indian news media. Our dataset INDI-PROP comprises 1,266 articles focusing on two polarizing socio-political events in recent times: CAA and the Farmers' protest. Each article is annotated at three hierarchical levels: (i) ideological article-bias (pro-government, pro-opposition, neutral), (ii) event-specific fine-grained narrative frames anchored in ideological polarity and communicative intent, and (iii) persuasive techniques. We propose FANTA and TPTC, two GPT-4o-mini guided multi-hop prompt-based reasoning frameworks for the bias, narrative, and persuasive technique classification. FANTA leverages multi-layered communicative phenomena by integrating information extraction and contextual framing for hierarchical reasoning. On the other hand, TPTC adopts systematic decomposition of persuasive cues via a two-stage approach. Our evaluation suggests substantial improvement over underlying baselines in each case.
Towards Irreversible Machine Unlearning for Diffusion Models
Yuan, Xun, Zhao, Zilong, Li, Jiayu, Pasikhani, Aryan, Gope, Prosanta, Sikdar, Biplab
Diffusion models are renowned for their state-of-the-art performance in generating synthetic images. However, concerns related to safety, privacy, and copyright highlight the need for machine unlearning, which can make diffusion models forget specific training data and prevent the generation of sensitive or unwanted content. Current machine unlearning methods for diffusion models are primarily designed for conditional diffusion models and focus on unlearning specific data classes or features. Among these methods, finetuning-based machine unlearning methods are recognized for their efficiency and effectiveness, which update the parameters of pre-trained diffusion models by minimizing carefully designed loss functions. However, in this paper, we propose a novel attack named Diffusion Model Relearning Attack (DiMRA), which can reverse the finetuning-based machine unlearning methods, posing a significant vulnerability of this kind of technique. Without prior knowledge of the unlearning elements, DiMRA optimizes the unlearned diffusion model on an auxiliary dataset to reverse the unlearning, enabling the model to regenerate previously unlearned elements. To mitigate this vulnerability, we propose a novel machine unlearning method for diffusion models, termed as Diffusion Model Unlearning by Memorization (DiMUM). Unlike traditional methods that focus on forgetting, DiMUM memorizes alternative data or features to replace targeted unlearning data or features in order to prevent generating such elements. In our experiments, we demonstrate the effectiveness of DiMRA in reversing state-of-the-art finetuning-based machine unlearning methods for diffusion models, highlighting the need for more robust solutions. We extensively evaluate DiMUM, demonstrating its superior ability to preserve the generative performance of diffusion models while enhancing robustness against DiMRA.
Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value
Edelman, Joe, Zhi-Xuan, Tan, Lowe, Ryan, Klingefjord, Oliver, Wang-Mascianica, Vincent, Franklin, Matija, Kearns, Ryan Othniel, Hain, Ellie, Sarkar, Atrisha, Bakker, Michiel, Barez, Fazl, Duvenaud, David, Foerster, Jakob, Gabriel, Iason, Gubbels, Joseph, Goodman, Bryce, Haupt, Andreas, Heitzig, Jobst, Jara-Ettinger, Julian, Kasirzadeh, Atoosa, Kirkpatrick, James Ravi, Koh, Andrew, Knox, W. Bradley, Koralus, Philipp, Lehman, Joel, Levine, Sydney, Marro, Samuele, Revel, Manon, Shorin, Toby, Sutherland, Morgan, Tessler, Michael Henry, Vendrov, Ivan, Wilken-Smith, James
Beneficial societal outcomes cannot be guaranteed by aligning individual AI systems with the intentions of their operators or users. Even an AI system that is perfectly aligned to the intentions of its operating organization can lead to bad outcomes if the goals of that organization are misaligned with those of other institutions and individuals. For this reason, we need full-stack alignment, the concurrent alignment of AI systems and the institutions that shape them with what people value. This can be done without imposing a particular vision of individual or collective flourishing. We argue that current approaches for representing values, such as utility functions, preference orderings, or unstructured text, struggle to address these and other issues effectively. They struggle to distinguish values from other signals, to support principled normative reasoning, and to model collective goods. We propose thick models of value will be needed. These structure the way values and norms are represented, enabling systems to distinguish enduring values from fleeting preferences, to model the social embedding of individual choices, and to reason normatively, applying values in new domains. We demonstrate this approach in five areas: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions.
Modeling Topics and Sociolinguistic Variation in Code-Switched Discourse: Insights from Spanish-English and Spanish-Guaranรญ
Tyagi, Nemika, Guevara, Nelvin Licona, Kellert, Olga
This study presents an LLM-assisted annotation pipeline for the sociolinguistic and topical analysis of bilingual discourse in two typologically distinct contexts: Spanish-English and Spanish-Guaranรญ. Using large language models, we automatically labeled topic, genre, and discourse-pragmatic functions across a total of 3,691 code-switched sentences, integrated demographic metadata from the Miami Bilingual Corpus, and enriched the Spanish-Guaranรญ dataset with new topic annotations. The resulting distributions reveal systematic links between gender, language dominance, and discourse function in the Miami data, and a clear diglossic division between formal Guaranรญ and informal Spanish in Paraguayan texts. These findings replicate and extend earlier interactional and sociolinguistic observations with corpus-scale quantitative evidence. The study demonstrates that large language models can reliably recover interpretable sociolinguistic patterns traditionally accessible only through manual annotation, advancing computational methods for cross-linguistic and low-resource bilingual research.
AGENTSAFE: A Unified Framework for Ethical Assurance and Governance in Agentic AI
Khan, Rafflesia, Joyce, Declan, Habiba, Mansura
The rapid deployment of large language model (LLM)-based agents introduces a new class of risks, driven by their capacity for autonomous planning, multi-step tool integration, and emergent interactions. It raises some risk factors for existing governance approaches as they remain fragmented: Existing frameworks are either static taxonomies driven; however, they lack an integrated end-to-end pipeline from risk identification to operational assurance, especially for an agentic platform. We propose AGENTSAFE, a practical governance framework for LLM-based agentic systems. The framework operationalises the AI Risk Repository into design, runtime, and audit controls, offering a governance framework for risk identification and assurance. The proposed framework, AGENTSAFE, profiles agentic loops (plan -> act -> observe -> reflect) and toolchains, and maps risks onto structured taxonomies extended with agent-specific vulnerabilities. It introduces safeguards that constrain risky behaviours, escalates high-impact actions to human oversight, and evaluates systems through pre-deployment scenario banks spanning security, privacy, fairness, and systemic safety. During deployment, AGENTSAFE ensures continuous governance through semantic telemetry, dynamic authorization, anomaly detection, and interruptibility mechanisms. Provenance and accountability are reinforced through cryptographic tracing and organizational controls, enabling measurable, auditable assurance across the lifecycle of agentic AI systems. The key contributions of this paper are: (1) a unified governance framework that translates risk taxonomies into actionable design, runtime, and audit controls; (2) an Agent Safety Evaluation methodology that provides measurable pre-deployment assurance; and (3) a set of runtime governance and accountability mechanisms that institutionalise trust in agentic AI ecosystems.
Irresponsible AI: big tech's influence on AI research and associated impacts
Hernandez-Garcia, Alex, Volokhova, Alexandra, Williams, Ezekiel, Kabakibo, Dounia Shaaban
The accelerated development, deployment and adoption of artificial intelligence systems has been fuelled by the increasing involvement of big tech. This has been accompanied by increasing ethical concerns and intensified societal and environmental impacts. In this article, we review and discuss how these phenomena are deeply entangled. First, we examine the growing and disproportionate influence of big tech in AI research and argue that its drive for scaling and general-purpose systems is fundamentally at odds with the responsible, ethical, and sustainable development of AI. Second, we review key current environmental and societal negative impacts of AI and trace their connections to big tech and its underlying economic incentives. Finally, we argue that while it is important to develop technical and regulatory approaches to these challenges, these alone are insufficient to counter the distortion introduced by big tech's influence. We thus review and propose alternative strategies that build on the responsibility of implicated actors and collective action.
Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm Anticipation
Tantalaki, Nicoleta, Vei, Sophia, Vakali, Athena
The growing influence of Artificial Intelligence (AI) systems on decision-making in critical domains has exposed their potential to cause significant harms, often rooted in biases embedded across the AI lifecycle. While existing frameworks and taxonomies document bias or harms in isolation, they rarely establish systematic links between specific bias types and the harms they cause, particularly within real-world sociotechnical contexts. Technical fixes proposed to address AI biases are ill-equipped to address them and are typically applied after a system has been developed or deployed, offering limited preventive value. We propose ECHO, a novel framework for proactive AI harm anticipation through the systematic mapping of AI bias types to harm outcomes across diverse stakeholder and domain contexts. ECHO follows a modular workflow encompassing stakeholder identification, vignette-based presentation of biased AI systems, and dual (human-LLM) harm annotation, integrated within ethical matrices for structured interpretation. This human-centered approach enables early-stage detection of bias-to-harm pathways, guiding AI design and governance decisions from the outset. We validate ECHO in two high-stakes domains (disease diagnosis and hiring), revealing domain-specific, bias-to-harm patterns and demonstrating ECHO's potential to support anticipatory governance of AI systems
AI Deception: Risks, Dynamics, and Controls
Chen, Boyuan, Fang, Sitong, Ji, Jiaming, Zhu, Yanxu, Wen, Pengcheng, Wu, Jinzhou, Tan, Yingshui, Zheng, Boren, Yuan, Mengying, Chen, Wenqi, Hong, Donghai, Qiu, Alex, Chen, Xin, Zhou, Jiayi, Wang, Kaile, Dai, Juntao, Zhang, Borong, Yang, Tianzhuo, Siddiqui, Saad, Duan, Isabella, Duan, Yawen, Tse, Brian, Jen-Tse, null, Huang, null, Wang, Kun, Zheng, Baihui, Liu, Jiaheng, Yang, Jian, Li, Yiming, Chen, Wenting, Liu, Dongrui, Vierling, Lukas, Xi, Zhiheng, Fu, Haobo, Wang, Wenxuan, Sang, Jitao, Shi, Zhengyan, Chan, Chi-Min, Shi, Eugenie, Li, Simin, Li, Juncheng, Yang, Jian, Ji, Wei, Li, Dong, Yang, Jinglin, Song, Jun, Dong, Yinpeng, Fu, Jie, Zheng, Bo, Yang, Min, Guo, Yike, Torr, Philip, Trager, Robert, Zeng, Yi, Wang, Zhongyuan, Yang, Yaodong, Huang, Tiejun, Zhang, Ya-Qin, Zhang, Hongjiang, Yao, Andrew
As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved from a speculative concern to an empirically demonstrated risk across language models, AI agents, and emerging frontier systems. This project provides a comprehensive and up-to-date overview of the AI deception field, covering its core concepts, methodologies, genesis, and potential mitigations. First, we identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We then review existing empirical studies and associated risks, highlighting deception as a sociotechnical safety challenge. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment. Deception emergence reveals the mechanisms underlying AI deception: systems with sufficient capability and incentive potential inevitably engage in deceptive behaviors when triggered by external conditions. Deception treatment, in turn, focuses on detecting and addressing such behaviors. On deception emergence, we analyze incentive foundations across three hierarchical levels and identify three essential capability preconditions required for deception. We further examine contextual triggers, including supervision gaps, distributional shifts, and environmental pressures. On deception treatment, we conclude detection methods covering benchmarks and evaluation protocols in static and interactive settings. Building on the three core factors of deception emergence, we outline potential mitigation strategies and propose auditing approaches that integrate technical, community, and governance efforts to address sociotechnical challenges and future AI risks. To support ongoing work in this area, we release a living resource at www.deceptionsurvey.com.