Goto

Collaborating Authors

 accountability


Words Without Consequence

The Atlantic - Technology

What does it mean to have speech without a speaker? For the first time, speech has been decoupled from consequence. We now live alongside AI systems that converse knowledgeably and persuasively--deploying claims about the world, explanations, advice, encouragement, apologies, and promises--while bearing no vulnerability for what they say. Millions of people already rely on chatbots powered by large language models, and have integrated these synthetic interlocutors into their personal and professional lives. An LLM's words shape our beliefs, decisions, and actions, yet no speaker stands behind them. This dynamic is already familiar in everyday use. A chatbot gets something wrong. When corrected, it apologizes and changes its answer.


Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples

Neural Information Processing Systems

Learning controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature.This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability.We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability.


The SMART+ Framework for AI Systems

Kandikatla, Laxmiraju, Radeljic, Branislav

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) systems are now an integral part of multiple industries. In clinical research, AI supports automated adverse event detection in clinical trials, patient eligibility screening for protocol enrollment, and data quality validation. Beyond healthcare, AI is transforming finance through real-time fraud detection, automated loan risk assessment, and algorithmic decision-making. Similarly, in manufacturing, AI enables predictive maintenance to reduce equipment downtime, enhances quality control through computer-vision inspection, and optimizes production workflows using real-time operational data. While these technologies enhance operational efficiency, they introduce new challenges regarding safety, accountability, and regulatory compliance. To address these concerns, we introduce the SMART+ Framework - a structured model built on the pillars of Safety, Monitoring, Accountability, Reliability, and Transparency, and further enhanced with Privacy & Security, Data Governance, Fairness & Bias, and Guardrails. SMART+ offers a practical, comprehensive approach to evaluating and governing AI systems across industries. This framework aligns with evolving mechanisms and regulatory guidance to integrate operational safeguards, oversight procedures, and strengthened privacy and governance controls. SMART+ demonstrates risk mitigation, trust-building, and compliance readiness. By enabling responsible AI adoption and ensuring auditability, SMART+ provides a robust foundation for effective AI governance in clinical research.


AGENTSAFE: A Unified Framework for Ethical Assurance and Governance in Agentic AI

Khan, Rafflesia, Joyce, Declan, Habiba, Mansura

arXiv.org Artificial Intelligence

The rapid deployment of large language model (LLM)-based agents introduces a new class of risks, driven by their capacity for autonomous planning, multi-step tool integration, and emergent interactions. It raises some risk factors for existing governance approaches as they remain fragmented: Existing frameworks are either static taxonomies driven; however, they lack an integrated end-to-end pipeline from risk identification to operational assurance, especially for an agentic platform. We propose AGENTSAFE, a practical governance framework for LLM-based agentic systems. The framework operationalises the AI Risk Repository into design, runtime, and audit controls, offering a governance framework for risk identification and assurance. The proposed framework, AGENTSAFE, profiles agentic loops (plan -> act -> observe -> reflect) and toolchains, and maps risks onto structured taxonomies extended with agent-specific vulnerabilities. It introduces safeguards that constrain risky behaviours, escalates high-impact actions to human oversight, and evaluates systems through pre-deployment scenario banks spanning security, privacy, fairness, and systemic safety. During deployment, AGENTSAFE ensures continuous governance through semantic telemetry, dynamic authorization, anomaly detection, and interruptibility mechanisms. Provenance and accountability are reinforced through cryptographic tracing and organizational controls, enabling measurable, auditable assurance across the lifecycle of agentic AI systems. The key contributions of this paper are: (1) a unified governance framework that translates risk taxonomies into actionable design, runtime, and audit controls; (2) an Agent Safety Evaluation methodology that provides measurable pre-deployment assurance; and (3) a set of runtime governance and accountability mechanisms that institutionalise trust in agentic AI ecosystems.


Hillsborough police report 'may not give answers'

BBC News

Hillsborough police report'may not give answers' Families of some of those killed in the Hillsborough disaster fear they may once again be denied full accountability as the long-delayed report into police conduct surrounding the stadium crush is due to be published on Tuesday. Several people who worked on the Independent Office for Police Conduct (IOPC) investigation - including a former director - have told the BBC they doubt the report will deliver all the answers survivors and bereaved relatives were promised. Some have warned that it may lead to accusations of another Hillsborough cover-up. Families have also criticised the length and cost of the investigation - the largest of its kind ever carried out in England and Wales. The police watchdog has spent more than 13 years examining the actions of South Yorkshire Police and other forces in the aftermath of the 1989 disaster in which 97 Liverpool supporters were killed during an FA Cup semi-final at Sheffield Wednesday's Hillsborough ground.


EvalCards: A Framework for Standardized Evaluation Reporting

Dhar, Ruchira, Villegas, Danae Sanchez, Karamolegkou, Antonia, Schiavone, Alice, Yuan, Yifei, Chen, Xinyi, Li, Jiaang, Frank, Stella, De Grazia, Laura, Swain, Monorama, Brandl, Stephanie, Hershcovich, Daniel, Søgaard, Anders, Elliott, Desmond

arXiv.org Artificial Intelligence

Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a survey of recent work on evaluation and documentation, we identify three persistent shortcomings in current reporting practices: reproducibility, accessibility, and governance. We argue that existing standardization efforts remain insufficient and introduce Evaluation Disclosure Cards (EvalCards) as a path forward. EvalCards are designed to enhance transparency for both researchers and practitioners while providing a practical foundation to meet emerging governance requirements.


Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor

Olteanu, Alexandra, Blodgett, Su Lin, Balayn, Agathe, Wang, Angelina, Diaz, Fernando, Calmon, Flavio du Pin, Mitchell, Margaret, Ekstrand, Michael, Binns, Reuben, Barocas, Solon

arXiv.org Artificial Intelligence

In AI research and practice, rigor remains largely understood in terms of methodological rigor -- such as whether mathematical, statistical, or computational methods are correctly applied. We argue that this narrow conception of rigor has contributed to the concerns raised by the responsible AI community, including overblown claims about the capabilities of AI systems. Our position is that a broader conception of what rigorous AI research and practice should entail is needed. We believe such a conception -- in addition to a more expansive understanding of (1) methodological rigor -- should include aspects related to (2) what background knowledge informs what to work on (epistemic rigor); (3) how disciplinary, community, or personal norms, standards, or beliefs influence the work (normative rigor); (4) how clearly articulated the theoretical constructs under use are (conceptual rigor); (5) what is reported and how (reporting rigor); and (6) how well-supported the inferences from existing evidence are (interpretative rigor). In doing so, we also provide useful language and a framework for much-needed dialogue about the AI community's work by researchers, policymakers, journalists, and other stakeholders.


Hybrid Neuro-Symbolic Models for Ethical AI in Risk-Sensitive Domains

Kolli, Chaitanya Kumar

arXiv.org Artificial Intelligence

Artificial intelligence deployed in risk-sensitive domains such as healthcare, finance, and security must not only achieve predictive accuracy but also ensure transparency, ethical alignment, and compliance with regulatory expectations. Hybrid neuro symbolic models combine the pattern-recognition strengths of neural networks with the interpretability and logical rigor of symbolic reasoning, making them well-suited for these contexts. This paper surveys hybrid architectures, ethical design considerations, and deployment patterns that balance accuracy with accountability. We highlight techniques for integrating knowledge graphs with deep inference, embedding fairness-aware rules, and generating human-readable explanations. Through case studies in healthcare decision support, financial risk management, and autonomous infrastructure, we show how hybrid systems can deliver reliable and auditable AI. Finally, we outline evaluation protocols and future directions for scaling neuro symbolic frameworks in complex, high stakes environments.


Agentifying Agentic AI

Dignum, Virginia, Dignum, Frank

arXiv.org Artificial Intelligence

Agentic AI seeks to endow systems with sustained autonomy, reasoning, and interaction capabilities. To realize this vision, its assumptions about agency must be complemented by explicit models of cognition, cooperation, and governance. This paper argues that the conceptual tools developed within the Autonomous Agents and Multi-Agent Systems (AAMAS) community, such as BDI architectures, communication protocols, mechanism design, and institutional modelling, provide precisely such a foundation. By aligning adaptive, data-driven approaches with structured models of reasoning and coordination, we outline a path toward agentic systems that are not only capable and flexible, but also transparent, cooperative, and accountable. The result is a perspective on agency that bridges formal theory and practical autonomy.