
The SMART+ Framework for AI Systems

Kandikatla, Laxmiraju, Radeljic, Branislav

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) systems are now an integral part of multiple industries. In clinical research, AI supports automated adverse event detection in clinical trials, patient eligibility screening for protocol enrollment, and data quality validation. Beyond healthcare, AI is transforming finance through real-time fraud detection, automated loan risk assessment, and algorithmic decision-making. Similarly, in manufacturing, AI enables predictive maintenance to reduce equipment downtime, enhances quality control through computer-vision inspection, and optimizes production workflows using real-time operational data. While these technologies enhance operational efficiency, they introduce new challenges regarding safety, accountability, and regulatory compliance. To address these concerns, we introduce the SMART+ Framework - a structured model built on the pillars of Safety, Monitoring, Accountability, Reliability, and Transparency, and further enhanced with Privacy & Security, Data Governance, Fairness & Bias, and Guardrails. SMART+ offers a practical, comprehensive approach to evaluating and governing AI systems across industries. The framework aligns with evolving regulatory guidance, integrating operational safeguards, oversight procedures, and strengthened privacy and governance controls to support risk mitigation, trust-building, and compliance readiness. By enabling responsible AI adoption and ensuring auditability, SMART+ provides a robust foundation for effective AI governance in clinical research.
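
The pillar-based evaluation described above can be sketched as a simple checklist structure. This is an illustrative sketch only: the pillar names come from the abstract, but the scoring scheme and function names are assumptions, not the paper's method.

```python
# Pillar names as listed in the SMART+ abstract; the boolean-checklist
# scoring below is an assumed, illustrative scheme.
SMART_PLUS_PILLARS = [
    "Safety", "Monitoring", "Accountability", "Reliability", "Transparency",
    "Privacy & Security", "Data Governance", "Fairness & Bias", "Guardrails",
]

def compliance_readiness(assessment: dict) -> float:
    """Return the fraction of SMART+ pillars marked as satisfied.

    `assessment` maps pillar name -> bool; missing pillars count as unmet.
    """
    met = sum(1 for p in SMART_PLUS_PILLARS if assessment.get(p, False))
    return met / len(SMART_PLUS_PILLARS)
```

In practice each pillar would carry structured evidence (audit artefacts, monitoring logs) rather than a single boolean, but the aggregation idea is the same.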


AGENTSAFE: A Unified Framework for Ethical Assurance and Governance in Agentic AI

Khan, Rafflesia, Joyce, Declan, Habiba, Mansura

arXiv.org Artificial Intelligence

The rapid deployment of large language model (LLM)-based agents introduces a new class of risks, driven by their capacity for autonomous planning, multi-step tool integration, and emergent interactions. Existing governance approaches remain fragmented: frameworks are either static, taxonomy-driven catalogues or lack an integrated end-to-end pipeline from risk identification to operational assurance, especially for agentic platforms. We propose AGENTSAFE, a practical governance framework for LLM-based agentic systems. The framework operationalises the AI Risk Repository into design, runtime, and audit controls, offering a governance framework for risk identification and assurance. The proposed framework, AGENTSAFE, profiles agentic loops (plan -> act -> observe -> reflect) and toolchains, and maps risks onto structured taxonomies extended with agent-specific vulnerabilities. It introduces safeguards that constrain risky behaviours, escalates high-impact actions to human oversight, and evaluates systems through pre-deployment scenario banks spanning security, privacy, fairness, and systemic safety. During deployment, AGENTSAFE ensures continuous governance through semantic telemetry, dynamic authorization, anomaly detection, and interruptibility mechanisms. Provenance and accountability are reinforced through cryptographic tracing and organizational controls, enabling measurable, auditable assurance across the lifecycle of agentic AI systems. The key contributions of this paper are: (1) a unified governance framework that translates risk taxonomies into actionable design, runtime, and audit controls; (2) an Agent Safety Evaluation methodology that provides measurable pre-deployment assurance; and (3) a set of runtime governance and accountability mechanisms that institutionalise trust in agentic AI ecosystems.
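
The plan -> act -> observe -> reflect loop with an authorization gate can be sketched as follows. This is a toy illustration of the general pattern, not AGENTSAFE's implementation; the action names, `HIGH_IMPACT` set, and function signatures are all assumptions.

```python
# Toy agentic-loop step with a dynamic-authorization gate: high-impact tool
# calls are escalated to a human approver before execution, and the loop is
# interruptible (it halts when approval is denied). All names are assumed.
HIGH_IMPACT = {"delete_records", "send_payment", "modify_permissions"}

def requires_human_approval(action: str) -> bool:
    """Dynamic-authorization stub: flag high-impact tool calls."""
    return action in HIGH_IMPACT

def run_agent_step(plan, execute, observe, reflect, approve):
    """One plan -> act -> observe -> reflect cycle with a safety gate."""
    action = plan()
    if requires_human_approval(action) and not approve(action):
        return ("escalated", action)      # interruptibility: halt this step
    result = observe(execute(action))
    reflect(result)                        # feed the observation back
    return ("executed", action)
```

A real system would additionally emit telemetry for each decision so the escalation trail is auditable.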


Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning

Gao, Minghe, Li, Juncheng, Lin, Yuze, Liu, Xuqi, Ji, Jiaming, Pan, Xiaoran, Xu, Zihan, Li, Xian, Li, Mingjie, Ji, Wei, Wei, Rong, Tang, Rui, Wang, Qizhou, Shen, Kai, Xiao, Jun, Wu, Qi, Tang, Siliang, Zhuang, Yueting

arXiv.org Artificial Intelligence

We contend that embodied learning is fundamentally a life-cycle problem rather than a single-stage optimization. Systems that optimize only one link (data collection, simulation, learning, or deployment) rarely sustain improvement or generalize beyond narrow settings. We introduce Arcadia, a closed-loop framework that operationalizes embodied lifelong learning by tightly coupling four stages: (1) Self-evolving exploration and grounding for autonomous data acquisition in physical environments, (2) Generative scene reconstruction and augmentation for realistic and extensible scene creation, (3) a Shared embodied representation architecture that unifies navigation and manipulation within a single multimodal backbone, and (4) Sim-from-real evaluation and evolution that closes the feedback loop through simulation-based adaptation. This coupling is non-decomposable: removing any stage breaks the improvement loop and reverts to one-shot training. Arcadia delivers consistent gains on navigation and manipulation benchmarks and transfers robustly to physical robots, indicating that a tightly coupled lifecycle (continuous real-world data acquisition, generative simulation update, and shared-representation learning) supports lifelong improvement and end-to-end generalization. We release standardized interfaces enabling reproducible evaluation and cross-model comparison in reusable environments, positioning Arcadia as a scalable foundation for general-purpose embodied agents.
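
The four coupled stages compose into a single feedback loop, which can be sketched abstractly. The stage functions below are placeholders wired together in the order the abstract describes; this is not Arcadia's API.

```python
# One iteration of a closed lifecycle loop in the spirit of the four coupled
# stages above. Each stage's output feeds the next, and the evaluation result
# becomes the state for the next iteration. Stage functions are placeholders.
def lifecycle_iteration(explore, reconstruct, learn, evaluate, state):
    data = explore(state)        # (1) self-evolving exploration and grounding
    scenes = reconstruct(data)   # (2) generative scene reconstruction
    model = learn(scenes)        # (3) shared embodied representation learning
    return evaluate(model)       # (4) sim-from-real evaluation -> next state
```

The non-decomposability claim corresponds to the fact that dropping any stage breaks this data flow: there is no path from deployment feedback back to training without all four.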


Embedding Explainable AI in NHS Clinical Safety: The Explainability-Enabled Clinical Safety Framework (ECSF)

Gigiu, Robert

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is increasingly embedded in NHS workflows, but its probabilistic and adaptive behaviour conflicts with the deterministic assumptions underpinning existing clinical-safety standards. DCB0129 and DCB0160 provide strong governance for conventional software yet do not define how AI-specific transparency, interpretability, or model drift should be evidenced within Safety Cases, Hazard Logs, or post-market monitoring. This paper proposes an Explainability-Enabled Clinical Safety Framework (ECSF) that integrates explainability into the DCB0129/0160 lifecycle, enabling Clinical Safety Officers to use interpretability outputs as structured safety evidence without altering compliance pathways. A cross-regulatory synthesis mapped DCB clauses to principles from Good Machine Learning Practice, the NHS AI Assurance and T.E.S.T. frameworks, and the EU AI Act. The resulting matrix links regulatory clauses, principles, ECSF checkpoints, and suitable explainability outputs. ECSF introduces five checkpoints: global transparency for hazard identification, case-level interpretability for verification, clinician usability for evaluation, traceable decision pathways for risk control, and longitudinal interpretability monitoring for post-market surveillance. Techniques such as SHAP, LIME, Integrated Gradients, saliency mapping, and attention visualisation are mapped to corresponding DCB artefacts. ECSF reframes explainability as a core element of clinical-safety assurance, bridging deterministic risk governance with the probabilistic behaviour of AI and supporting alignment with GMLP, the EU AI Act, and NHS AI Assurance principles.
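
The "case-level interpretability for verification" checkpoint can be illustrated with a dependency-free sketch that turns per-case feature attributions into a structured safety-evidence record. For a linear model, weight times (input minus baseline) is an exact Shapley attribution; real deployments would use a library such as SHAP or LIME, and the record layout below is an assumption, not an NHS artefact format.

```python
# Per-case feature attributions logged as structured safety evidence
# (illustrative of the ECSF case-level interpretability checkpoint).
def linear_attributions(weights, x, baseline):
    """Exact Shapley attributions for a linear model: w_i * (x_i - b_i)."""
    return [w * (xi - bi) for w, xi, bi in zip(weights, x, baseline)]

def hazard_log_entry(case_id, weights, x, baseline, feature_names):
    """Build a hazard-log style record identifying the dominant feature."""
    attrib = linear_attributions(weights, x, baseline)
    top = max(range(len(attrib)), key=lambda i: abs(attrib[i]))
    return {
        "case": case_id,
        "attributions": dict(zip(feature_names, attrib)),
        "dominant_feature": feature_names[top],  # evidence for hazard review
    }
```

A Clinical Safety Officer could then review whether the dominant feature is clinically plausible for the case, turning an interpretability output into traceable evidence.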


Multimodal Deep Learning for ATCO Command Lifecycle Modeling and Workload Prediction

Tan, Kaizhen

arXiv.org Artificial Intelligence

Air traffic controllers (ATCOs) issue high-intensity voice commands in dense airspace, where accurate workload modeling is critical for safety and efficiency. This paper proposes a multimodal deep learning framework that integrates structured data, trajectory sequences, and image features to estimate two key parameters in the ATCO command lifecycle: the time offset between a command and the resulting aircraft maneuver, and the command duration. A high-quality dataset was constructed, with maneuver points detected using sliding window and histogram-based methods. A CNN-Transformer ensemble model was developed for accurate, generalizable, and interpretable predictions. By linking trajectories to voice commands, this work offers the first model of its kind to support intelligent command generation and provides practical value for workload assessment, staffing, and scheduling. A. Background: As global air traffic demand increases, airspace operations have become more complex and congested, presenting major challenges for air traffic control (ATC) systems. Although surveillance and communication technologies have improved, ATC performance still largely depends on human operators, particularly air traffic controllers (ATCOs), who monitor flights, assess conditions, and issue maneuver instructions to ensure safe and efficient operations. This human bottleneck has become a key constraint on ATC efficiency and safety, emphasizing the importance of quantifying task intensity and evaluating workload to support fatigue management, staff scheduling, and the development of intelligent ATC solutions. Early studies on ATCO workload modeling primarily focused on statistical methods and subjective assessments such as the NASA Task Load Index (NASA-TLX) [1].
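
The sliding-window maneuver-point detection step can be illustrated on a 1-D heading series: flag a point when the mean heading before it differs sharply from the mean after it. The window size and threshold here are assumptions for the sketch; the paper's exact detection method is not reproduced.

```python
# Illustrative sliding-window maneuver-point detector on a heading series
# (degrees). A point is flagged when the mean heading over the window before
# it and the window after it differ by more than `threshold` degrees.
def detect_maneuver_points(headings, window=3, threshold=5.0):
    points = []
    for i in range(window, len(headings) - window):
        before = sum(headings[i - window:i]) / window
        after = sum(headings[i:i + window]) / window
        if abs(after - before) > threshold:
            points.append(i)
    return points
```

On real trajectories one would smooth the series first and merge adjacent flagged indices into a single maneuver point, since a step change is detected across several consecutive windows.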


A robust methodology for long-term sustainability evaluation of Machine Learning models

Paz-Ruza, Jorge, Gama, João, Alonso-Betanzos, Amparo, Guijarro-Berdiñas, Bertha

arXiv.org Artificial Intelligence

Among the many desirable properties of Artificial Intelligence systems, sustainability and efficiency have become increasingly important in the context of worsening climate change, massive water use in data centres, and the need for simpler, faster models in IoT settings. Consequently, there have been numerous attempts to both promote and regulate the sustainability of Machine Learning models; the EU's AI Act indicates that the sustainability of AI - in terms of its environmental and social footprint - should be considered when developing and deploying AI pipelines [1], and manifestos like that of UNESCO highlight sustainability as one of the core principles of the broader Responsible AI paradigm [2]. However, this seemingly consensual agreement on the importance of sustainability and efficiency for real-world AI systems, and the social and regulatory efforts behind it, contrasts heavily with the practical applicability of such regulations; indeed, the AI Act itself defines the requirement for sustainability but does not indicate what metrics and evaluation pipelines should be considered for a robust, reliable, and practically relevant assessment of the environmental impact of a model. We argue that this lack of comprehensiveness in sustainability recommendations for AI systems does not stem from a careless or sloppy construction of the regulations themselves, but rather from an actual absence of suitable evaluation protocols that are formal, model-agnostic, reproducible, and grounded in real-life usage protocols for the ML lifecycle. The authors of this preprint are aware of a single regulatory standard for measuring AI sustainability, namely UNE 0086 [3], which limits evaluation to the epoch-batch training paradigm of supervised learning systems, rendering it useless for any task or type of learning that deviates from that standard.
Although many researchers and companies have made it a habit to report efficiency figures and comparisons (e.g., in terms of emitted CO2)...
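
The kind of lifecycle-wide accounting the authors call for can be illustrated with a back-of-the-envelope sketch: emissions = energy (kWh) times grid carbon intensity (kgCO2e/kWh), summed over training and inference rather than training alone. The figures in the example are placeholders, not measurements from any real model, and in practice one would use a measurement tool such as CodeCarbon rather than hand-supplied energy numbers.

```python
# Lifecycle emissions estimate covering both training and inference, since
# evaluating only the training phase (as epoch-batch-centric standards do)
# misses most of a deployed model's footprint. All inputs are placeholders.
def lifecycle_emissions_kg(train_kwh, infer_kwh_per_req, n_requests, intensity):
    """Total kgCO2e = (training kWh + per-request kWh * requests) * kgCO2e/kWh."""
    return (train_kwh + infer_kwh_per_req * n_requests) * intensity

# Example: 500 kWh training, 0.001 kWh/request over 1M requests, grid
# intensity 0.4 kgCO2e/kWh -> (500 + 1000) * 0.4 = 600.0 kgCO2e
```

Note how, even in this toy example, inference dominates training: a robust evaluation protocol therefore needs to be grounded in the model's real usage profile, which is precisely the gap the authors identify.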


Co-Producing AI: Toward an Augmented, Participatory Lifecycle

Mushkani, Rashid, Berard, Hugo, Ammar, Toumadher, Chatonnier, Cassandre, Koseki, Shin

arXiv.org Artificial Intelligence

Despite efforts to mitigate the inherent risks and biases of artificial intelligence (AI) algorithms, these algorithms can disproportionately impact culturally marginalized groups. A range of approaches has been proposed to address or reduce these risks, including the development of ethical guidelines and principles for responsible AI, as well as technical solutions that promote algorithmic fairness. Drawing on design justice, expansive learning theory, and recent empirical work on participatory AI, we argue that mitigating these harms requires a fundamental re-architecture of the AI production pipeline. This re-design should center co-production, diversity, equity, inclusion (DEI), and multidisciplinary collaboration. We introduce an augmented AI lifecycle consisting of five interconnected phases: co-framing, co-design, co-implementation, co-deployment, and co-maintenance. The lifecycle is informed by four multidisciplinary workshops and grounded in themes of distributed authority and iterative knowledge exchange. Finally, we relate the proposed lifecycle to several leading ethical frameworks and outline key research questions that remain for scaling participatory governance.


Out-of-Distribution Detection for Safety Assurance of AI and Autonomous Systems

Hodge, Victoria J., Paterson, Colin, Habli, Ibrahim

arXiv.org Artificial Intelligence

The operational capabilities and application domains of AI-enabled autonomous systems have expanded significantly in recent years due to advances in robotics and machine learning (ML). Demonstrating the safety of autonomous systems rigorously is critical for their responsible adoption, but it is challenging as it requires robust methodologies that can handle novel and uncertain situations throughout the system lifecycle, including detecting out-of-distribution (OOD) data. Thus, OOD detection is receiving increased attention from the research, development and safety engineering communities. This comprehensive review analyses OOD detection techniques within the context of safety assurance for autonomous systems, in particular in safety-critical domains. We begin by defining the relevant concepts, investigating what causes OOD and exploring the factors which make the safety assurance of autonomous systems and OOD detection challenging. Our review identifies a range of techniques which can be used throughout the ML development lifecycle and we suggest areas within the lifecycle in which they may be used to support safety assurance arguments. We discuss a number of caveats that system and safety engineers must be aware of when integrating OOD detection into system lifecycles. We conclude by outlining the challenges and future work necessary for the safe development and operation of autonomous systems across a range of domains and applications.
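
A minimal example of the kind of technique such a review covers is a distance-based OOD check: fit per-feature statistics on training data and flag inputs that lie far outside them. This is a simplification of Mahalanobis-style detectors (it ignores feature correlations), and the threshold is an assumption.

```python
# Simple distance-based OOD check: flag an input as out-of-distribution if
# any feature lies more than z_threshold standard deviations from the
# training mean. A simplified, correlation-free stand-in for Mahalanobis.
def fit_stats(train):
    """Per-feature mean and (population) variance of the training set."""
    n, dims = len(train), len(train[0])
    mean = [sum(x[d] for x in train) / n for d in range(dims)]
    var = [sum((x[d] - mean[d]) ** 2 for x in train) / n for d in range(dims)]
    return mean, var

def is_ood(x, mean, var, z_threshold=3.0):
    return any(
        abs(xi - m) > z_threshold * (v ** 0.5 + 1e-12)
        for xi, m, v in zip(x, mean, var)
    )
```

In a safety-assurance argument, the detector's own false-negative rate would itself need evidence; a flag like this gates the system into a safe fallback rather than certifying the input as safe.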


Embedding the MLOps Lifecycle into OT Reference Models

Schindler, Simon, Binder, Christoph, Lürzer, Lukas, Huber, Stefan

arXiv.org Artificial Intelligence

Machine Learning Operations (MLOps) practices are increasingly adopted in industrial settings, yet their integration with Operational Technology (OT) systems presents significant challenges. This paper analyzes the fundamental obstacles in combining MLOps with OT environments and proposes a systematic approach to embed MLOps practices into established OT reference models. We evaluate the suitability of the Reference Architectural Model for Industry 4.0 (RAMI 4.0) and the International Society of Automation Standard 95 (ISA-95) for MLOps integration and present a detailed mapping of MLOps lifecycle components to RAMI 4.0, exemplified by a real-world use case. Our findings demonstrate that while standard MLOps practices cannot be directly transplanted to OT environments, structured adaptation using existing reference models can provide a pathway for successful integration.
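
A mapping of MLOps lifecycle components onto RAMI 4.0 layers can be represented as a simple lookup table. The six RAMI 4.0 layer names below are standard, but the specific component-to-layer assignments are illustrative assumptions, not the mapping the paper derives.

```python
# Illustrative mapping of MLOps lifecycle components onto the six RAMI 4.0
# layers. Layer names follow the RAMI 4.0 reference model; the assignments
# are assumed for the sake of the sketch.
RAMI_LAYERS = ["Asset", "Integration", "Communication",
               "Information", "Functional", "Business"]

MLOPS_TO_RAMI = {
    "data_ingestion": "Communication",
    "feature_store": "Information",
    "model_training": "Functional",
    "model_serving": "Functional",
    "monitoring": "Information",
    "governance": "Business",
}

def components_on_layer(layer):
    """List MLOps components assigned to a given RAMI 4.0 layer."""
    if layer not in RAMI_LAYERS:
        raise ValueError(f"unknown RAMI 4.0 layer: {layer}")
    return sorted(c for c, lyr in MLOPS_TO_RAMI.items() if lyr == layer)
```

A table like this makes the adaptation explicit: components that land on the Functional layer must respect OT real-time and safety constraints, which is where direct transplantation of cloud MLOps practice breaks down.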


The Cost-Benefit of Interdisciplinarity in AI for Mental Health

Drakos, Katerina, Paraschou, Eva, Toplu, Simay, Clemmensen, Line Harder, Lütge, Christoph, Lønfeldt, Nicole Nadine, Das, Sneha

arXiv.org Artificial Intelligence

Artificial intelligence has been introduced as a way to improve access to mental health support. However, most AI mental health chatbots rely on a limited range of disciplinary input, and fail to integrate expertise across the chatbot's lifecycle. This paper examines the cost-benefit trade-off of interdisciplinary collaboration in AI mental health chatbots. We argue that involving experts from technology, healthcare, ethics, and law across key lifecycle phases is essential to ensure value-alignment and compliance with the high-risk requirements of the AI Act. We also highlight practical recommendations and existing frameworks to help balance the challenges and benefits of interdisciplinarity in mental health chatbots.