Enabling Ethical AI: A case study in using Ontological Context for Justified Agentic AI Decisions

McGee, Liam, Harvey, James, Cull, Lucy, Vermeulen, Andreas, Visscher, Bart-Floris, Sharan, Malvika

arXiv.org Artificial Intelligence

Agentic AI systems (software agents with autonomy, decision-making ability, and adaptability) are increasingly used to execute complex tasks on behalf of organisations. Most such systems rely on Large Language Models (LLMs), whose broad semantic capabilities enable powerful language processing but lack explicit, institution-specific grounding. In enterprises, data rarely comes with an inspectable semantic layer, and constructing one typically requires labour-intensive "data archaeology": cleaning, modelling, and curating knowledge into ontologies, taxonomies, and other formal structures. At the same time, explainability methods such as saliency maps expose an "interpretability gap": they highlight what the model attends to but not why, leaving decision processes opaque. In this preprint, we present a case study developed by Kaiasm and Avantra AI through their work with The Turing Way Practitioners Hub, a forum developed under the InnovateUK BridgeAI program. The study presents a collaborative human-AI approach to building an inspectable semantic layer for Agentic AI. AI agents first propose candidate knowledge structures from diverse data sources; domain experts then validate, correct, and extend these structures, with their feedback used to improve subsequent models. The authors show how this process captures tacit institutional knowledge, improves response quality and efficiency, and mitigates institutional amnesia. We argue for a shift from post-hoc explanation to justifiable Agentic AI, where decisions are grounded in explicit, inspectable evidence and reasoning accessible to both experts and non-specialists.


Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models

Berger, Armin, Hillebrand, Lars, Leonhard, David, Deußer, Tobias, de Oliveira, Thiago Bell Felix, Dilmaghani, Tim, Khaled, Mohamed, Kliem, Bernd, Loitz, Rüdiger, Bauckhage, Christian, Sifa, Rafet

arXiv.org Artificial Intelligence

The auditing of financial documents, historically a labor-intensive process, stands on the precipice of transformation. AI-driven solutions have made inroads into streamlining this process by recommending pertinent text passages from financial reports to align with the legal requirements of accounting standards. However, a glaring limitation remains: these systems commonly fall short in verifying whether the recommended excerpts actually comply with the specific legal mandates. Hence, in this paper, we probe the efficiency of publicly available Large Language Models (LLMs) in the realm of regulatory compliance across different model configurations. We place particular emphasis on comparing cutting-edge open-source LLMs, such as Llama-2, with their proprietary counterparts like OpenAI's GPT models. This comparative analysis leverages two custom datasets provided by our partner PricewaterhouseCoopers (PwC) Germany. We find that the open-source Llama-2 70-billion-parameter model demonstrates outstanding performance in detecting non-compliance, or true-negative occurrences, beating all of its proprietary counterparts. Nevertheless, proprietary models such as GPT-4 perform the best in a broad variety of scenarios, particularly in non-English contexts.


AI-Supported Platform for System Monitoring and Decision-Making in Nuclear Waste Management with Large Language Models

Chang, Dongjune, Kim, Sola, Park, Young Soo

arXiv.org Artificial Intelligence

Argonne National Laboratory

Nuclear waste management requires rigorous regulatory compliance assessment, demanding advanced decision-support systems capable of addressing complex legal, environmental, and safety considerations. This paper presents a multi-agent Retrieval-Augmented Generation (RAG) system that integrates large language models (LLMs) with document retrieval mechanisms to enhance decision accuracy through structured agent collaboration. Through a structured 10-round discussion model, agents collaborate to assess regulatory compliance and safety requirements while maintaining document-grounded responses. A case study of a proposed temporary nuclear waste storage site near Winslow, Arizona, demonstrates the framework's effectiveness. Results show the Regulatory Agent achieves consistently higher relevance scores in maintaining alignment with legal frameworks, while the Safety Agent effectively manages complex risk assessments requiring multifaceted analysis. The system demonstrates progressive improvement in agreement rates between agents across discussion rounds while semantic drift decreases, indicating enhanced decision-making consistency and response coherence. The system ensures regulatory decisions remain factually grounded, dynamically adapting to evolving regulatory frameworks through real-time document retrieval. By balancing automated assessment with human oversight, this framework offers a scalable and transparent approach to regulatory governance. Future research will explore multi-modal data integration and reinforcement learning to enhance response coherence and decision efficiency. These findings underscore the potential of AI-driven, multi-agent systems in advancing evidence-based, accountable, and adaptive decision-making for high-stakes environmental management scenarios.
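The multi-round discussion dynamic the abstract describes, with agreement between agents improving across rounds, can be sketched in miniature. The agent classes, stance scores, and convergence rule below are hypothetical stand-ins for the paper's LLM-backed Regulatory and Safety agents, not its actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Toy discussion agent; `position` stands in for an LLM's stance in [0, 1]."""
    name: str
    position: float

    def respond(self, peer_position: float) -> float:
        # Move partway toward the peer's stance, emulating how grounded
        # discussion rounds pull the agents toward agreement.
        self.position += 0.5 * (peer_position - self.position)
        return self.position

def run_discussion(rounds: int = 10) -> list[float]:
    """Run a fixed-round discussion; return per-round disagreement gaps."""
    regulatory = Agent("Regulatory", position=0.9)
    safety = Agent("Safety", position=0.3)
    gaps = []
    for _ in range(rounds):
        r = regulatory.respond(safety.position)
        s = safety.respond(regulatory.position)
        gaps.append(abs(r - s))  # disagreement shrinks round by round
    return gaps

gaps = run_discussion()
assert gaps[-1] < gaps[0]  # agreement improves across rounds
```

In the real system each `respond` call would be an LLM invocation conditioned on retrieved documents; the fixed 10-round loop mirrors the paper's structured discussion model.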


Adaptive PII Mitigation Framework for Large Language Models

Asthana, Shubhi, Mahindru, Ruchi, Zhang, Bing, Sanz, Jorge

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) faces growing challenges from evolving data protection laws and enforcement practices worldwide. Regulations like GDPR and CCPA impose strict compliance requirements on Machine Learning (ML) models, especially concerning personal data use. These laws grant individuals rights such as data correction and deletion, complicating the training and deployment of Large Language Models (LLMs) that rely on extensive datasets. Public data availability does not guarantee its lawful use for ML, amplifying these challenges. This paper introduces an adaptive system for mitigating risk of Personally Identifiable Information (PII) and Sensitive Personal Information (SPI) in LLMs. It dynamically aligns with diverse regulatory frameworks and integrates seamlessly into Governance, Risk, and Compliance (GRC) systems. The system uses advanced NLP techniques, context-aware analysis, and policy-driven masking to ensure regulatory compliance. Benchmarks highlight the system's effectiveness, with an F1 score of 0.95 for Passport Numbers, outperforming tools like Microsoft Presidio (0.33) and Amazon Comprehend (0.54). In human evaluations, the system achieved an average user trust score of 4.6/5, with participants acknowledging its accuracy and transparency. Observations demonstrate stricter anonymization under GDPR compared to CCPA, which permits pseudonymization and user opt-outs. These results validate the system as a scalable and robust solution for enterprise privacy compliance.
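The regulation-dependent behavior the abstract reports, stricter anonymization under GDPR versus pseudonymization under CCPA, can be illustrated with a minimal policy-driven masking sketch. The passport regex and policy names here are illustrative assumptions, not the system's actual detectors or configuration:

```python
import hashlib
import re

# Illustrative passport-number pattern; a real detector would combine
# context-aware NLP with many such entity-specific rules.
PASSPORT_RE = re.compile(r"\b[A-Z]{1,2}\d{6,8}\b")

def mask_pii(text: str, policy: str) -> str:
    """Mask passport numbers according to a regulatory policy."""
    def replace(match: re.Match) -> str:
        if policy == "GDPR":
            return "[REDACTED]"  # irreversible anonymization
        # CCPA-style pseudonymization: a stable token keeps records linkable
        digest = hashlib.sha256(match.group().encode()).hexdigest()[:8]
        return f"PSEUDO-{digest}"
    return PASSPORT_RE.sub(replace, text)

doc = "Traveler passport AB1234567 was scanned at the gate."
print(mask_pii(doc, "GDPR"))  # passport replaced with [REDACTED]
print(mask_pii(doc, "CCPA"))  # passport replaced with a stable pseudonym
```

The design point is that the detection step is shared while the masking step is policy-driven, which is what lets one pipeline align with multiple regulatory frameworks.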


NLP-based Regulatory Compliance -- Using GPT 4.0 to Decode Regulatory Documents

Kumar, Bimal, Roussinov, Dmitri

arXiv.org Artificial Intelligence

Large Language Models (LLMs) such as GPT-4.0 have shown significant promise in addressing the semantic complexities of regulatory documents, particularly in detecting inconsistencies and contradictions. This study evaluates GPT-4.0's ability to identify conflicts within regulatory requirements by analyzing a curated corpus with artificially injected ambiguities and contradictions, designed in collaboration with architects and compliance engineers. Using metrics such as precision, recall, and F1 score, the experiment demonstrates GPT-4.0's effectiveness in detecting inconsistencies, with findings validated by human experts. The results highlight the potential of LLMs to enhance regulatory compliance processes, though further testing with larger datasets and domain-specific fine-tuning is needed to maximize accuracy and practical applicability. Future work will explore automated conflict resolution and real-world implementation through pilot projects with industry partners.
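The precision, recall, and F1 metrics used to score conflict detection reduce to simple counts over binary labels (1 = contradiction flagged, 0 = none). The labels below are synthetic, purely to show the arithmetic:

```python
def prf1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Precision, recall, and F1 for binary conflict-detection labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Synthetic gold labels vs. model flags: tp=2, fp=1, fn=1
y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
p, r, f = prf1(y_true, y_pred)
# p = r = f = 2/3
```

F1 balances the two error types, which matters here because both missed contradictions (false negatives) and spurious flags (false positives) carry compliance cost.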


COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act

Guldimann, Philipp, Spiridonov, Alexander, Staab, Robin, Jovanović, Nikola, Vero, Mark, Vechev, Velko, Gueorguieva, Anna, Balunović, Mislav, Konstantinov, Nikola, Bielik, Pavol, Tsankov, Petar, Vechev, Martin

arXiv.org Artificial Intelligence

The EU's Artificial Intelligence Act (AI Act) is a significant step towards responsible AI development, but lacks clear technical interpretation, making it difficult to assess models' compliance. This work presents COMPL-AI, a comprehensive framework consisting of (i) the first technical interpretation of the EU AI Act, translating its broad regulatory requirements into measurable technical requirements, with the focus on large language models (LLMs), and (ii) an open-source Act-centered benchmarking suite, based on thorough surveying and implementation of state-of-the-art LLM benchmarks. By evaluating 12 prominent LLMs in the context of COMPL-AI, we reveal shortcomings in existing models and benchmarks, particularly in areas like robustness, safety, diversity, and fairness. This work highlights the need for a shift in focus towards these aspects, encouraging balanced development of LLMs and more comprehensive regulation-aligned benchmarks. Simultaneously, COMPL-AI for the first time demonstrates the possibilities and difficulties of bringing the Act's obligations to a more concrete, technical level. As such, our work can serve as a useful first step towards having actionable recommendations for model providers, and contributes to ongoing efforts of the EU to enable application of the Act, such as the drafting of the GPAI Code of Practice.


New strategies to manage clinical trial risk

#artificialintelligence

It is essential for healthcare and pharmaceutical companies to be aware of both critical and non-critical risks when conducting quality clinical trials. However, managing both takes time and money -- resources that clinical teams are often strapped for. Additionally, the risks that organisations define at the start of the trial may change, meaning the data they need to collect will also change. In order to address these challenges, researchers must break down silos and create a centralised process for monitoring and managing risk. Many organisations are turning to risk-based quality management (RBQM) practices to make that happen.


How Do You Define Unfair Bias in AI? G.R. Jenkin & Associates

#artificialintelligence

Art is subjective and everyone has their own opinion about it. When I saw the expressionist painting Blue Poles, by Jackson Pollock, I was reminded of the famous quote by Rudyard Kipling, "It's clever, but is it Art?" Pollock's piece looks like paint messily spilled onto a drop sheet protecting the floor. The debate over what constitutes art has a long history that will probably never be settled; there is no definitive definition of art. Nor is there a broadly accepted objective measure of a piece of art's quality, with the closest being Orson Welles's "I don't know anything about art but I know what I like." Similarly, people recognize unfair bias when they see it, but it is quite difficult to create a single objective definition.


The Next ChatGPT Revolution: Intelligent Document Processing

#artificialintelligence

ChatGPT, the state-of-the-art language model developed by OpenAI, is poised to have a significant impact on the B2B industry. This powerful technology has the potential to disrupt traditional business processes and open up new opportunities for companies across a wide range of industries when it comes to intelligent document processing. One of the key areas where ChatGPT is likely to have an impact is in automating routine tasks and customer interactions. Another area where ChatGPT is likely to be disruptive is in the generation of written content. This technology can be used to quickly and accurately generate reports, product descriptions, and other written materials.
