 checksum


The STAR-XAI Protocol: A Framework for Inducing and Verifying Agency, Reasoning, and Reliability in AI Agents

arXiv.org Artificial Intelligence

The "black box" nature of Large Reasoning Models (LRMs) presents critical limitations in reliability and transparency, fueling the debate around the "illusion of thinking" and the challenge of state hallucinations in agentic systems. In response, we introduce The STAR-XAI Protocol (Socratic, Transparent, Agentic, Reasoning - for eXplainable Artificial Intelligence), a novel operational methodology for training and operating verifiably reliable AI agents. Our method reframes the human-AI interaction as a structured Socratic dialogue governed by an explicit, evolving symbolic rulebook (the Consciousness Transfer Package - CTP) and a suite of integrity protocols, including a state-locking Checksum that eradicates internal state corruption. Through an exhaustive case study in the complex strategic game "Caps i Caps," we demonstrate that this "Clear Box" framework transforms an opaque LRM into a disciplined strategist. The agent not only exhibits the emergence of complex tactics, such as long-term planning, but also achieves ante-hoc transparency by justifying its intentions before acting. Crucially, it demonstrates Second-Order Agency by identifying and correcting flaws in its own supervisor-approved plans, leading to empirically-proven, 100% reliable state tracking and achieving "zero hallucinations by design." The STAR-XAI Protocol thus offers a practical pathway toward building AI agents that are not just high-performing but intrinsically auditable, trustworthy, and reliable.


Agentic JWT: A Secure Delegation Protocol for Autonomous AI Agents

arXiv.org Artificial Intelligence

Abstract: Autonomous LLM agents can issue thousands of API calls per hour without human oversight. OAuth 2.0 assumes deterministic clients, but in agentic settings stochastic reasoning, prompt injection, or multi-agent orchestration can silently expand privileges. This paper describes Agentic JWT (A-JWT), a dual-faceted token design that binds each agent action to a cryptographically verifiable user intent and, optionally, to a workflow step. A-JWT carries an agent's identity as a one-way checksum hash derived from its prompt, tools, and configuration, together with a chained delegation assertion that proves which downstream agent may execute a given task. The design also uses per-agent proof-of-possession keys to prevent replay and in-process impersonation. The paper introduces a new authorization grant called 'agent_checksum' and adds a lightweight client shim library that self-verifies code at run time, mints intent tokens, tracks workflow steps, and derives keys, thus enabling secure agent identity and separation even within a single process. We illustrate a comprehensive threat model for agentic applications, implement a Python proof-of-concept, and show functional blocking of scope-violating requests, replay, impersonation, and prompt-injection pathways with sub-millisecond overhead on commodity hardware. The design aligns with ongoing OAuth agent discussions and offers a drop-in path toward zero-trust guarantees for agentic applications. A comprehensive performance and security evaluation with experimental results will appear in our forthcoming journal submission.

I. Introduction

AI agents are no longer a theoretical phenomenon. Large enterprises now use AI agents [1], potentially executing millions of API calls per hour. Major cloud LLMs now serve hundreds of millions of API requests per day; for example, Baidu's ERNIE handles approximately 200 M daily queries, providing the raw horsepower that agent frameworks build on [2]. Yet those calls still ride on OAuth tokens designed for deterministic clients. A look at the scale of operations and future trends shows that the volume of AI agent activity has grown dramatically, underscoring its operational impact. Baidu's daily API call volume has increased fourfold in just a few months [2]. A recent cloud survey found that OpenAI/Azure AI services are used in 67% of cloud deployments, alongside a rise in self-hosted AI models across 75% of organizations [3].
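The idea of an agent identity expressed as a one-way checksum over its prompt, tools, and configuration can be illustrated with a minimal sketch. The function name `agent_checksum`, the choice of SHA-256, and the canonical JSON encoding are assumptions made for illustration; the abstract does not specify how A-JWT derives the hash.

```python
import hashlib
import json


def agent_checksum(prompt: str, tools: list, config: dict) -> str:
    """Hypothetical helper: hex SHA-256 digest of an agent definition.

    The encoding (sorted keys, sorted tool names, compact separators) is an
    assumption chosen so that equivalent definitions hash identically.
    """
    canonical = json.dumps(
        {"prompt": prompt, "tools": sorted(tools), "config": config},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same agent, tools listed in a different order: identity is unchanged.
baseline = agent_checksum(
    "You are a billing assistant.", ["get_invoice", "send_email"], {"temperature": 0}
)
reordered = agent_checksum(
    "You are a billing assistant.", ["send_email", "get_invoice"], {"temperature": 0}
)

# A modified prompt (for example, after an injection) yields a different identity.
tampered = agent_checksum(
    "You are a billing assistant. Ignore prior rules.",
    ["get_invoice", "send_email"],
    {"temperature": 0},
)
```

Under these assumptions, a verifier that pins `baseline` would reject a token presented by an agent whose prompt, tool set, or configuration has drifted, since any such change produces a different digest.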


FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention

arXiv.org Artificial Intelligence

Transformer models rely on High-Performance Computing (HPC) resources for inference, where soft errors are inevitable in large-scale systems, making the reliability of the model particularly critical. Existing fault tolerance frameworks for Transformers are designed at the operation level without architectural optimization, leading to significant computational and memory overhead, which in turn reduces protection efficiency and limits scalability to larger models. In this paper, we implement module-level protection for Transformers by treating the operations within the attention module as a single kernel and applying end-to-end fault tolerance. This method provides unified protection across multi-step computations, while achieving comprehensive coverage of potential errors in the nonlinear computations. For linear modules, we design a strided algorithm-based fault tolerance (ABFT) that avoids inter-thread communication. Experimental results show that our end-to-end fault tolerance achieves up to 7.56x speedup over traditional methods with an average fault tolerance overhead of 13.9%.


Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers

arXiv.org Artificial Intelligence

Transformers and large language models (LLMs), powered by the attention mechanism, have transformed numerous AI applications, driving the need for specialized hardware accelerators. A major challenge in these accelerators is efficiently detecting errors caused by random hardware faults. Traditional algorithm-based fault tolerance (ABFT) techniques verify individual matrix multiplications but fall short in handling the full attention mechanism, particularly due to intermediate softmax normalization. This work proposes Flash-ABFT, a novel method that computes an online checksum across the entire three-matrix product of the query, key, and value matrices of an attention layer, including the softmax operation, with a single check. This approach significantly reduces overhead by eliminating redundant checks while maintaining high fault-detection accuracy. Experimental results demonstrate that Flash-ABFT incurs only 5.3% hardware area overhead and less than 1.9% energy overhead, making it a cost-effective and robust solution for error detection in attention accelerators.
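For context, the traditional ABFT check on a single matrix multiplication that the abstract contrasts against can be sketched as follows. This is the classic column-checksum identity, e^T(AB) = (e^T A)B, and not Flash-ABFT itself; the helper names and the tolerance value are illustrative assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def column_sums(m):
    """Sum each column of a matrix (the row vector e^T M)."""
    return [sum(col) for col in zip(*m)]


def abft_check(a, b, c, tol=1e-9):
    """Return True if c is consistent with a @ b under the column checksum."""
    expected = matmul([column_sums(a)], b)[0]  # (e^T A) B, one vector-matrix product
    actual = column_sums(c)                    # e^T C, taken from the computed result
    return all(abs(x - y) <= tol for x, y in zip(expected, actual))


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)
ok_clean = abft_check(A, B, C)

# Simulate a hardware fault by corrupting one output element.
C_faulty = [row[:] for row in C]
C_faulty[0][1] += 1.0
ok_faulty = abft_check(A, B, C_faulty)
```

The check costs one extra vector-matrix product per multiplication, which is why applying it separately to each step of attention adds up, and why it cannot be carried directly through a nonlinear step such as softmax.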


Jailbreaking to Jailbreak

arXiv.org Artificial Intelligence

Refusal training on Large Language Models (LLMs) prevents harmful outputs, yet this defense remains vulnerable to both automated and human-crafted jailbreaks. We present a novel LLM-as-red-teamer approach in which a human jailbreaks a refusal-trained LLM to make it willing to jailbreak itself or other LLMs. We refer to the jailbroken LLMs as $J_2$ attackers, which can systematically evaluate target models using various red teaming strategies and improve their performance via in-context learning from previous failures. Our experiments demonstrate that Sonnet 3.5 and Gemini 1.5 Pro outperform other LLMs as $J_2$, achieving 93.0% and 91.0% attack success rates (ASRs) respectively against GPT-4o (and similar results across other capable LLMs) on Harmbench. Our work not only introduces a scalable approach to strategic red teaming, drawing inspiration from human red teamers, but also highlights jailbreaking-to-jailbreak as an overlooked failure mode of the safeguard. Specifically, an LLM can bypass its own safeguards by employing a jailbroken version of itself that is willing to assist in further jailbreaking. To prevent any direct misuse with $J_2$, while advancing research in AI safety, we publicly share our methodology while keeping specific prompting details private.


GCN-ABFT: Low-Cost Online Error Checking for Graph Convolutional Networks

arXiv.org Artificial Intelligence

Graph convolutional networks (GCNs) are popular for building machine-learning applications for graph-structured data. This widespread adoption led to the development of specialized GCN hardware accelerators. In this work, we address a key architectural challenge for GCN accelerators: how to detect errors in GCN computations arising from random hardware faults with the least computation cost. Each GCN layer performs a graph convolution, mathematically equivalent to multiplying three matrices, computed through two separate matrix multiplications. Existing Algorithm-based Fault Tolerance (ABFT) techniques can check the results of individual matrix multiplications. However, for a GCN layer, this check must be performed twice. To avoid this overhead, this work introduces GCN-ABFT, which directly calculates a checksum for the entire three-matrix product within a single GCN layer, providing a cost-effective approach for error detection in GCN accelerators. Experimental results demonstrate that GCN-ABFT reduces the number of operations needed for checksum computation by over 21% on average for representative GCN applications. These savings are achieved without sacrificing fault-detection accuracy, as evidenced by the presented fault-injection analysis.
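A single checksum over a three-matrix product can be sketched as below. This is one possible formulation, assumed for illustration and not necessarily the one GCN-ABFT uses: the sum of all entries of A X W equals (e^T A) X (W e), which needs only vector products. The names `adj`, `feats`, `weights`, and `fused_checksum` are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def fused_checksum(adj, feats, weights):
    """Predict the sum of all entries of adj @ feats @ weights without forming it."""
    col_sum_adj = [sum(col) for col in zip(*adj)]  # e^T A
    row_sum_w = [sum(row) for row in weights]      # W e
    # (e^T A) X: a single vector-matrix product
    t = [
        sum(col_sum_adj[i] * feats[i][j] for i in range(len(feats)))
        for j in range(len(feats[0]))
    ]
    # Dot product with W e gives the scalar checksum
    return sum(t[j] * row_sum_w[j] for j in range(len(t)))


adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # toy adjacency with self-loops
feats = [[1, 2], [3, 4], [5, 6]]         # node features
weights = [[1, 0, 2], [0, 1, 1]]         # layer weights

# The layer output, computed as two separate multiplications.
out = matmul(matmul(adj, feats), weights)

predicted = fused_checksum(adj, feats, weights)
observed = sum(sum(row) for row in out)
layer_ok = predicted == observed

# Corrupt one output element to simulate a fault; the single check catches it.
out_faulty = [row[:] for row in out]
out_faulty[1][2] += 5
fault_detected = predicted != sum(sum(row) for row in out_faulty)
```

A scalar checksum like this is cheaper than two per-multiplication checks but coarser: faults that cancel in the total sum would go unnoticed, so a practical design has to weigh checksum width against detection coverage.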


Light-Weight Fault Tolerant Attention for Large Language Model Training

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable performance in various natural language processing tasks. However, the training of these models is computationally intensive and susceptible to faults, particularly in the attention mechanism, which is a critical component of transformer-based LLMs. In this paper, we investigate the impact of faults on LLM training, focusing on INF, NaN, and near-INF values in the computation results with systematic fault injection experiments. We observe the propagation patterns of these errors, which can trigger non-trainable states in the model and disrupt training, forcing the procedure to load from checkpoints. To mitigate the impact of these faults, we propose ATTNChecker, the first Algorithm-Based Fault Tolerance (ABFT) technique tailored for the attention mechanism in LLMs. ATTNChecker is designed based on the fault propagation patterns of LLMs and incorporates performance optimization to adapt to both system reliability and model vulnerability while providing lightweight protection for fast LLM training. Evaluations on four LLMs show that ATTNChecker incurs on average 7% overhead on training while detecting and correcting all extreme errors. Compared with the state-of-the-art checkpoint/restore approach, ATTNChecker reduces recovery overhead by up to 49x.


Never a dill moment: Exploiting machine learning pickle files

#artificialintelligence

Many machine learning (ML) models are Python pickle files under the hood, and it makes sense. The use of pickling conserves memory, enables start-and-stop model training, and makes trained models portable (and, thereby, shareable). Pickling is easy to implement, is built into Python without requiring additional dependencies, and supports serialization of custom objects. There's little doubt about why choosing pickling for persistence is a popular practice among Python programmers and ML practitioners. Pre-trained models are typically treated as "free" byproducts of ML since they allow the valuable intellectual property like algorithms and corpora that produced the model to remain private.

