Government
Beyond Satisfaction: From Placebic to Actionable Explanations For Enhanced Understandability
Shymanski, Joe, Brue, Jacob, Sen, Sandip
Explainable AI (XAI) presents useful tools to facilitate transparency and trustworthiness in machine learning systems. However, current evaluations of system explainability often rely heavily on subjective user surveys, which may not adequately capture the effectiveness of explanations. This paper critiques the overreliance on user satisfaction metrics and explores whether these can differentiate between meaningful (actionable) and vacuous (placebic) explanations. In experiments involving optimal Social Security filing age selection tasks, participants used one of three protocols: no explanations, placebic explanations, or actionable explanations. Participants who received actionable explanations significantly outperformed the other groups in objective measures of their mental model, but users rated placebic and actionable explanations as equally satisfying. This suggests that subjective surveys alone fail to capture whether explanations truly support users in building useful domain understanding. We propose that future evaluations of agent explanation capabilities should integrate objective task performance metrics alongside subjective assessments to more accurately measure explanation quality.
SUGAR: A Sweeter Spot for Generative Unlearning of Many Identities
Nguyen, Dung Thuy, Nguyen, Quang, Robinette, Preston K., Jiang, Eli, Johnson, Taylor T., Leach, Kevin
Recent advances in 3D-aware generative models have enabled high-fidelity image synthesis of human identities. However, this progress raises urgent questions around user consent and the ability to remove specific individuals from a model's output space. We address this by introducing SUGAR, a framework for scalable generative unlearning that enables the removal of many identities (simultaneously or sequentially) without retraining the entire model. Rather than projecting unwanted identities to unrealistic outputs or relying on static template faces, SUGAR learns a personalized surrogate latent for each identity, diverting reconstructions to visually coherent alternatives while preserving the model's quality and diversity. We further introduce a continual utility preservation objective that guards against degradation as more identities are forgotten. SUGAR achieves state-of-the-art performance in removing up to 200 identities, while delivering up to a 700% improvement in retention utility compared to existing baselines.
BEACON: A Unified Behavioral-Tactical Framework for Explainable Cybercrime Analysis with Large Language Models
Sachdeva, Arush, Saravanan, Rajendraprasad, Sarkar, Gargi, Vemuri, Kavita, Shukla, Sandeep Kumar
Cybercrime has emerged as one of the most pervasive and economically destructive consequences of global digitalization. Contemporary online fraud and deception-based crimes now account for unprecedented financial losses worldwide, exceeding trillions of United States dollars (USD) annually (Morgan, 2016), while also inflicting severe psychological, social, and reputational harm on victims. Unlike classical cyberattacks targeting systems and networks, modern cybercrime increasingly exploits human vulnerabilities rather than purely technical weaknesses, relying on deception, persuasion, impersonation, emotional coercion, and trust manipulation as primary attack vectors (Holt, 2019; Yao, Zheng, Wu, Wu, Gao, Wang and Yang, 2025; Sarkar and Shukla, 2023; Sarkar, Singh, Kumar and Shukla, 2023). Existing cybersecurity frameworks, such as the Cyber Kill Chain and the MITRE ATT&CK framework, provide powerful abstractions for understanding technically sophisticated cyberattacks targeting enterprise systems and critical infrastructure (MITRE Corporation, 2025a,b). However, these models are fundamentally system-centric: they describe how adversaries compromise digital infrastructure, escalate privileges, and exfiltrate data. In contrast, cybercrime (particularly scams, fraud, impersonation, and extortion) primarily targets individual decision-making processes (Louderback and Antonaccio, 2017), often without exploiting any software vulnerability at all. Consequently, the investigative needs of cybercrime differ substantially from those of traditional cyberattacks.
Smart Spatial Planning in Egypt: An Algorithm-Driven Approach to Public Service Evaluation in Qena City
Shamroukh, Mohamed, Aziz, Mohamed Alkhuzamy
The availability and degree of sophistication of public services are fair measures of progress for any city. In this context, Geographic Information Systems (GIS) offer solutions that support decision-making processes regarding the management, planning, and distribution of services, ultimately improving the standard of living in cities (Aziz, 2007, p. 11). Investigating service planning standards, including how they are defined and which needs they address, is one of the issues most relevant to human progress. Planning standards can be reconsidered by studying variation in the distribution of geographical phenomena and the characteristics of geographic areas. More effort should be devoted to defining these standards in line with the characteristics of each region. Such efforts will facilitate the appropriate allocation of services and the accurate definition of future development efforts. The problem addressed by this study is that existing planning standards do not suit the characteristics of Egyptian cities, which have larger populations and more intensive daily use of services. The proposed solution is to create new planning standards that suit the rapidly changing nature of cities. To generate these criteria, current services, their intensity of use, and built-up areas are used to reflect the characteristics of the city; this approach is a new way to generate such criteria. This study attempts to derive planning standards for public services in the city of Qena that are compatible with the characteristics of the city, the geographical distribution of the population, the built-up area, and the services therein.
AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity
The increasing complexity of cyber threats in distributed environments demands advanced frameworks for real-time detection and response across multimodal data streams. This paper introduces AgenticCyber, a generative-AI-powered multi-agent system that orchestrates specialized agents to monitor cloud logs, surveillance videos, and environmental audio concurrently. The solution achieves a 96.2% F1-score in threat detection, reduces response latency to 420 ms, and enables adaptive security posture management by using multimodal language models such as Google's Gemini, coupled with LangChain for agent orchestration. Evaluations on benchmark datasets, such as AWS CloudTrail logs, UCF-Crime video frames, and UrbanSound8K audio clips, show improved performance over standard intrusion detection systems, reducing mean time to respond (MTTR) by 65% and improving situational awareness. This work introduces a scalable, modular, proactive cybersecurity architecture for enterprise networks and IoT ecosystems that overcomes siloed security technologies through cross-modal reasoning and automated remediation.
Web Technologies Security in the AI Era: A Survey of CDN-Enhanced Defenses
Hosain, Mehrab, Shuvo, Sabbir Alom, Ogbe, Matthew, Mazumder, Md Shah Jalal, Rahman, Yead, Hakim, Md Azizul, Pandey, Anukul
The modern web stack, which is dominated by browser-based applications and API-first backends, now operates under an adversarial equilibrium where automated, AI-assisted attacks evolve continuously. Content Delivery Networks (CDNs) and edge computing place programmable defenses closest to users and bots, making them natural enforcement points for machine-learning (ML) driven inspection, throttling, and isolation. This survey synthesizes the landscape of AI-enhanced defenses deployed at the edge: (i) anomaly- and behavior-based Web Application Firewalls (WAFs) within broader Web Application and API Protection (WAAP), (ii) adaptive DDoS detection and mitigation, (iii) bot management that resists human mimicry, and (iv) API discovery, positive security modeling, and encrypted-traffic anomaly analysis. We also contribute a systematic survey method, a threat taxonomy mapped to edge-observable signals, evaluation metrics, deployment playbooks, and governance guidance. We conclude with a research agenda spanning XAI, adversarial robustness, and autonomous multi-agent defense. Our findings indicate that edge-centric AI measurably improves time-to-detect and time-to-mitigate while reducing data movement and enhancing compliance, yet introduces new risks around model abuse, poisoning, and governance.
Why They Disagree: Decoding Differences in Opinions about AI Risk on the Lex Fridman Podcast
Truong, Nghi, Puranam, Phanish, Koçak, Özgecan
The emergence of transformative technologies often surfaces deep societal divisions, nowhere more evident than in contemporary debates about artificial intelligence (AI). A striking feature of these divisions is that they persist despite shared interests in ensuring that AI benefits humanity and avoiding catastrophic outcomes. This paper analyzes contemporary debates about AI risk, parsing the differences between the "doomer" and "boomer" perspectives into definitional, factual, causal, and moral premises to identify key points of contention. We find that differences in perspectives about existential risk ("X-risk") arise fundamentally from differences in causal premises about design vs. emergence in complex systems, while differences in perspectives about employment risks ("E-risks") pertain to different causal premises about the applicability of past theories (evolution) vs. their inapplicability (revolution). Disagreements about these two forms of AI risk appear to share two properties: neither involves significant disagreements on moral values, and both can be described in terms of differing views on the extent to which human rationality is bounded. Our approach to analyzing reasoning chains at scale, using an ensemble of LLMs to parse textual data, can be applied to identify key points of contention in debates about risk to the public in any arena.
Semantic Faithfulness and Entropy Production Measures to Tame Your LLM Demons and Manage Hallucinations
Evaluating the faithfulness of Large Language Models (LLMs) to a given task is a complex challenge. We propose two new unsupervised metrics for faithfulness evaluation using insights from information theory and thermodynamics. Our approach treats an LLM as a bipartite information engine where hidden layers act as a Maxwell demon controlling transformations of context $C$ into answer $A$ via prompt $Q$. We model Question-Context-Answer (QCA) triplets as probability distributions over shared topics. Topic transformations from $C$ to $Q$ and $A$ are modeled as transition matrices ${\bf Q}$ and ${\bf A}$ encoding the query goal and actual result, respectively. Our semantic faithfulness (SF) metric quantifies faithfulness for any given QCA triplet by the Kullback-Leibler (KL) divergence between these matrices. Both matrices are inferred simultaneously via convex optimization of this KL divergence, and the final SF metric is obtained by mapping the minimal divergence onto the unit interval [0,1], where higher scores indicate greater faithfulness. Furthermore, we propose a thermodynamics-based semantic entropy production (SEP) metric in answer generation, and show that high faithfulness generally implies low entropy production. The SF and SEP metrics can be used jointly or separately for LLM evaluation and hallucination control. We demonstrate our framework on LLM summarization of corporate SEC 10-K filings.
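The SF metric compares two topic-transition matrices with a KL divergence and then maps the result onto the unit interval. The Python sketch below illustrates only that final scoring step, assuming the matrices ${\bf Q}$ and ${\bf A}$ have already been inferred (the convex optimization that infers them is not shown). Three choices are assumptions not stated in the abstract: the divergence direction KL(Q || A), weighting each row by the context topic distribution c, and the mapping SF = exp(-D). The function names `row_kl`, `transition_kl`, and `semantic_faithfulness` are hypothetical.

```python
import numpy as np


def row_kl(P, R, eps=1e-12):
    """Per-row KL(P_i || R_i) for two row-stochastic matrices of equal shape."""
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    if P.shape != R.shape:
        raise ValueError("matrices must have the same shape")
    if not (np.allclose(P.sum(axis=1), 1.0) and np.allclose(R.sum(axis=1), 1.0)):
        raise ValueError("rows must sum to 1")
    # Multiplying by P (not a clipped copy) keeps the 0 * log 0 = 0 convention;
    # clipping only guards the logarithms against log(0).
    log_ratio = np.log(np.clip(P, eps, None)) - np.log(np.clip(R, eps, None))
    return np.sum(P * log_ratio, axis=1)


def transition_kl(Q, A, c):
    """Divergence of answer matrix A from goal matrix Q, with rows weighted
    by the context topic distribution c (an assumed weighting scheme)."""
    c = np.asarray(c, dtype=float)
    return float(c @ row_kl(Q, A))


def semantic_faithfulness(divergence):
    """Map a non-negative divergence onto (0, 1]; exp(-D) is an assumed mapping."""
    return float(np.exp(-divergence))


# Hypothetical 2-topic example: the answer drifts from the query goal on topic 0.
Q = np.array([[0.9, 0.1], [0.2, 0.8]])
A = np.array([[0.6, 0.4], [0.2, 0.8]])
c = np.array([0.5, 0.5])
print(semantic_faithfulness(transition_kl(Q, A, c)))
```

The example prints roughly 0.89; identical matrices give a divergence of 0 and a score of exactly 1, matching the convention that higher scores indicate greater faithfulness.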
The Loss of Control Playbook: Degrees, Dynamics, and Preparedness
Stix, Charlotte, Hallensleben, Annika, Ortega, Alejandro, Pistillo, Matteo
This research report addresses the absence of an actionable definition for Loss of Control (LoC) in AI systems by developing a novel taxonomy and preparedness framework. Despite increasing policy and research attention, existing LoC definitions vary significantly in scope and timeline, hindering effective LoC assessment and mitigation. To address this issue, we draw from an extensive literature review and propose a graded LoC taxonomy, based on the metrics of severity and persistence, that distinguishes between Deviation, Bounded LoC, and Strict LoC. We model pathways toward a societal state of vulnerability in which sufficiently advanced AI systems have acquired or could acquire the means to cause Bounded or Strict LoC once a catalyst, either misalignment or pure malfunction, materializes. We argue that this state becomes increasingly likely over time, absent strategic intervention, and propose a strategy to avoid reaching a state of vulnerability. Rather than focusing solely on intervening on AI capabilities and propensities potentially relevant for LoC or on preventing potential catalysts, we introduce a complementary framework that emphasizes three extrinsic factors: Deployment context, Affordances, and Permissions (the DAP framework). Compared to work on intrinsic factors and catalysts, this framework has the unfair advantage of being actionable today. Finally, we put forward a plan to maintain preparedness and prevent the occurrence of LoC outcomes should a state of societal vulnerability be reached, focusing on governance measures (threat modeling, deployment policies, emergency response) and technical controls (pre-deployment testing, control measures, monitoring) that could maintain a condition of perennial suspension.
Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping
Mujtaba, Dena, Hu, Brian, Hoogs, Anthony, Basharat, Arslan
The deployment of decision-making AI agents presents a critical challenge in maintaining alignment with human values or guidelines while operating in complex, dynamic environments. Agents trained solely to achieve their objectives may adopt harmful behavior, exposing a key trade-off between maximizing the reward function and maintaining alignment. For pre-trained agents, ensuring alignment is particularly challenging, as retraining can be a costly and slow process. This is further complicated by the diverse and potentially conflicting attributes representing the ethical values for alignment. To address these challenges, we propose a test-time alignment technique based on model-guided policy shaping. Our method allows precise control over individual behavioral attributes, generalizes across diverse reinforcement learning (RL) environments, and facilitates a principled trade-off between ethical alignment and reward maximization without requiring agent retraining. We evaluate our approach using the MACHIAVELLI benchmark, which comprises 134 text-based game environments and thousands of annotated scenarios involving ethical decisions. The RL agents are first trained to maximize the reward in their respective games. At test time, we apply policy shaping via scenario-action attribute classifiers to ensure decision alignment with ethical attributes. We compare our approach against prior training-time methods and general-purpose agents, as well as study several types of ethical violations and power-seeking behavior. Our results demonstrate that test-time policy shaping provides an effective and scalable solution for mitigating unethical behavior across diverse environments and alignment attributes.
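Test-time policy shaping reweights a pre-trained agent's action choices using scenario-action attribute classifiers, without retraining. The Python sketch below shows one simple way such shaping could work, assuming a log-linear rule pi'(a) proportional to pi(a) * exp(-lam * sum_k w_k * p_k(a)), where p_k(a) is a classifier's estimated probability that action a violates ethical attribute k. This rule, the function name `shape_policy`, and the weights are hypothetical illustrations, not necessarily the authors' formulation; `lam` stands in for the trade-off between ethical alignment and reward maximization.

```python
import numpy as np


def shape_policy(action_probs, violation_probs, weights, lam):
    """Reweight a pre-trained agent's action distribution at test time.

    action_probs: shape (n_actions,), the agent's pi(a | s).
    violation_probs: shape (n_actions, n_attributes), hypothetical classifier
        estimates of P(action a violates attribute k).
    weights: shape (n_attributes,), importance of each ethical attribute.
    lam: trade-off strength >= 0; lam = 0 returns the original policy.
    """
    action_probs = np.asarray(action_probs, dtype=float)
    penalty = np.asarray(violation_probs, dtype=float) @ np.asarray(weights, dtype=float)
    # Combine in log space: log pi(a) minus the weighted violation score.
    logits = np.log(np.clip(action_probs, 1e-12, None)) - lam * penalty
    logits -= logits.max()  # stabilize the exponential
    shaped = np.exp(logits)
    return shaped / shaped.sum()


# Hypothetical scenario: the agent prefers action 0, which the classifier flags.
pi = np.array([0.7, 0.3])
violations = np.array([[0.9], [0.0]])
shaped = shape_policy(pi, violations, weights=np.array([1.0]), lam=5.0)
print(shaped.argmax())  # the unflagged action 1 now dominates
```

Because the reweighting is multiplicative, actions the classifiers do not flag keep their original relative proportions, and sweeping `lam` traces out an alignment-reward trade-off without touching the agent's trained weights.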