Government
Artificial Intelligence Applications in Horizon Scanning for Infectious Diseases
Miles, Ian, Wakimoto, Mayumi, Meira, Wagner Jr., Paula, Daniela, Ticiane, Daylene, Rosa, Bruno, Biddulph, Jane, Georgiou, Stelios, Ermida, Valdir
This review explores the integration of Artificial Intelligence into Horizon Scanning, focusing on identifying and responding to emerging threats and opportunities linked to Infectious Diseases. We examine how AI tools can enhance signal detection, data monitoring, scenario analysis, and decision support. We also address the risks associated with AI adoption and propose strategies for effective implementation and governance. The findings contribute to the growing body of Foresight literature by demonstrating the potential and limitations of AI in Public Health preparedness.
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Nghiem, Huy, Panda, Swetasudha, Khatwani, Devashish, Nguyen, Huy V., Kenthapadi, Krishnaram, Daumรฉ, Hal III
Large Language Models (LLMs) are increasingly used in healthcare, yet ensuring their safety and trustworthiness remains a barrier to deployment. Conversational medical assistants must avoid unsafe compliance without over-refusing benign queries. We present an iterative post-deployment alignment framework that applies Kahneman-Tversky Optimization (KTO) and Direct Preference Optimization (DPO) to refine models against domain-specific safety signals. Using the CARES-18K benchmark for adversarial robustness, we evaluate four LLMs (Llama-3B/8B, Meditron-8B, Mistral-7B) across multiple cycles. Our results show up to 42% improvement in safety-related metrics for harmful query detection, alongside interesting trade-offs against erroneous refusals, thereby exposing architecture-dependent calibration biases. We also perform ablation studies to identify when self-evaluation is reliable and when external or finetuned judges are necessary to maximize performance gains. Our findings underscore the importance of adopting best practices that balance patient safety, user trust, and clinical utility in the design of conversational medical assistants.
RippleBench: Capturing Ripple Effects Using Existing Knowledge Repositories
Rinberg, Roy, Bhalla, Usha, Shilov, Igor, Calmon, Flavio P., Gandikota, Rohit
Targeted interventions on language models, such as unlearning, debiasing, or model editing, are a central method for refining model behavior and keeping knowledge up to date. While these interventions aim to modify specific information within models (e.g., removing virology content), their effects often propagate to related but unintended areas (e.g., allergies); these side-effects are commonly referred to as the ripple effect. In this work, we present RippleBench-Maker, an automatic tool for generating Q&A datasets that allow for the measurement of ripple effects in any model-editing task. RippleBench-Maker builds on a Wikipedia-based RAG pipeline (WikiRAG) to generate multiple-choice questions at varying semantic distances from the target concept (e.g., the knowledge being unlearned). Using this framework, we construct RippleBench-Bio, a benchmark derived from the WMDP (Weapons of Mass Destruction Paper) dataset, a common unlearning benchmark. We evaluate eight state-of-the-art unlearning methods and find that all exhibit non-trivial accuracy drops on topics increasingly distant from the unlearned knowledge, each with distinct propagation profiles. To support ongoing research, we release our codebase for on-the-fly ripple evaluation, along with the benchmark, RippleBench-Bio.
Towards Contextual Sensitive Data Detection
Telkamp, Liang, Hulsebos, Madelon
The emergence of open data portals necessitates more attention to protecting sensitive data before datasets get published and exchanged. While an abundance of methods for suppressing sensitive data exist, the conceptualization of sensitive data and methods to detect it, focus particularly on personal data that, if disclosed, may be harmful or violate privacy. We observe the need for refining and broadening our definitions of sensitive data, and argue that the sensitivity of data depends on its context. Based on this definition, we introduce two mechanisms for contextual sensitive data detection that consider the broader context of a dataset at hand. First, we introduce type contextualization, which first detects the semantic type of particular data values, then considers the overall context of the data values within the dataset or document. Second, we introduce domain contextualization which determines sensitivity of a given dataset in the broader context based on the retrieval of relevant rules from documents that specify data sensitivity (e.g., data topic and geographic origin). Experiments with these mechanisms, assisted by large language models (LLMs), confirm that: 1) type-contextualization significantly reduces the number of false positives for type-based sensitive data detection and reaches a recall of 94% compared to 63% with commercial tools, and 2) domain-contextualization leveraging sensitivity rule retrieval is effective for context-grounded sensitive data detection in non-standard data domains such as humanitarian datasets. Evaluation with humanitarian data experts also reveals that context-grounded LLM explanations provide useful guidance in manual data auditing processes, improving consistency. We open-source mechanisms and annotated datasets for contextual sensitive data detection at https://github.com/trl-lab/sensitive-data-detection.
ReVeal-MT: A Physics-Informed Neural Network for Multi-Transmitter Radio Environment Mapping
Shahid, Mukaram, Das, Kunal, Ushaq, Hadia, Zhang, Hongwei, Song, Jiming, Qiao, Daji, Babu, Sarath, Guan, Yong, Zhu, Zhengyuan, Ahmad, Arsalan
This manuscript has been submitted for peer review and possible publication in an IEEE journal. The content herein represents the version prepared by the authors and may be subject to further revision during the review. Abstract--Accurately mapping the radio environment (e.g., identifying wireless signal strength at specific frequency bands and geographic locations) is crucial for efficient spectrum sharing, enabling Secondary Users (SUs) to access underutilized spectrum bands while protecting Primary Users (PUs). While existing models have made progress, they often degrade in performance when multiple transmitters coexist, due to the compounded effects of shadowing, interference from adjacent transmitters. T o address this challenge, we extend our prior work on Physics-Informed Neural Networks (PINNs) for single-transmitter mapping to derive a new multi-transmitter Partial Differential Equation (PDE) formulation of the Received Signal Strength Indicator (RSSI). We then propose ReV eal-MT (Re-constructor and Visualizer of Spectrum Landscape for Multiple Transmitters), a novel PINN which integrates the multi-source PDE residual into a neural network loss function, enabling accurate spectrum landscape reconstruction from sparse RF sensor measurements. ReV eal-MT is validated using real-world measurements from the ARA wireless living lab across rural and suburban environments, and benchmarked against 3GPP and ITU-R channel models and a baseline PINN model for a single transmitter use-case. Results show that ReV eal-MT achieves substantial accuracy gains in multi-transmitter scenarios, e.g., achieving an RMSE of only 2.66 dB with as few as 45 samples over a 370-square-kilometer region, while maintaining low computational complexity. These findings demonstrate that ReV eal-MT significantly advances radio environment mapping under realistic multi-transmitter conditions, with strong potential for enabling fine-grained spectrum management and precise coexistence between PUs and SUs. I. INTRODUCTION Existing spectrum sharing frameworks, such as those implemented in the TV White Space (TVWS) database and Citizens Broadband Radio Service (CBRS) Spectrum Access System (SAS), rely heavily on traditional statistical models. However, such models struggle to accurately capture the real-world spectrum occupancy and do not generalize well enough to capture shadowing and fading caused by different kinds of terrain and environmental conditions, leading to conservative approaches that over-protect the primary users (PUs) and cause discrepancies in channel availability for spectrum re-use [1]- [3].
In-Context Representation Hijacking
Yona, Itay, Sarid, Amir, Karasik, Michael, Gandelsman, Yossi
We introduce $\textbf{Doublespeak}$, a simple in-context representation hijacking attack against large language models (LLMs). The attack works by systematically replacing a harmful keyword (e.g., bomb) with a benign token (e.g., carrot) across multiple in-context examples, provided a prefix to a harmful request. We demonstrate that this substitution leads to the internal representation of the benign token converging toward that of the harmful one, effectively embedding the harmful semantics under a euphemism. As a result, superficially innocuous prompts (e.g., "How to build a carrot?") are internally interpreted as disallowed instructions (e.g., "How to build a bomb?"), thereby bypassing the model's safety alignment. We use interpretability tools to show that this semantic overwrite emerges layer by layer, with benign meanings in early layers converging into harmful semantics in later ones. Doublespeak is optimization-free, broadly transferable across model families, and achieves strong success rates on closed-source and open-source systems, reaching 74% ASR on Llama-3.3-70B-Instruct with a single-sentence context override. Our findings highlight a new attack surface in the latent space of LLMs, revealing that current alignment strategies are insufficient and should instead operate at the representation level.
Modelling the Doughnut of social and planetary boundaries with frugal machine learning
Vrizzi, Stefano, O'Neill, Daniel W.
The 'Doughnut' of social and planetary boundaries has emerged as a popular framework for assessing environmental and social sustainability. Here, we provide a proof-of-concept analysis that shows how machine learning (ML) methods can be applied to a simple macroeconomic model of the Doughnut. First, we show how ML methods can be used to find policy parameters that are consistent with 'living within the Doughnut'. Second, we show how a reinforcement learning agent can identify the optimal trajectory towards desired policies in the parameter space. The approaches we test, which include a Random Forest Classifier and $Q$-learning, are frugal ML methods that are able to find policy parameter combinations that achieve both environmental and social sustainability. The next step is the application of these methods to a more complex ecological macroeconomic model.
AI summaries in online search influence users' attitudes
Xu, Yiwei, Dash, Saloni, Kang, Sungha, Liao, Wang, Spiro, Emma S.
This study examined how AI-generated summaries, which have become visually prominent in online search results, affect how users think about different issues. In a preregistered randomized controlled experiment, participants (N = 2,004) viewed mock search result pages varying in the presence (vs. absence), placement (top vs. middle), and stance (benefit-framed vs. harm-framed) of AI-generated summaries across four publicly debated topics. Compared to a no-summary control group, participants exposed to AI-generated summaries reported issue attitudes, behavioral intentions, and policy support that aligned more closely with the AI summary stance. The summaries placed at the top of the page produced stronger shifts in users' issue attitudes (but not behavioral intentions or policy support) than those placed at the middle of the page. We also observed moderating effects from issue familiarity and general trust toward AI. In addition, users perceived the AI summaries more useful when it emphasized health harms versus benefits. These findings suggest that AI-generated search summaries can significantly shape public perceptions, raising important implications for the design and regulation of AI-integrated information ecosystems.
Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms
Parschan, Patrick, Jakob, Charlott
This article presents the first systematic review of unsupervised and semi-supervised computational text-based ideal point estimation (CT-IPE) algorithms, methods designed to infer latent political positions from textual data. These algorithms are widely used in political science, communication, computational social science, and computer science to estimate ideological preferences from parliamentary speeches, party manifestos, and social media. Over the past two decades, their development has closely followed broader NLP trends -- beginning with word-frequency models and most recently turning to large language models (LLMs). While this trajectory has greatly expanded the methodological toolkit, it has also produced a fragmented field that lacks systematic comparison and clear guidance for applied use. To address this gap, we identified 25 CT-IPE algorithms through a systematic literature review and conducted a manual content analysis of their modeling assumptions and development contexts. To compare them meaningfully, we introduce a conceptual framework that distinguishes how algorithms generate, capture, and aggregate textual variance. On this basis, we identify four methodological families -- word-frequency, topic modeling, word embedding, and LLM-based approaches -- and critically assess their assumptions, interpretability, scalability, and limitations. Our review offers three contributions. First, it provides a structured synthesis of two decades of algorithm development, clarifying how diverse methods relate to one another. Second, it translates these insights into practical guidance for applied researchers, highlighting trade-offs in transparency, technical requirements, and validation strategies that shape algorithm choice. Third, it emphasizes that differences in estimation outcomes across algorithms are themselves informative, underscoring the need for systematic benchmarking.
Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs
Cattaneo, Alberto, Luschi, Carlo, Justus, Daniel
Retrieval of information from graph-structured knowledge bases represents a promising direction for improving the factuality of LLMs. While various solutions have been proposed, a comparison of methods is difficult due to the lack of challenging QA datasets with ground-truth targets for graph retrieval. We present SynthKGQA, an LLM-powered framework for generating high-quality Knowledge Graph Question Answering datasets from any Knowledge Graph, providing the full set of ground-truth facts in the KG to reason over questions. We show how, in addition to enabling more informative benchmarking of KG retrievers, the data produced with SynthKGQA also allows us to train better models.We apply SynthKGQA to Wikidata to generate GTSQA, a new dataset designed to test zero-shot generalization abilities of KG retrievers with respect to unseen graph structures and relation types, and benchmark popular solutions for KG-augmented LLMs on it.