accessed
Generative Stochastic Optimal Transport: Guided Harmonic Path-Integral Diffusion
We introduce Guided Harmonic Path-Integral Diffusion (GH-PID), a linearly-solvable framework for guided Stochastic Optimal Transport (SOT) with a hard terminal distribution and soft, application-driven path costs. A low-dimensional guidance protocol shapes the trajectory ensemble while preserving analytic structure: the forward and backward Kolmogorov equations remain linear, the optimal score admits an explicit Green-function ratio, and Gaussian-Mixture Model (GMM) terminal laws yield closed-form expressions. This enables stable sampling and differentiable protocol learning under exact terminal matching. We develop guidance-centric diagnostics -- path cost, centerline adherence, variance flow, and drift effort -- that make GH-PID an interpretable variational ansatz for empirical SOT. Three navigation scenarios illustrated in 2D: (i) Case A: hand-crafted protocols revealing how geometry and stiffness shape lag, curvature effects, and mode evolution; (ii) Case B: single-task protocol learning, where a PWC centerline is optimized to minimize integrated cost; (iii) Case C: multi-expert fusion, in which a commander reconciles competing expert/teacher trajectories and terminal beliefs through an exact product-of-experts law and learns a consensus protocol. Across all settings, GH-PID generates geometry-aware, trust-aware trajectories that satisfy the prescribed terminal distribution while systematically reducing integrated cost.
- North America > United States > Arizona (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Kansas > Rawlins County (0.04)
- Europe > France (0.04)
Automatic Fact-checking in English and Telugu
Chikkala, Ravi Kiran, Anikina, Tatiana, Skachkova, Natalia, Vykopal, Ivan, Agerri, Rodrigo, van Genabith, Josef
False information poses a significant global challenge, and manually verifying claims is a time-consuming and resource-intensive process. In this research paper, we experiment with different approaches to investigate the effectiveness of large language models (LLMs) in classifying factual claims by their veracity and generating justifications in English and Telugu. The key contributions of this work include the creation of a bilingual English-Telugu dataset and the benchmarking of different veracity classification approaches based on LLMs.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Germany > Saarland (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (19 more...)
- Government (1.00)
- Media > News (0.47)
Financial Instruction Following Evaluation (FIFE)
Matlin, Glenn, Siddharth, null, JM, Anirudh, Shukla, Aditya, Hassan, Yahya, Chava, Sudheer
Language Models (LMs) struggle with complex, interdependent instructions, particularly in high-stakes domains like finance where precision is critical. We introduce FIFE, a novel, high-difficulty benchmark designed to assess LM instruction-following capabilities for financial analysis tasks. FIFE comprises 88 human-authored prompts and employs a verification system with chainable, verifiable constraints for fine-grained reward signals. We evaluate 53 models (proprietary, open-weight, open-source) in a zero-shot setting. Our key findings reveal a clear performance hierarchy: the top open-weight model (76.1 strict / 79.5 loose) surpasses the leading proprietary system (65.9 strict / 70.5 loose), while the best open-source models lag significantly (45.5 strict / 48.9 loose). However, even top-performing models struggle with FIFE's complex requirements, failing to achieve perfect compliance. We release our dataset and code as an open-source resource to promote research in Reinforcement Learning for the financial domain.
- Europe > Austria > Vienna (0.14)
- North America > United States (0.14)
High-Resolution Water Sampling via a Solar-Powered Autonomous Surface Vehicle
Mamani, Misael, Fernandez, Mariel, Luna, Grace, Limachi, Steffani, Apaza, Leonel, Montes-Dávalos, Carolina, Herrera, Marcelo, Salcedo, Edwin
Accurate water quality assessment requires spatially resolved sampling, yet most unmanned surface vehicles (USVs) can collect only a limited number of samples or rely on single-point sensors with poor representativeness. This work presents a solar-powered, fully autonomous USV featuring a novel syringe-based sampling architecture capable of acquiring 72 discrete, contamination-minimized water samples per mission. The vehicle incorporates a ROS 2 autonomy stack with GPS-RTK navigation, LiDAR and stereo-vision obstacle detection, Nav2-based mission planning, and long-range LoRa supervision, enabling dependable execution of sampling routes in unstructured environments. The platform integrates a behavior-tree autonomy architecture adapted from Nav2, enabling mission-level reasoning and perception-aware navigation. A modular 6x12 sampling system, controlled by distributed micro-ROS nodes, provides deterministic actuation, fault isolation, and rapid module replacement, achieving spatial coverage beyond previously reported USV-based samplers. Field trials in Achocalla Lagoon (La Paz, Bolivia) demonstrated 87% waypoint accuracy, stable autonomous navigation, and accurate physicochemical measurements (temperature, pH, conductivity, total dissolved solids) comparable to manually collected references. These results demonstrate that the platform enables reliable high-resolution sampling and autonomous mission execution, providing a scalable solution for aquatic monitoring in remote environments.
- North America > Canada (0.28)
- South America > Bolivia > La Paz Department > Pedro Domingo Murillo Province > La Paz (0.24)
- Asia > Malaysia (0.04)
- (5 more...)
- Water & Waste Management > Water Management > Water Supplies & Services (1.00)
- Government (1.00)
- Energy > Renewable > Solar (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models
Kim, Ju-Young, Park, Ji-Hong, Lee, Se-Yeon, Park, Sujin, Kim, Gun-Woo
Recent incidents in certain online games and communities, where anonymity is guaranteed, show that unchecked inappropriate remarks frequently escalate into verbal abuse and even criminal behavior, raising significant social concerns. Consequently, there is a growing need for research on techniques that can detect inappropriate utterances within conversational texts to help build a safer communication environment. Although large-scale language models trained on Korean corpora and chain-of-thought reasoning have recently gained attention, research applying these approaches to inappropriate utterance detection remains limited. In this study, we propose a soft inductive bias approach that explicitly defines reasoning perspectives to guide the inference process, thereby promoting rational decision-making and preventing errors that may arise during reasoning. We fine-tune a Korean large language model using the proposed method and conduct both quantitative performance comparisons and qualitative evaluations across different training strategies. Experimental results show that the Kanana-1.5 model achieves an average accuracy of 87.0046, improving by approximately 3.89 percent over standard supervised learning. These findings indicate that the proposed method goes beyond simple knowledge imitation by large language models and enables more precise and consistent judgments through constrained reasoning perspectives, demonstrating its effectiveness for inappropriate utterance detection.
How Not to Detect Prompt Injections with an LLM
Choudhary, Sarthak, Anshumaan, Divyam, Palumbo, Nils, Jha, Somesh
LLM-integrated applications and agents are vulnerable to prompt injection attacks, where adversaries embed malicious instructions within seemingly benign input data to manipulate the LLM's intended behavior. Recent defenses based on known-answer detection (KAD) scheme have reported near-perfect performance by observing an LLM's output to classify input data as clean or contaminated. KAD attempts to repurpose the very susceptibility to prompt injection as a defensive mechanism. We formally characterize the KAD scheme and uncover a structural vulnerability that invalidates its core security premise. To exploit this fundamental vulnerability, we methodically design an adaptive attack, DataFlip. It consistently evades KAD defenses, achieving detection rates as low as $0\%$ while reliably inducing malicious behavior with a success rate of $91\%$, all without requiring white-box access to the LLM or any optimization procedures.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- Asia > Taiwan > Taiwan Province > Taipei (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
High-Performance Variance-Covariance Matrix Construction Using an Uncentered Gram Formulation
Reichel (2025) defined the bariance as a pairwise-difference measure that can be rewritten in linear time using only scalar sums. We extend this idea to the covariance matrix by showing that the standard matrix expression involving the uncentered Gram matrix and a correction term is algebraically identical to the pairwise-difference definition while avoiding explicit centering. The computation then reduces to one outer product of dimension p-by-p and a single subtraction. Benchmarks in Python show clear runtime gains, especially when BLAS optimizations are absent. Optionally faster Gram-matrix routines such as RXTX (Rybin et al., 2025) further reduce overall cost.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Austria > Upper Austria > Linz (0.04)
- Asia > Middle East > Israel (0.04)
Towards a Generalisable Cyber Defence Agent for Real-World Computer Networks
Recent advances in deep reinforcement learning for autonomous cyber defence have resulted in agents that can successfully defend simulated computer networks against cyber-attacks. However, many of these agents would need retraining to defend networks with differing topology or size, making them poorly suited to real-world networks where topology and size can vary over time. In this research we introduce a novel set of Topological Extensions for Reinforcement Learning Agents (TERLA) that provide generalisability for the defence of networks with differing topology and size, without the need for retraining. Our approach involves the use of heterogeneous graph neural network layers to produce a fixed-size latent embedding representing the observed network state. This representation learning stage is coupled with a reduced, fixed-size, semantically meaningful and interpretable action space. We apply TERLA to a standard deep reinforcement learning Proximal Policy Optimisation (PPO) agent model, and to reduce the sim-to-real gap, conduct our research using Cyber Autonomy Gym for Experimentation (CAGE) Challenge 4. This Cyber Operations Research Gym environment has many of the features of a real-world network, such as realistic Intrusion Detection System (IDS) events and multiple agents defending network segments of differing topology and size. TERLA agents retain the defensive performance of vanilla PPO agents whilst showing improved action efficiency. Generalisability has been demonstrated by showing that all TERLA agents have the same network-agnostic neural network architecture, and by deploying a single TERLA agent multiple times to defend network segments with differing topology and size, showing improved defensive performance and efficiency.
- North America > United States (0.28)
- North America > Canada (0.14)
- Europe > United Kingdom > England > Bristol (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
Decoding the Black Box: Discerning AI Rhetorics About and Through Poetic Prompting
-- Prompt engineering has emerged as a useful way studying the algorithmic tendencies and biases of large language models (LLMs). Meanwhile c reatives and academics have leveraged LLMs to develop creative works and explore the boundaries of their writing capabilities through text - generation and code. This study suggests that creative text prompting, specifically "Poetry Prompt Patterns," may be a useful addition to the prompt engineer's toolbox, and outlines the process by which this approach may be taken. Then, the paper uses poetic prompts to assess three models' descriptions and evaluations of a renowned poet and test the consequences of models' willingness to adapt or rewrite original creative works for presumed audiences. Since the release of public - facing chat - style large language model (LLM) natural language generators (NLGs) like ChatGPT and Claude, public debate has acknowledged their great potential for creativity, as well as the ways in which they can be leveraged to make representations that don't reflect reality.
- North America > United States > Florida > Orange County > Orlando (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Michigan (0.04)
- (3 more...)
- Law (0.46)
- Transportation > Air (0.40)
Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement
Strassenburg, Nils, Glavic, Boris, Rabl, Tilmann
Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training examples and can be utilized by users without expertise in model development. However, this comes at the cost of substantially higher resource and energy consumption compared to smaller models, which often achieve similar predictive performance for simple tasks. In this paper, we present our vision for just-in-time model replacement (JITR), where, upon identifying a recurring task in calls to an LLM, the model is replaced transparently with a cheaper alternative that performs well for this specific task. JITR retains the ease of use and low development effort of LLMs, while saving significant cost and energy. We discuss the main challenges in realizing our vision regarding the identification of recurring tasks and the creation of a custom model. Specifically, we argue that model search and transfer learning will play a crucial role in JITR to efficiently identify and fine-tune models for a recurring task. Using our JITR prototype Poodle, we achieve significant savings for exemplary tasks.
- Europe > Germany > Brandenburg > Potsdam (0.40)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)