AITopics

2509.1783

Genre: Research Report (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceSep-24-2025

Symbolic Feedforward Networks for Probabilistic Finite Automata: Exact Simulation and Learnability

Dhayalkar, Sahil Rajesh

We present a formal and constructive theory showing that probabilistic finite automata can be exactly simulated using symbolic feedforward neural networks. Our architecture represents state distributions as vectors and transitions as stochastic matrices, enabling probabilistic state propagation via matrix-vector products. This yields a parallel, interpretable, and differentiable simulation of probabilistic finite automata dynamics using soft updates without recurrence. We formally characterize probabilistic subset construction, finite ε-closure, and exact simulation via layered symbolic computation, and prove equivalence between probabilistic finite automata and specific classes of neural networks. We further show that these symbolic simulators are not only expressive but learnable: trained with standard gradient descent-based optimization on labeled sequence data, they recover the exact behavior of ground-truth probabilistic finite automata. This learnability, formalized in Proposition 5.1, is the crux of this work. Our results unify probabilistic automata theory with classical neural architectures under a rigorous algebraic framework, bridging the gap between symbolic computation and deep learning.

artificial intelligence, automata, machine learning, (15 more...)

2509.10034

Country: North America (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

arXiv.org Machine LearningSep-23-2025

Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs

Wei, Yukuan, Li, Xudong, Yang, Lin F.

Recent advances have significantly improved our understanding of the sample complexity of learning in average-reward Markov decision processes (AMDPs) under the generative model. However, much less is known about the constrained average-reward MDP (CAMDP), where policies must satisfy long-run average constraints. In this work, we address this gap by studying the sample complexity of learning an $ε$-optimal policy in CAMDPs under a generative model. We propose a model-based algorithm that operates under two settings: (i) relaxed feasibility, which allows small constraint violations, and (ii) strict feasibility, where the output policy satisfies the constraint. We show that our algorithm achieves sample complexities of $\tilde{O}\left(\frac{S A (B+H)}{ ε^2}\right)$ and $\tilde{O} \left(\frac{S A (B+H)}{ε^2 ζ^2} \right)$ under the relaxed and strict feasibility settings, respectively. Here, $ζ$ is the Slater constant indicating the size of the feasible region, $H$ is the span bound of the bias function, and $B$ is the transient time bound. Moreover, a matching lower bound of $\tildeΩ\left(\frac{S A (B+H)}{ ε^2ζ^2}\right)$ for the strict feasibility case is established, thus providing the first minimax-optimal bounds for CAMDPs. Our results close the theoretical gap in understanding the complexity of constrained average-reward MDPs.

algorithm, conference paper, sample complexity, (15 more...)

arXiv.org Machine Learning

2509.16586

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(4 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
(2 more...)

Test-Time Learning and Inference-Time Deliberation for Efficiency-First Offline Reinforcement Learning in Care Coordination and Population Health Management

Basu, Sanjay, Patel, Sadiq Y., Sheth, Parth, Muralidharan, Bhairavi, Elamaran, Namrata, Kinra, Aakriti, Batniji, Rajaie

Care coordination and population health management (PHM) are core functions of health systems and community partners, impacting large numbers of Americans enrolled in Medicaid and other safety-net programs. These efforts aim to proactively identify needs, prioritize outreach, and escalate appropriately, all within finite staffing and budget constraints. While outreach modalities (text, phone, video, in-person) carry low clinical risk, their time and opportunity costs vary significantly, making efficiency a primary design goal. In practice, the central operational question is when to deploy expensive in-person outreach versus efficient virtual modalities to maximize value and equity under capacity constraints. These decisions must be made in strictly offline settings, where policies are learned from logged data without exploration at deployment [1]. Classical approaches include constrained Markov decision processes [2], risk-sensitive objectives, and conservative offline RL (e.g., CQL/IQL) [3, 4]. Conformal prediction can provide calibrated error control [5, 6]; ensembles provide practical uncertainty quantification [7]; and decision-time computation is common in control [8]. In health services research and health economic evaluation, cost-effectiveness and cost-benefit analyses (CEA/CBA) guide program-level choices [9-12], but they are not designed for per-patient, per-decision recommendations that adapt to granular state features and logged behavior constraints. 1

deliberation, machine learning, reinforcement learning, (12 more...)

2509.16291

Country: North America > United States > California > San Francisco County > San Francisco (0.29)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Government Relations & Public Policy (1.00)
Health & Medicine > Health Care Providers & Services > Reimbursement (0.93)
Government > Regional Government > North America Government > United States Government (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

High-Precision and High-Efficiency Trajectory Tracking for Excavators Based on Closed-Loop Dynamics

Zou, Ziqing, Wang, Cong, Hu, Yue, Liu, Xiao, Xu, Bowen, Xiong, Rong, Fan, Changjie, Chen, Yingfeng, Wang, Yue

Abstract-- The complex nonlinear dynamics of hydraulic excavators, such as time delays and control coupling, pose significant challenges to achieving high-precision trajectory tracking. Traditional control methods often fall short in such applications due to their inability to effectively handle these nonlinearities, while commonly used learning-based methods require extensive interactions with the environment, leading to inefficiency. T o address these issues, we introduce EfficientTrack, a trajectory tracking method that integrates model-based learning to manage nonlinear dynamics and leverages closed-loop dynamics to improve learning efficiency, ultimately minimizing tracking errors. Comparative experiments in simulation demonstrate that our method outperforms existing learning-based approaches, achieving the highest tracking precision and smoothness with the fewest interactions. Real-world experiments further show that our method remains effective under load conditions and possesses the ability for continual learning, highlighting its practical applicability. Excavators are primarily used in earthworks, mining, and construction projects, playing a vital role in tasks such as digging, loading, trenching, and leveling [1], [2], [3].

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2509.17387

Genre: Research Report (0.82)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.65)
Machinery > Construction Machinery & Heavy Trucks (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
(3 more...)

Grislain, Clemence, Rahimi, Hamed, Sigaud, Olivier, Chetouani, Mohamed

I-FailSense: Towards General Robotic Failure Detection with Vision-Language Models

Language-conditioned robotic manipulation in open-world settings requires not only accurate task execution but also the ability to detect failures for robust deployment in real-world environments. Although recent advances in vision-language models (VLMs) have significantly improved the spatial reasoning and task-planning capabilities of robots, they remain limited in their ability to recognize their own failures. In particular, a critical yet underexplored challenge lies in detecting semantic misalignment errors, where the robot executes a task that is semantically meaningful but inconsistent with the given instruction. To address this, we propose a method for building datasets targeting Semantic Misalignment Failures detection, from existing language-conditioned manipulation datasets. We also present I-FailSense, an open-source VLM framework with grounded arbitration designed specifically for failure detection. Our approach relies on post-training a base VLM, followed by training lightweight classification heads, called FS blocks, attached to different internal layers of the VLM and whose predictions are aggregated using an ensembling mechanism. Experiments show that I-FailSense outperforms state-of-the-art VLMs, both comparable in size and larger, in detecting semantic misalignment errors. Notably, despite being trained only on semantic misalignment detection, I-FailSense generalizes to broader robotic failure categories and effectively transfers to other simulation environments and real-world with zero-shot or minimal post-training. The datasets and models are publicly released on HuggingFace (Webpage: https://clemgris.github.io/I-FailSense/).

large language model, machine learning, natural language, (18 more...)

2509.16072

Country: Europe (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

HyperTASR: Hypernetwork-Driven Task-Aware Scene Representations for Robust Manipulation

Sun, Li, Wu, Jiefeng, Chen, Feng, Liu, Ruizhe, Yang, Yanchao

Effective policy learning for robotic manipulation requires scene representations that selectively capture task-relevant environmental features. Current approaches typically employ task-agnostic representation extraction, failing to emulate the dynamic perceptual adaptation observed in human cognition. We present HyperTASR, a hypernetwork-driven framework that modulates scene representations based on both task objectives and the execution phase. Our architecture dynamically generates representation transformation parameters conditioned on task specifications and progression state, enabling representations to evolve contextually throughout task execution. This approach maintains architectural compatibility with existing policy learning frameworks while fundamentally reconfiguring how visual features are processed. Unlike methods that simply concatenate or fuse task embeddings with task-agnostic representations, HyperTASR establishes computational separation between task-contextual and state-dependent processing paths, enhancing learning efficiency and representational quality. Comprehensive evaluations in both simulation and real-world environments demonstrate substantial performance improvements across different representation paradigms. Through ablation studies and attention visualization, we confirm that our approach selectively prioritizes task-relevant scene information, closely mirroring human adaptive perception during manipulation tasks. The project website is at https://lisunphil.github.io/HyperTASR_projectpage/.

artificial intelligence, machine learning, natural language, (15 more...)

2508.18802

Genre:

Research Report (1.00)
Overview (0.67)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling

Wang, Yongyi, Li, Lingfeng, Chen, Bozhou, Li, Ang, Liu, Hanyu, Zheng, Qirui, Yang, Xionghui, Li, Wenxin

Recent research has developed benchmarks for memory-augmented reinforcement learning (RL) algorithms, providing Partially Observable Markov Decision Process (POMDP) environments where agents depend on past observations to make decisions. While many benchmarks incorporate sufficiently complex real-world problems, they lack controllabil-ity over the degree of challenges posed to memory models. In contrast, synthetic environments enable fine-grained manipulation of dynamics, making them critical for detailed and rigorous evaluation of memory-augmented RL. Our study focuses on POMDP synthesis with three key contributions: 1. A theoretical framework for analyzing POMDPs, grounded in Memory Demand Structure (MDS), transition invariance, and related concepts; 2. A methodology leveraging linear process dynamics, state aggregation, and reward redistribution to construct customized POMDPs with predefined properties; 3. Empirically validated series of POMDP environments with increasing difficulty levels, designed based on our theoretical insights. Our work clarifies the challenges of memory-augmented RL in solving POMDPs, provides guidelines for analyzing and designing POMDP environments, and offers empirical support for selecting memory models in RL tasks.

artificial intelligence, machine learning, trajectory, (17 more...)

2508.04282

Country: Asia (0.28)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Reinforcement Learning for Decision-Level Interception Prioritization in Drone Swarm Defense

Palmas, Alessandro

The growing threat of low-cost kamikaze drone swarms poses a critical challenge to modern defense systems demanding rapid and strategic decision-making to prioritize interceptions across multiple effectors and high-value target zones. In this work, we present a case study demonstrating the practical advantages of reinforcement learning in addressing this challenge. We introduce a high-fidelity simulation environment that captures realistic operational constraints, within which a decision-level reinforcement learning agent learns to coordinate multiple effectors for optimal interception prioritization. Operating in a discrete action space, the agent selects which drone to engage per effector based on observed state features such as positions, classes, and effector status. We evaluate the learned policy against a handcrafted rule-based baseline across hundreds of simulated attack scenarios. The reinforcement learning based policy consistently achieves lower average damage and higher defensive efficiency in protecting critical zones. This case study highlights the potential of reinforcement learning as a strategic layer within defense architectures, enhancing resilience without displacing existing control systems. All code and simulation assets are publicly released for full reproducibility, and a video demonstration illustrates the policy's qualitative behavior.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2508.00641

Country: Asia > Singapore (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Estienne, Lautaro, Zenou, Gabriel Ben, Naderi, Nona, Cheung, Jackie, Piantanida, Pablo

Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog

As AI systems take on collaborative roles, they must reason about shared goals and beliefs-not just generate fluent language. The Rational Speech Act (RSA) framework offers a principled approach to pragmatic reasoning, but existing extensions face challenges in scaling to multi-turn, collaborative scenarios. In this paper, we introduce Collaborative Rational Speech Act (CRSA), an information-theoretic (IT) extension of RSA that models multi-turn dialog by optimizing a gain function adapted from rate-distortion theory. This gain is an extension of the gain model that is maximized in the original RSA model but takes into account the scenario in which both agents in a conversation have private information and produce utterances conditioned on the dialog. We demonstrate the effectiveness of CRSA on referential games and template-based doctor-patient dialogs in the medical domain. Empirical results show that CRSA yields more consistent, interpretable, and collaborative behavior than existing baselines-paving the way for more pragmatic and socially aware language agents.

computational linguistic, large language model, machine learning, (16 more...)

2507.14063

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)