AITopics

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
(3 more...)

Neural Information Processing SystemsOct-9-2025, 01:13:14 GMT

Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions

A notable example that motivates our work is the problem of minimizing costs (or maximizing rewards) in a single-trajectory Markov Decision Process (MDP).

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Industry: Transportation > Air (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Neural Information Processing SystemsOct-9-2025, 01:05:42 GMT

Energy Discrepancies: A Score-Independent Loss for Energy-Based Models

We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo.

artificial intelligence, energy discrepancy, machine learning, (15 more...)

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
Asia > Middle East > Jordan (0.04)

Industry:

Health & Medicine (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
(3 more...)

Neural Information Processing SystemsOct-9-2025, 00:34:34 GMT

87ee1bbac4635e7c948f3eea83c1f262-Paper-Conference.pdf

history, machine learning, reinforcement learning, (17 more...)

Country:

North America > Canada > Alberta (0.14)
North America > United States > Texas (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Game Theory (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Neural Information Processing SystemsOct-9-2025, 00:13:55 GMT

Interactive Visual Reasoning under Uncertainty

One of the fundamental cognitive abilities of humans is to quickly resolve uncertainty by generating hypotheses and testing them via active trials.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (0.67)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

Laplante, Félix, Ambroise, Christophe, Kuhn, Estelle, Lemler, Sarah

jmstate, a Flexible Python Package for Multi-State Joint Modeling

Classical joint modeling approaches often rely on competing risks or recurrent event formulations to account for complex real-world processes involving evolving longitudinal markers and discrete event occurrences. However, these frameworks typically capture only limited aspects of the underlying event dynamics. Multi-state joint models offer a more flexible alternative by representing full event histories through a network of possible transitions, including recurrent cycles and terminal absorptions, all potentially influenced by longitudinal covariates. In this paper, we propose a general framework that unifies longitudinal biomarker modeling with multi-state event processes defined on arbitrary directed graphs. Our approach accommodates both Markovian and semi-Markovian transition structures, and extends classical joint models by coupling nonlinear mixed-effects longitudinal submodels with multi-state survival processes via shared latent structures. We derive the full likelihood and develop scalable inference procedures based on stochastic gradient descent. Furthermore, we introduce a dynamic prediction framework, enabling individualized risk assessments along complex state-transition trajectories. To facilitate reproducibility and dissemination, we provide an open-source Python library \texttt{jmstate} implementing the proposed methodology, available on \href{https://pypi.org/project/jmstate/}{PyPI}. Simulation experiments and a biomedical case study demonstrate the flexibility and performance of the framework in representing complex longitudinal and multi-state event dynamics. The full Python notebooks used to reproduce the experiments as well as the source code of this paper are available on \href{https://gitlab.com/felixlaplante0/jmstate-paper/}{GitLab}.

torch, trajectory, transition, (17 more...)

2510.07128

Country:

Europe > Netherlands > South Holland > Rotterdam (0.04)
Europe > France (0.04)
North America > United States (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.67)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Zhang, Haochen, Zheng, Zhong, Xue, Lingzhou

Q-Learning with Fine-Grained Gap-Dependent Regret

We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based algorithms. In the UCB-based setting, we develop a novel analytical framework that explicitly separates the analysis of optimal and suboptimal state-action pairs, yielding the first fine-grained regret upper bound for UCB-Hoeffding (Jin et al., 2018). To highlight the generality of this framework, we introduce ULCB-Hoeffding, a new UCB-based algorithm inspired by AMB (Xu et al.,2021) but with a simplified structure, which enjoys fine-grained regret guarantees and empirically outperforms AMB. In the non-UCB-based setting, we revisit the only known algorithm AMB, and identify two key issues in its algorithm design and analysis: improper truncation in the $Q$-updates and violation of the martingale difference condition in its concentration argument. We propose a refined version of AMB that addresses these issues, establishing the first rigorous fine-grained gap-dependent regret for a non-UCB-based method, with experiments demonstrating improved performance over AMB.

algorithm, equation, neural information processing system, (9 more...)

2510.06647

Country:

North America > United States > Pennsylvania (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Anjarlekar, Ameya, Etesami, Rasoul, Srikant, R

Scalable Policy-Based RL Algorithms for POMDPs

The continuous nature of belief states in POMDPs presents significant computational challenges in learning the optimal policy. In this paper, we consider an approach that solves a Partially Observable Reinforcement Learning (PORL) problem by approximating the corresponding POMDP model into a finite-state Markov Decision Process (MDP) (called Superstate MDP). We first derive theoretical guarantees that improve upon prior work that relate the optimal value function of the transformed Superstate MDP to the optimal value function of the original POMDP. Next, we propose a policy-based learning approach with linear function approximation to learn the optimal policy for the Superstate MDP. Consequently, our approach shows that a POMDP can be approximately solved using TD-learning followed by Policy Optimization by treating it as an MDP, where the MDP state corresponds to a finite history. We show that the approximation error decreases exponentially with the length of this history. To the best of our knowledge, our finite-time bounds are the first to explicitly quantify the error introduced when applying standard TD learning to a setting where the true dynamics are not Markovian.

algorithm, pomdp, superstate mdp, (15 more...)

2510.0654

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.46)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Nearly Instance-Optimal Parameter Recovery from Many Trajectories via Hellinger Localization

Shekhtman, Eliot, Zhou, Yichen, Ziemann, Ingvar, Matni, Nikolai, Tu, Stephen

Learning from temporally-correlated data is a core facet of modern machine learning. Yet our understanding of sequential learning remains incomplete, particularly in the multi-trajectory setting where data consists of many independent realizations of a time-indexed stochastic process. This important regime both reflects modern training pipelines such as for large foundation models, and offers the potential for learning without the typical mixing assumptions made in the single-trajectory case. However, instance-optimal bounds are known only for least-squares regression with dependent covariates; for more general models or loss functions, the only broadly applicable guarantees result from a reduction to either i.i.d. learning, with effective sample size scaling only in the number of trajectories, or an existing single-trajectory result when each individual trajectory mixes, with effective sample size scaling as the full data budget deflated by the mixing-time. In this work, we significantly broaden the scope of instance-optimal rates in multi-trajectory settings via the Hellinger localization framework, a general approach for maximum likelihood estimation. Our method proceeds by first controlling the squared Hellinger distance at the path-measure level via a reduction to i.i.d. learning, followed by localization as a quadratic form in parameter space weighted by the trajectory Fisher information. This yields instance-optimal bounds that scale with the full data budget under a broad set of conditions. We instantiate our framework across four diverse case studies: a simple mixture of Markov chains, dependent linear regression under non-Gaussian noise, generalized linear models with non-monotonic activations, and linear-attention sequence models. In all cases, our bounds nearly match the instance-optimal rates from asymptotic normality, substantially improving over standard reductions.

assumption, diag, inequality, (15 more...)

2510.06434

Country:

North America > United States > California (0.13)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Choi, Jason J., Aloor, Jasmine Jerry, Li, Jingqi, Mendoza, Maria G., Balakrishnan, Hamsa, Tomlin, Claire J.

Resolving Conflicting Constraints in Multi-Agent Reinforcement Learning with Layered Safety

arXiv.org Artificial IntelligenceOct-9-2025

Preventing collisions in multi-robot navigation is crucial for deployment. This requirement hinders the use of learning-based approaches, such as multi-agent reinforcement learning (MARL), on their own due to their lack of safety guarantees. Traditional control methods, such as reachability and control barrier functions, can provide rigorous safety guarantees when interactions are limited only to a small number of robots. However, conflicts between the constraints faced by different agents pose a challenge to safe multi-agent coordination. To overcome this challenge, we propose a method that integrates multiple layers of safety by combining MARL with safety filters. First, MARL is used to learn strategies that minimize multiple agent interactions, where multiple indicates more than two. Particularly, we focus on interactions likely to result in conflicting constraints within the engagement distance. Next, for agents that enter the engagement distance, we prioritize pairs requiring the most urgent corrective actions. Finally, a dedicated safety filter provides tactical corrective actions to resolve these conflicts. Crucially, the design decisions for all layers of this framework are grounded in reachability analysis and a control barrier-value function-based filtering mechanism. We validate our Layered Safe MARL framework in 1) hardware experiments using Crazyflie drones and 2) high-density advanced aerial mobility (AAM) operation scenarios, where agents navigate to designated waypoints while avoiding collisions. The results show that our method significantly reduces conflict while maintaining safety without sacrificing much efficiency (i.e., shorter travel time and distance) compared to baselines that do not incorporate layered safety. The project website is available at https://dinamo-mit.github.io/Layered-Safe-MARL/

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.15607/RSS.2025.XXI.094

2505.02293

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Air (1.00)
Transportation > Infrastructure & Services (0.95)
Transportation > Passenger (0.94)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)