AITopics

Distributionally robust reinforcement learning (DRRL) focuses on designing policies that achieve good performance under model uncertainties. In particular, we are interested in maximizing the worst-case long-term discounted reward, where the data for RL comes from a nominal model while the deployed environment can deviate from the nominal model within a prescribed uncertainty set. Existing convergence guarantees for robust temporal-difference (TD) learning for policy evaluation are limited to tabular MDPs or are dependent on restrictive discount-factor assumptions when function approximation is used. We present the first robust TD learning with linear function approximation, where robustness is measured with respect to the total-variation distance and Wasserstein-l distance uncertainty set. Additionally, our algorithm is both model-free and does not require generative access to the MDP. Our algorithm combines a two-time-scale stochastic-approximation update with an outer-loop target-network update. We establish an $\tilde{O}(1/ε^2)$ sample complexity to obtain an $ε$-accurate value estimate. Our results close a key gap between the empirical success of robust RL algorithms and the non-asymptotic guarantees enjoyed by their non-robust counterparts. The key ideas in the paper also extend in a relatively straightforward fashion to robust Q-learning with function approximation.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2510.01721

Country: North America > United States (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Han, Seong Woo, Vo, Daniel Duy, Brown, Brielin C.

Large-Scale Bayesian Causal Discovery with Interventional Data

Inferring the causal relationships among a set of variables in the form of a directed acyclic graph (DAG) is an important but notoriously challenging problem. Recently, advancements in high-throughput genomic perturbation screens have inspired development of methods that leverage interventional data to improve model identification. However, existing methods still suffer poor performance on large-scale tasks and fail to quantify uncertainty. Here, we propose Interventional Bayesian Causal Discovery (IBCD), an empirical Bayesian framework for causal discovery with interventional data. Our approach models the likelihood of the matrix of total causal effects, which can be approximated by a matrix normal distribution, rather than the full data matrix. We place a spike-and-slab horseshoe prior on the edges and separately learn data-driven weights for scale-free and Erdős-Rényi structures from observational data, treating each edge as a latent variable to enable uncertainty-aware inference. Through extensive simulation, we show that IBCD achieves superior structure recovery compared to existing baselines. We apply IBCD to CRISPR perturbation (Perturb-seq) data on 521 genes, demonstrating that edge posterior inclusion probabilities enable identification of robust graph structures.

artificial intelligence, intervention, machine learning, (15 more...)

2510.01562

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Sonawane, Akshay Bhagwan, Swamikannan, Lena D., Tamil, Lakshman

Robust Classification of Oral Cancer with Limited Training Data

Oral cancer ranks among the most prevalent cancers globally, with a particularly high mortality rate in regions lacking adequate healthcare access. Early diagnosis is crucial for reducing mortality; however, challenges persist due to limited oral health programs, inadequate infrastructure, and a shortage of healthcare practitioners. Conventional deep learning models, while promising, often rely on point estimates, leading to overconfidence and reduced reliability. Critically, these models require large datasets to mitigate overfitting and ensure generalizability, an unrealistic demand in settings with limited training data. To address these issues, we propose a hybrid model that combines a convolutional neural network (CNN) with Bayesian deep learning for oral cancer classification using small training sets. This approach employs variational inference to enhance reliability through uncertainty quantification. The model was trained on photographic color images captured by smartphones and evaluated on three distinct test datasets. The proposed method achieved 94% accuracy on a test dataset with a distribution similar to that of the training data, comparable to traditional CNN performance. Notably, for real-world photographic image data, despite limitations and variations differing from the training dataset, the proposed model demonstrated superior generalizability, achieving 88% accuracy on diverse datasets compared to 72.94% for traditional CNNs, even with a smaller dataset. Confidence analysis revealed that the model exhibits low uncertainty (high confidence) for correctly classified samples and high uncertainty (low confidence) for misclassified samples. These results underscore the effectiveness of Bayesian inference in data-scarce environments in enhancing early oral cancer diagnosis by improving model reliability and generalizability.

artificial intelligence, bayesian inference, machine learning, (14 more...)

2510.01547

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Head & Neck Cancer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Modeling Others' Minds as Code

Jha, Kunal, Huang, Aydan Yuenan, Ye, Eric, Jaques, Natasha, Kleiman-Weiner, Max

Accurate prediction of human behavior is essential for robust and safe human-AI collaboration. However, existing approaches for modeling people are often data-hungry and brittle because they either make unrealistic assumptions about rationality or are too computationally demanding to adapt rapidly. Our key insight is that many everyday social interactions may follow predictable patterns; efficient "scripts" that minimize cognitive load for actors and observers, e.g., "wait for the green light, then go." We propose modeling these routines as behavioral programs instantiated in computer code rather than policies conditioned on beliefs and desires. We introduce ROTE, a novel algorithm that leverages both large language models (LLMs) for synthesizing a hypothesis space of behavioral programs, and probabilistic inference for reasoning about uncertainty over that space. We test ROTE in a suite of gridworld tasks and a large-scale embodied household simulator. ROTE predicts human and AI behaviors from sparse observations, outperforming competitive baselines -- including behavior cloning and LLM-based methods -- by as much as 50% in terms of in-sample accuracy and out-of-sample generalization. By treating action understanding as a program synthesis problem, ROTE opens a path for AI systems to efficiently and effectively predict human behavior in the real-world.

large language model, machine learning, reinforcement learning, (19 more...)

2510.01272

Country: North America > United States > Massachusetts (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment > Games (0.45)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

Neural Information Processing SystemsOct-2-2025, 23:53:11 GMT

On the equivalence between graph isomorphism testing and function approximation with GNNs

Zhengdao Chen, Soledad Villar, Lei Chen, Joan Bruna

In light of this, there has been increasing interest in studying their representation power.

approximation, graph, neural network, (13 more...)

Country: North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)

Neural Information Processing SystemsOct-2-2025, 23:42:22 GMT

Asynchronous Anytime Sequential Monte Carlo

Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh

Neural Information Processing Systems http://nips.cc/

asynchronous anytime sequential monte carlo

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.40)

Neural Information Processing SystemsOct-2-2025, 23:32:10 GMT

Meta Learning with Relational Information for Short Sequences

Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, sequence, (15 more...)

Country:

North America (0.28)
Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Jie Ding, Robert Calderbank, Vahid Tarokh

Gradient Information for Representation and Modeling

Neural Information Processing SystemsOct-2-2025, 23:01:54 GMT

Some commonly used objective loss function such as cross-entropy loss for classification and squared loss for regression can be regarded as special cases of the logarithmic loss function.

artificial intelligence, information, machine learning, (17 more...)

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Neural Information Processing SystemsOct-2-2025, 22:56:31 GMT

Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model

There are two broad strategies for this type of likelihood-free inference problem.

artificial intelligence, machine learning, simulator, (17 more...)

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Energy (0.96)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-2-2025, 22:53:44 GMT

Export Reviews, Discussions, Author Feedback and Meta-Reviews

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The underlying principle of this type of approach is to maintain a Bayesian posterior over dynamics (conditioned on past experienced transitions) and to seek at each time step for the action optimizing the related augmented MDP (on state-history meta-states and related meta-dynamics), which is generally an intractable problem (for exact solving). This contribution relies on two previous ideas, simulation-based search (with root sampling, which avoids updating the belief over dynamics during planning) and value function approximation (which requires introducing proper features for handling histories), combining them to provide a new approach. In addition to this general approach, called BAFA, the authors provide an alternative and more general proof for the validity of root sampling and provide some experimental results. Overall, the paper is well written and clear, it proposes a sound approach, based on known ideas but combining them smartly (especially for the history features, which seems to be the newest part/the core contribution). My major comment is that experiences, if convincing, are not detailed enough to be reproducible (for example, some meta-parameters are not provided, nor discussed).

algorithm, function approximation, simulation, (12 more...)

Country: North America > Canada > Quebec > Montreal (0.04)

Genre:

Research Report (0.47)
Summary/Review (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)