AITopics | Edmonton

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear q-pi Realizability and Concentrability

Neural Information Processing Systems Oct-10-2025, 10:37:16 GMT

The hope in this setting is that learning a good policy will be possible without requiring a sample size that scales with the number of states in the MDP. Foster et al. [2021] have shown this to be impossible even under concentrability, a data coverage assumption where a coefficient C

min null 1, optimization problem 1, probability, (12 more...)

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Distributional Reinforcement Learning with Regularized Wasserstein Loss

Ke Sun

Neural Information Processing Systems Oct-10-2025, 06:13:43 GMT

Empirically, we show that SinkhornDRL consistently outperforms or matches existing algorithms on the Atari games suite and particularly stands out in the multi-dimensional reward setting.

algorithm, sinkhorn divergence, sinkhorndrl, (13 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch

Neural Information Processing Systems Oct-10-2025, 06:04:55 GMT

Detecting and handling misspecified objectives, such as reward functions, has been widely recognized as one of the central challenges within the domain of Artificial Intelligence (AI) safety research.

occupancy frequency, optimal policy, reward function, (15 more...)

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Colorado (0.04)
North America > United States > Arizona (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Optimal Algorithms for Augmented Testing of Discrete Distributions

Neural Information Processing Systems Oct-9-2025, 19:10:29 GMT

We consider the problem of hypothesis testing for discrete distributions.

algorithm, closeness testing, probability, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.92)

Weighted importance sampling for off-policy learning with linear function approximation

A. Rupam Mahmood, Hado P. van Hasselt, Richard S. Sutton

Neural Information Processing Systems Oct-9-2025, 14:09:03 GMT

Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, weighted importance sampling, does not carry over easily to function approximation and, because of this, it is not utilized in existing off-policy learning algorithms. In this paper, we take two steps toward bridging this gap. First, we show that weighted importance sampling can be viewed as a special case of weighting the error of individual training samples, and that this weighting has theoretical and empirical benefits similar to those of weighted importance sampling. Second, we show that these benefits extend to a new weighted-importance-sampling version of off-policy LSTD(λ). We show empirically that our new WIS-LSTD(λ) algorithm can result in much more rapid and reliable convergence than conventional off-policy LSTD(λ) (Yu 2010, Bertsekas & Yu 2009).
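
As a rough illustration of the first point in the abstract above, the sketch below contrasts ordinary and weighted importance sampling in the simplest tabular case (a single scalar value, no function approximation). It is not the paper's WIS-LSTD(λ) algorithm; the function names and the toy returns and importance ratios are assumptions made for this example only.

```python
def ois_estimate(returns, ratios):
    """Ordinary importance sampling: average of ratio-scaled returns."""
    return sum(rho * g for rho, g in zip(ratios, returns)) / len(returns)


def wis_estimate(returns, ratios):
    """Weighted importance sampling: normalize by the sum of the ratios.

    This is also the value v that minimizes the weighted squared error
    sum_i rho_i * (G_i - v)**2, i.e. each sample's error is weighted
    by its importance ratio.
    """
    total = sum(ratios)
    if total == 0:
        return 0.0  # no sample carries any weight; arbitrary default
    return sum(rho * g for rho, g in zip(ratios, returns)) / total


# Toy data (hypothetical): returns observed under the behavior policy,
# and the target/behavior probability ratio for each sample.
returns = [1.0, 0.0, 2.0]
ratios = [3.0, 0.0, 1.0]

ois = ois_estimate(returns, ratios)
wis = wis_estimate(returns, ratios)

# Gradient of the weighted squared error at the WIS estimate (should vanish).
wls_gradient = sum(rho * (g - wis) for rho, g in zip(ratios, returns))
```

In this toy case the weighted estimate stays within the range of the observed returns, whereas the ordinary estimate need not in general; that boundedness is one common informal argument for the lower variance of weighted importance sampling.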

algorithm, function approximation, wis-lstd, (13 more...)

Country: