AITopics

On the Curses of Future and History in Future-dependent Value Functions for OPE

We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantees avoid exponential dependence on the horizon. While such estimators exist for MDPs, and POMDPs can be converted to history-based MDPs, their estimation errors depend on the state-density ratio for MDPs, which becomes a ratio over histories after conversion, an exponential object. Recently, Uehara et al. [2022a] proposed future-dependent value functions as a promising framework to address this issue, where the guarantee for memoryless policies depends on the density ratio over the latent state space. However, it also depends on the boundedness of the future-dependent value function and other related quantities, which we show can be exponential in length, thus erasing the advantage of the method. In this paper, we discover novel coverage assumptions tailored to the structure of POMDPs, such as outcome coverage and belief coverage, which enable polynomial bounds on the aforementioned quantities. As a side product, our analyses also lead to the discovery of new algorithms with complementary properties.

artificial intelligence, assumption, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.14)
North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Strategic Distribution Shift of Interacting Agents via Coupled Gradient Flows

Neural Information Processing Systems, Mar-27-2025, 12:12:32 GMT

We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.40)

Transductive Active Learning: Theory and Applications (Bhavya Sukhija, Lenart Treven; Department of Computer Science, ETH Zürich, Switzerland)

Neural Information Processing Systems, Mar-27-2025, 12:12:23 GMT

We study a generalization of classical active learning to real-world settings with concrete prediction targets where sampling is restricted to an accessible region of the domain, while prediction targets may lie outside this region. We analyze a family of decision rules that sample adaptively to minimize uncertainty about prediction targets. We are the first to show, under general regularity assumptions, that such decision rules converge uniformly to the smallest possible uncertainty obtainable from the accessible data. We demonstrate their strong sample efficiency in two key applications: active fine-tuning of large neural networks and safe Bayesian optimization, where they achieve state-of-the-art performance.

artificial intelligence, machine learning, survey article, (19 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (1.00)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.92)
Research Report > New Finding (0.67)

Industry:

Education (1.00)
Health & Medicine (0.92)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits

Neural Information Processing Systems, Mar-27-2025, 12:12:21 GMT

We present a unified likelihood ratio-based confidence sequence (CS) for any (self-concordant) generalized linear model (GLM) that is guaranteed to be convex and numerically tight. We show that this is on par with or improves upon known CSs for various GLMs, including Gaussian, Bernoulli, and Poisson. In particular, for the first time, our CS for Bernoulli has a poly(S)-free radius, where S is the norm of the unknown parameter. Our first technical novelty is its derivation, which utilizes a time-uniform PAC-Bayesian bound with a uniform prior/posterior, despite the latter being a rather unpopular choice for deriving CSs. As a direct application of our new CS, we propose a simple and natural optimistic algorithm called OFUGLB, applicable to any generalized linear bandit (GLB; Filippi et al. (2010)). Our analysis shows that the celebrated optimistic approach simultaneously attains state-of-the-art regrets for various self-concordant (not necessarily bounded) GLBs, and even poly(S)-free regrets for bounded GLBs, including logistic bandits. The regret analysis, our second technical novelty, follows from combining our new CS with a new proof technique that completely avoids the previously widely used self-concordant control lemma (Faury et al., 2020, Lemma 9). Numerically, OFUGLB outperforms or is on par with prior algorithms for logistic bandits.

artificial intelligence, machine learning, proceedings, (12 more...)

Neural Information Processing Systems

Country: North America > United States > Arizona (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.46)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)

Support Recovery in Sparse PCA with Incomplete Data

Neural Information Processing Systems, Mar-27-2025, 12:12:10 GMT

We study a practical algorithm for sparse principal component analysis (PCA) of incomplete and noisy data.

artificial intelligence, machine learning, sparse pca, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.68)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Distributional Successor Features Enable Zero-Shot Policy Optimization

Neural Information Processing Systems, Mar-27-2025, 12:12:02 GMT

Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new rewards to linear regression. Yet, policy optimization with successor features can be challenging. This work proposes a novel class of models, i.e., Distributional Successor Features for Zero-Shot Policy Optimization (DiSPOs), that learn a distribution of successor features of a stationary dataset's behavior policy, along with a policy that acts to realize different successor features within the dataset. By directly modeling long-term outcomes in the dataset, DiSPOs avoid compounding error while enabling a simple scheme for zero-shot policy optimization across reward functions. We present a practical instantiation of DiSPOs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems.
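
The statement above that successor features reduce policy evaluation under a new reward to linear regression can be illustrated on a small tabular problem. The following is a minimal sketch of classical successor features, not of DiSPOs themselves; the transition matrix `P`, feature matrix `Phi`, weights `w_true`, and discount `gamma` are all hypothetical values chosen for illustration, and the reward is assumed to be exactly linear in the features.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_feats, gamma = 5, 3, 0.9

# Hypothetical state-to-state transition matrix under a fixed policy (rows sum to 1).
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

# Hypothetical state features; the reward is assumed linear in them: r = Phi @ w.
Phi = rng.standard_normal((n_states, n_feats))

# Successor features: expected discounted sum of future features,
# Psi = (I - gamma * P)^{-1} Phi. They do not depend on the reward.
Psi = np.linalg.solve(np.eye(n_states) - gamma * P, Phi)

# A "new" reward function arrives; recover its weights by least squares.
w_true = np.array([1.0, -2.0, 0.5])
r = Phi @ w_true
w_hat, *_ = np.linalg.lstsq(Phi, r, rcond=None)

# Policy evaluation under the new reward is a single matrix-vector product.
v_sf = Psi @ w_hat

# Reference value obtained by solving the Bellman equation directly.
v_direct = np.linalg.solve(np.eye(n_states) - gamma * P, r)

print(np.max(np.abs(v_sf - v_direct)))
```

Because `Psi` is computed once and reused, evaluating the same policy under another reward only requires refitting `w_hat`. Optimizing the policy, which the abstract describes as the harder part, is not covered by this sketch.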

large language model, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.14)
North America > United States > Maryland (0.14)
North America > United States > Hawaii (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Provable benefits of annealing for estimating normalizing constants

Neural Information Processing Systems, Mar-27-2025, 12:11:49 GMT

In fact, in a particular limit, the optimal path is arithmetic.

artificial intelligence, estimation error, machine learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

90080022263cddafddd4a0726f1fb186-Paper-Conference.pdf

Neural Information Processing Systems, Mar-27-2025, 12:11:45 GMT

Recent research has developed several Monte Carlo methods for estimating the normalization constant (partition function) based on the idea of annealing. This means sampling successively from a path of distributions that interpolate between a tractable "proposal" distribution and the unnormalized "target" distribution. Prominent estimators in this family include annealed importance sampling and annealed noise-contrastive estimation (NCE). Such methods hinge on a number of design choices: which estimator to use, which path of distributions to use and whether to use a path at all; so far, there is no definitive theory on which choices are efficient. Here, we evaluate each design choice by the asymptotic estimation error it produces.
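
To make the idea of annealing concrete, here is a minimal sketch of annealed importance sampling on a one-dimensional toy problem. Everything specific in it is an assumption made for illustration rather than a choice taken from the paper: the standard normal proposal, the Gaussian-shaped unnormalized target with hypothetical parameters `mu` and `sigma`, the geometric path with a uniform schedule, and the single random-walk Metropolis move per temperature.

```python
import numpy as np

def log_proposal(x):
    # Normalized standard normal log-density (the tractable "proposal").
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def log_target_unnorm(x, mu=1.0, sigma=0.5):
    # Unnormalized "target"; its true normalizing constant is sigma * sqrt(2 * pi).
    return -0.5 * ((x - mu) / sigma) ** 2

def log_path(x, beta):
    # Geometric path: log f_beta = (1 - beta) * log p0 + beta * log f1.
    return (1.0 - beta) * log_proposal(x) + beta * log_target_unnorm(x)

def ais_estimate(n_particles=4000, n_steps=50, step_size=0.5, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_particles)  # exact samples from the proposal
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental importance weight between consecutive path distributions.
        log_w += (b - b_prev) * (log_target_unnorm(x) - log_proposal(x))
        # One random-walk Metropolis move that leaves f_b invariant.
        prop = x + step_size * rng.standard_normal(n_particles)
        log_acc = log_path(prop, b) - log_path(x, b)
        accept = np.log(rng.random(n_particles)) < log_acc
        x = np.where(accept, prop, x)
    # The average weight estimates the ratio Z_target / Z_proposal = Z_target.
    return float(np.mean(np.exp(log_w)))

z_hat = ais_estimate()
z_true = 0.5 * np.sqrt(2.0 * np.pi)
print(z_hat, z_true)
```

Setting `n_steps=1` recovers plain importance sampling with no path at all, which is one of the design choices the abstract lists. Replacing `log_path` with another interpolation changes the path while leaving the rest of the estimator structure unchanged.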

artificial intelligence, estimation error, machine learning, (14 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

On the Curses of Future and History in Future-dependent Value Functions for OPE