AITopics | Statistical Learning

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation

Neural Information Processing Systems, Apr-25-2026, 05:46:55 GMT

Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\tilde{O}(\sqrt{d^3 H^3 T} + d^2 H^2 \mathbb{E}[\tau])$ worst-case regret in the presence of unknown stochastic delays. Here $\mathbb{E}[\tau]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $\tilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.
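The abstract above mentions gradient-based approximate posterior sampling via Langevin dynamics. The following is a minimal sketch of that general idea only, not the paper's Delayed-LPSVI: it runs unadjusted Langevin dynamics on the posterior of a single ridge-regression weight, and it does not model delayed feedback. The function name, the toy data, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def langevin_posterior_samples(xs, ys, lam=1.0, noise_var=1.0,
                               step=0.01, n_steps=20000, burn_in=2000, seed=0):
    """Unadjusted Langevin samples for a scalar weight w in y = w * x + noise.

    Assumed (hypothetical) model: Gaussian noise with variance noise_var and a
    ridge penalty lam, so the negative log posterior is
        U(w) = (sum_i (w * x_i - y_i)^2 + lam * w^2) / (2 * noise_var).
    """
    rng = random.Random(seed)
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    w = 0.0
    samples = []
    for t in range(n_steps):
        # Gradient of U at the current iterate.
        grad_u = ((s_xx + lam) * w - s_xy) / noise_var
        # Langevin step: gradient descent on U plus injected Gaussian noise.
        w = w - step * grad_u + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            samples.append(w)
    return samples


# Toy data (hypothetical) with a true slope of roughly 2.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
samples = langevin_posterior_samples(xs, ys)
sample_mean = sum(samples) / len(samples)

# Closed-form posterior mean of the same model, for comparison.
exact_mean = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + 1.0)
```

For this Gaussian model the posterior is available in closed form, so `sample_mean` can be compared against `exact_mean`. The appeal of a gradient-based sampler is that each step needs only a gradient evaluation, which is why such schemes are attractive when exact sampling is costly in high dimensions.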

machine learning, probability 1, reinforcement learning, (17 more...)

Country: North America > United States > California (0.28)

Genre: Research Report (0.87)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.48)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)

299dc35e747eb77177d9cea10a802da2-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 05:46:49 GMT

artificial intelligence, machine learning, vector, (19 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Information Management > Search (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

29ef811e72b2b97cf18dd5d866b0f472-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 05:30:57 GMT

artificial intelligence, excess risk, machine learning, (16 more...)

Country: Asia > China (0.15)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

29ef811e72b2b97cf18dd5d866b0f472-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 05:30:54 GMT

artificial intelligence, excess risk, machine learning, (16 more...)

Country: Asia > China (0.15)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning

Neural Information Processing Systems, Apr-25-2026, 05:30:39 GMT

Multitask learning is a powerful framework that enables one to simultaneously learn multiple related tasks by sharing information between them. Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning. In this work, we provide novel confidence intervals for multitask regression in the challenging agnostic setting, i.e., when neither the similarity between tasks nor the tasks' features are available to the learner. The obtained intervals do not require i.i.d.

artificial intelligence, confidence interval, machine learning, (17 more...)

Country: North America > Canada (0.28)

Industry:

Education > Educational Setting > Online (0.47)
Health & Medicine > Pharmaceuticals & Biotechnology (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.61)

DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

Neural Information Processing Systems, Apr-25-2026, 05:29:40 GMT

This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient (it matches the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor, without tuning any parameters) and universal (it automatically adapts to both smooth and nonsmooth problems). While popular algorithms following the AdaGrad framework compute a running average of the squared gradients to use for normalization, DoWG maintains a new distance-based weighted version of the running average, which is crucial to achieve the desired properties. To complement our theory, we also show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks.
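The abstract describes a distance-based weighted running sum of squared gradients but does not state the update rule. The sketch below assumes the rule commonly given for DoWG: keep the largest distance travelled from the starting point, r_t = max(r_{t-1}, ||x_t - x_0||); accumulate v_t = v_{t-1} + r_t^2 ||g_t||^2; and step with size r_t^2 / sqrt(v_t). That rule, the function name `dowg`, the seed value `r_eps`, and the toy quadratic are assumptions for illustration and should be checked against the paper before use.

```python
import math


def dowg(grad, x0, steps=2000, r_eps=1e-4):
    """Sketch of a DoWG-style parameter-free gradient descent loop.

    grad:  function mapping a list of floats to its gradient (list of floats).
    r_eps: small positive seed for the distance estimate (hypothetical default).
    """
    x = list(x0)
    r_bar = r_eps  # largest distance from x0 seen so far
    v = 0.0        # distance-weighted running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # exact stationary point, nothing left to do
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        r_bar = max(r_bar, dist)
        v += r_bar ** 2 * g_sq
        eta = r_bar ** 2 / math.sqrt(v)  # step size, no tuned learning rate
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Toy problem (hypothetical): minimize 0.5 * ||x - target||^2.
target = [3.0, -2.0]


def quad_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_final = dowg(quad_grad, [0.0, 0.0])
final_loss = 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x_final, target))
```

The only user-facing knob here is the seed `r_eps`, which sets the first step length; under the assumed rule the distance estimate grows geometrically from it, so the loop does not need a hand-picked learning rate.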

artificial intelligence, gradient descent, machine learning, (14 more...)

Country:

North America > Canada (0.68)
Europe (0.67)
Asia (0.67)
North America > United States > California (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

297f7c6c56af81239f7c47d21558b75a-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 05:12:30 GMT

artificial intelligence, deep learning, machine learning, (16 more...)

Country:

Asia > Japan (0.28)
Asia > China (0.28)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Data Science (0.93)

291d43c696d8c3704cdbe0a72ade5f6c-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 05:12:20 GMT

artificial intelligence, machine learning, segmentation, (21 more...)

Country: North America > United States (0.14)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Data Science (0.93)
(2 more...)

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions

Neural Information Processing Systems, Apr-25-2026, 05:10:25 GMT

Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios). While recent works on marginalized importance sampling (MIS) show that the former can enjoy provable guarantees under realizable function approximation, the latter is only known to be feasible under much stronger assumptions such as prohibitively expressive discriminators. In this work, we provide guarantees for off-policy function estimation under only realizability, by imposing proper regularization on the MIS objectives. Compared to commonly used regularization in MIS, our regularizer is much more flexible and can account for an arbitrary user-specified distribution, under which the learned function will be close to the ground truth. We provide an exact characterization of the optimal dual solution that needs to be realized by the discriminator class, which determines the data-coverage assumption in the case of value-function learning. As another surprising observation, the regularizer can be altered to relax the data-coverage requirement, and completely eliminate it in the ideal case with strong side information.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
