AITopics | Statistical Learning

Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

Neural Information Processing SystemsApr-24-2026, 12:11:06 GMT

Estimating the per-state expected cumulative rewards is a critical aspect of reinforcement learning approaches, however the experience is obtained, but standard deep neural-network function-approximation methods are often inefficient in this setting. An alternative approach, exemplified by value iteration networks, is to learn transition and reward models of a latent Markov decision process whose value predictions fit the data. This approach has been shown empirically to converge faster to a more robust solution in many cases, but there has been little theoretical study of this phenomenon. In this paper, we explore such implicit representations of value functions via theory and focused experimentation. We prove that, for a linear parametrization, gradient descent converges to global optima despite nonlinearity and non-convexity introduced by the implicit representation. Furthermore, we derive convergence rates for both cases which allow us to identify conditions under which stochastic gradient descent (SGD) with this implicit representation converges substantially faster than its explicit counterpart. Finally, we provide empirical results in some simple domains that illustrate the theoretical findings.

machine learning, parameterization, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

0678c572b0d5597d2d4a6b5bd135754c-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 12:10:47 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.93)

Industry:

Government > Regional Government > North America Government > United States Government (0.47)
Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Data Science (0.68)

Add feedback

063e26c670d07bb7c4d30e6fc69fe056-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 12:10:09 GMT

artificial intelligence, data mining, machine learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.14)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking

Neural Information Processing SystemsApr-24-2026, 11:50:20 GMT

The backtracking line-search is an effective technique to automatically tune the step-size in smooth optimization. It guarantees similar performance to using the theoretically optimal step-size. Many approaches have been developed to instead tune per-coordinate step-sizes, also known as diagonal preconditioners, but none of the existing methods are provably competitive with the optimal per-coordinate stepsizes. We propose multidimensional backtracking, an extension of the backtracking line-search to find good diagonal preconditioners for smooth convex problems. Our key insight is that the gradient with respect to the step-sizes, also known as hypergradients, yields separating hyperplanes that let us search for good preconditioners using cutting-plane methods. As black-box cutting-plane approaches like the ellipsoid method are computationally prohibitive, we develop an efficient algorithm tailored to our setting. Multidimensional backtracking is provably competitive with the best diagonal preconditioner and requires no manual tuning.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Transportation > Air (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Mathematics of Computing (0.92)

Add feedback

05a70454516ecd9194c293b0e415777f-Supplemental.pdf

Neural Information Processing SystemsApr-24-2026, 11:49:30 GMT

artificial intelligence, machine learning, modeling & simulation, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.14)

Technology:

Information Technology > Modeling & Simulation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

T-LoHo: ABayesian Regularization Model for Structured Sparsity and Smoothness on Graphs

Neural Information Processing SystemsApr-24-2026, 11:49:26 GMT

Graphs have been commonly used to represent complex data structures. In models dealing with graph-structured data, multivariate parameters may not only exhibit sparse patterns but have structured sparsity and smoothness in the sense that both zero and non-zero parameters tend to cluster together. We propose a new prior for high-dimensional parameters with graphical relations, referred to as the Treebased Low-rank Horseshoe (T-LoHo) model, that generalizes the popular univariate Bayesian horseshoe shrinkage prior to the multivariate setting to detect structured sparsity and smoothness simultaneously. The T-LoHo prior can be embedded in many high-dimensional hierarchical models. To illustrate its utility, we apply it to regularize a Bayesian high-dimensional regression problem where the regression coefficients are linked by a graph, so that the resulting clusters have flexible shapes and satisfy the cluster contiguity constraint with respect to the graph. We design an efficient Markov chain Monte Carlo algorithm that delivers full Bayesian inference with uncertainty measures for model parameters such as the number of clusters. We offer theoretical investigations of the clustering effects and posterior concentration results. Finally, we illustrate the performance of the model with simulation studies and a real data application for anomaly detection on a road network. The results indicate substantial improvements over other competing methods such as the sparse fused lasso.

artificial intelligence, machine learning, t-loho, (14 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.28)

Genre: Research Report (0.46)

Industry: Transportation > Ground > Road (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
(2 more...)

Add feedback

0537fb40a68c18da59a35c2bfe1ca554-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 11:29:53 GMT

machine learning, natural language, topic model, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report (0.46)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

07956d40074d6523bad11112b3225c6e-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 11:10:06 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

Neural Information Processing SystemsApr-24-2026, 11:08:51 GMT

We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. The resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight generalization bounds, in a series of numerical experiments when compared to competing algorithms which also minimize PACBayes objectives - both with uninformed (data-independent) and informed (datadependent) priors.

artificial intelligence, classifier, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre:

Research Report (0.68)
Instructional Material (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)

Add feedback

The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Neural Information Processing SystemsApr-24-2026, 11:08:32 GMT

We develop a framework for analyzing the training and learning rate dynamics on a large class of high-dimensional optimization problems, which we call the high line, trained using one-pass stochastic gradient descent (SGD) with adaptive learning rates. We give exact expressions for the risk and learning rate curves in terms of a deterministic solution to a system of ODEs. We then investigate in detail two adaptive learning rates -- an idealized exact line search and AdaGrad-Norm -- on the least squares problem. When the data covariance matrix has strictly positive eigenvalues, this idealized exact line search strategy can exhibit arbitrarily slower convergence when compared to the optimal fixed learning rate with SGD. Moreover we exactly characterize the limiting learning rate (as time goes to infinity) for line search in the setting where the data covariance has only two distinct eigenvalues. For noiseless targets, we further demonstrate that the AdaGrad-Norm learning rate converges to a deterministic constant inversely proportional to the average eigenvalue of the data covariance matrix, and identify a phase transition when the covariance density of eigenvalues follows a power law distribution.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

Add feedback

Filters

Collaborating Authors

Statistical Learning

Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

0678c572b0d5597d2d4a6b5bd135754c-Paper.pdf

063e26c670d07bb7c4d30e6fc69fe056-Paper.pdf

Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking

05a70454516ecd9194c293b0e415777f-Supplemental.pdf

T-LoHo: ABayesian Regularization Model for Structured Sparsity and Smoothness on Graphs

0537fb40a68c18da59a35c2bfe1ca554-Paper.pdf

07956d40074d6523bad11112b3225c6e-Supplemental-Conference.pdf

Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms