AITopics | model-based method

Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization

Neural Information Processing Systems, Apr-24-2026, 12:11:10 GMT

Estimating the per-state expected cumulative rewards is a critical aspect of reinforcement learning approaches, regardless of how the experience is obtained, yet standard deep neural-network function-approximation methods are often inefficient in this setting. An alternative approach, exemplified by value iteration networks, is to learn transition and reward models of a latent Markov decision process whose value predictions fit the data. This approach has been shown empirically to converge faster to a more robust solution in many cases, but there has been little theoretical study of this phenomenon. In this paper, we explore such implicit representations of value functions via theory and focused experimentation. We prove that, for a linear parametrization, gradient descent converges to global optima despite the nonlinearity and non-convexity introduced by the implicit representation. Furthermore, we derive convergence rates for both cases, which allow us to identify conditions under which stochastic gradient descent (SGD) with this implicit representation converges substantially faster than its explicit counterpart. Finally, we provide empirical results in several simple domains that illustrate the theoretical findings.
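To make the idea of an implicit value representation concrete, here is a minimal sketch, assuming a small tabular latent MDP (not the linear parametrization the abstract analyzes). Values are never parameterized directly: they are implied by learned transition logits `W` (turned into a stochastic matrix `P` by a row-wise softmax) and rewards `r` through V = (I - γP)^{-1} r, and gradient descent fits that implied V to given target values. The function names (`softmax_rows`, `evaluate`, `adjoint`, `fit_implicit`) and the hyperparameters are hypothetical illustrations, not the paper's code.

```python
import math


def softmax_rows(W):
    """Row-wise softmax: turns unconstrained logits into a stochastic matrix."""
    P = []
    for row in W:
        m = max(row)
        exps = [math.exp(w - m) for w in row]
        total = sum(exps)
        P.append([e / total for e in exps])
    return P


def evaluate(P, r, gamma, iters=60):
    """Policy evaluation by fixed-point iteration V <- r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[i] + gamma * sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
    return V


def adjoint(P, e, gamma, iters=60):
    """Solves u = e + gamma * P^T u, i.e. u = (I - gamma P)^{-T} e."""
    n = len(e)
    u = [0.0] * n
    for _ in range(iters):
        u = [e[j] + gamma * sum(P[i][j] * u[i] for i in range(n)) for j in range(n)]
    return u


def fit_implicit(targets, gamma=0.5, lr=0.02, steps=2000):
    """Gradient descent on 0.5 * ||V(W, r) - targets||^2, where V is implied
    by a latent MDP with transition logits W and rewards r."""
    n = len(targets)
    W = [[0.0] * n for _ in range(n)]  # uniform transitions at initialization
    r = [0.0] * n
    losses = []
    for _ in range(steps):
        P = softmax_rows(W)
        V = evaluate(P, r, gamma)
        e = [V[i] - targets[i] for i in range(n)]
        losses.append(0.5 * sum(x * x for x in e))
        u = adjoint(P, e, gamma)  # dL/dr = (I - gamma P)^{-T} e
        for i in range(n):
            g = [gamma * u[i] * V[j] for j in range(n)]  # dL/dP[i][j]
            avg = sum(P[i][j] * g[j] for j in range(n))
            for k in range(n):
                W[i][k] -= lr * P[i][k] * (g[k] - avg)  # softmax chain rule
            r[i] -= lr * u[i]
    P = softmax_rows(W)
    V = evaluate(P, r, gamma)
    losses.append(0.5 * sum((V[i] - targets[i]) ** 2 for i in range(n)))
    return P, r, V, losses


P, r, V, losses = fit_implicit([1.0, -0.5, 2.0])
```

The gradient with respect to `r` and `P` comes from the adjoint vector u = (I - γP)^{-T}(V - targets), which is why the sketch solves a second fixed-point problem instead of differentiating through every iteration. The explicit counterpart would simply treat V itself as the parameter vector; comparing the two is left out to keep the sketch short.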

machine learning, parameterization, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Supplementary Material A Access to and Benchmark

Neural Information Processing Systems, Feb-16-2026, 23:27:08 GMT

Figure 10: Illustration of the frame-based pupil segmentation: (a) the input eye image I; (b) the generated binary mask M; and (c) the detected pupil boundary Q and the pupil center c.

C More Details on Experiments

C.1 Evaluation metrics

The four metrics adopted for the dataset evaluation are described in detail as follows:

artificial intelligence, human computer interaction, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > China > Shandong Province (0.04)

Genre: Research Report (0.47)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.47)
Information Technology > Human Computer Interaction > Interfaces (0.31)

EV-Eye: Rethinking High-frequency Eye Tracking through the Lenses of Event Cameras

Neural Information Processing Systems, Feb-16-2026, 23:27:05 GMT

In this paper, we present EV-Eye, a first-of-its-kind large-scale multimodal eye-tracking dataset aimed at inspiring research on high-frequency eye/gaze tracking. EV-Eye uses an emerging bio-inspired event camera to capture independent pixel-level intensity changes induced by eye movements, achieving sub-microsecond latency.
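The key property of an event camera is that each pixel reports only changes in brightness. A minimal sketch of the idealized event-generation model commonly used to describe such sensors follows, under assumptions not taken from the excerpt: a pixel emits an ON (+1) or OFF (-1) event when its log-intensity changes by at least a contrast threshold. The function `frames_to_events`, the threshold of 0.2, and the `eps` offset are hypothetical; real sensors are asynchronous and can emit several events per pixel for a large change, whereas this frame-pair simplification emits at most one.

```python
import math


def frames_to_events(prev_frame, next_frame, t, threshold=0.2, eps=1e-3):
    """Idealized event model: compare log intensities of two frames and emit
    (x, y, t, polarity) wherever the change reaches the contrast threshold."""
    events = []
    for y, (row0, row1) in enumerate(zip(prev_frame, next_frame)):
        for x, (i0, i1) in enumerate(zip(row0, row1)):
            # eps avoids log(0) for completely dark pixels
            delta = math.log(i1 + eps) - math.log(i0 + eps)
            if abs(delta) >= threshold:
                events.append((x, y, t, 1 if delta > 0 else -1))
    return events


prev_frame = [[0.5, 0.5], [0.5, 0.5]]
next_frame = [[1.0, 0.5], [0.25, 0.52]]
events = frames_to_events(prev_frame, next_frame, t=0.0)
```

Here the pixel that doubles in brightness yields an ON event, the one that halves yields an OFF event, and the small 0.50 to 0.52 change stays below the threshold, so unchanged regions of the eye image produce no data at all.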

artificial intelligence, machine learning, pattern recognition, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.15)
Asia > China (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

In this work, we provide a fundamental unified convergence theorem used for deriving expected and almost sure convergence results for a series of stochastic optimization methods.
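The excerpt does not state the theorem's assumptions, so the following is only a hedged illustration of the kind of stochastic optimization method such results cover: plain SGD with diminishing step sizes α_k = α_0 / (k + 1), which satisfy the classical Robbins-Monro conditions Σ α_k = ∞ and Σ α_k² < ∞. The objective f(x) = x²/2, the Gaussian gradient noise, and the name `sgd_diminishing` are assumptions made for the example.

```python
import random


def sgd_diminishing(x0, steps, alpha0=1.0, noise=1.0, seed=0):
    """SGD on f(x) = 0.5 * x**2 using noisy gradients x + xi, xi ~ N(0, noise^2),
    and step sizes alpha_k = alpha0 / (k + 1)."""
    rng = random.Random(seed)
    x = x0
    for k in range(steps):
        grad = x + rng.gauss(0.0, noise)  # unbiased estimate of f'(x) = x
        x -= alpha0 / (k + 1) * grad
    return x


x_final = sgd_diminishing(5.0, 10_000)
```

With α_0 = 1 the recursion reduces to x_K = -(ξ_0 + ... + ξ_{K-1}) / K, the negated running mean of the noise, so the iterate converges to the minimizer 0 at roughly a 1/√K rate; this is one concrete case where both expected and almost-sure convergence hold.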

artificial intelligence, machine learning, verifying, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

c2c701fe341a7756ca7fd4eaa83ff63f-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 01:01:44 GMT

complexity, model-based method, optimization, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

48db71587df6c7c442e5b76cc723169a-Paper.pdf

Neural Information Processing Systems, Feb-8-2026, 07:44:37 GMT

model-based behavior, model-based method, muzero, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

When to Trust Your Model: Model-Based Policy Optimization

Neural Information Processing Systems, Dec-25-2025, 11:20:52 GMT

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy optimization both theoretically and empirically. We first formulate and analyze a model-based reinforcement learning algorithm with a guarantee of monotonic improvement at each step. In practice, this analysis is overly pessimistic and suggests that real off-policy data is always preferable to model-generated on-policy data, but we show that an empirical estimate of model generalization can be incorporated into such analysis to justify model usage. Motivated by this analysis, we then demonstrate that a simple procedure of using short model-generated rollouts branched from real data has the benefits of more complicated model-based algorithms without the usual pitfalls. In particular, this approach surpasses the sample efficiency of prior model-based methods, matches the asymptotic performance of the best model-free algorithms, and scales to horizons that cause other model-based methods to fail entirely.
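The "short model-generated rollouts branched from real data" can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `branched_rollouts` is a hypothetical helper, the learned model and the policy are passed in as plain callables, and the toy 1-D `toy_model` and `toy_policy` stand in for a trained dynamics model and policy. Components such as model ensembles and the policy optimizer that consumes the synthetic data are omitted.

```python
import random


def branched_rollouts(real_states, model, policy, horizon, num_branches, rng):
    """Start short model rollouts from states sampled out of real data.

    real_states: states from the real-environment replay buffer.
    model(s, a) -> (next_state, reward): a learned dynamics model (hypothetical).
    policy(s) -> a: the current policy.
    Returns synthetic (s, a, reward, s_next) transitions.
    """
    synthetic = []
    for _ in range(num_branches):
        s = rng.choice(real_states)  # branch point drawn from real data
        for _ in range(horizon):     # short horizon limits compounding model error
            a = policy(s)
            s_next, reward = model(s, a)
            synthetic.append((s, a, reward, s_next))
            s = s_next
    return synthetic


def toy_model(s, a):
    """Stand-in for a learned model: a deterministic 1-D walk."""
    s_next = s + a
    return s_next, -abs(s_next)


def toy_policy(s):
    """Stand-in policy that steps toward the origin."""
    return -1 if s > 0 else 1


data = branched_rollouts([3, -2, 5], toy_model, toy_policy,
                         horizon=2, num_branches=4, rng=random.Random(0))
```

Because every branch restarts from a real state, model error can only accumulate over `horizon` steps rather than over a full episode, which is the trade-off between cheap model data and model bias that the abstract describes.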

algorithm, model-based policy optimization, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

aaebdb8bb6b0e73f6c3c54a0ab0c6415-AuthorFeedback.pdf

576d026223582a390cd323bef4bad026-AuthorFeedback.pdf

A Unified Convergence Theorem for Stochastic Optimization Methods
