AITopics | td update

Country:

Europe > United Kingdom > England > Bristol (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Taylor TD-learning

Neural Information Processing Systems, Dec-23-2025, 17:37:20 GMT

Many reinforcement learning approaches rely on temporal-difference (TD) learning to learn a critic. However, TD-learning updates can be high variance. Here, we introduce a model-based RL framework, Taylor TD, which reduces this variance in continuous state-action settings. Taylor TD uses a first-order Taylor series expansion of TD updates. This expansion allows Taylor TD to analytically integrate over stochasticity in the action-choice, and some stochasticity in the state distribution for the initial state and action of each TD update. We include theoretical and empirical evidence that Taylor TD updates are indeed lower variance than standard TD updates. Additionally, we show Taylor TD has the same stable learning guarantees as standard TD-learning with linear function approximation under a reasonable assumption. Next, we combine Taylor TD with the TD3 algorithm, forming TaTD3. We show TaTD3 performs as well as, if not better than, several state-of-the-art model-free and model-based baseline algorithms on a set of standard benchmark tasks.
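
As a rough illustration of what analytically integrating over stochasticity in the action choice can look like, the sketch below applies first-order expansions around an action a perturbed by zero-mean noise. The symbols \delta (TD error), Q_\theta (critic), \xi (action noise), and \Sigma (noise covariance) are assumptions introduced here for illustration. The abstract does not state the exact form of the update, so this should not be read as the paper's own derivation.

```latex
% Assumed setup: perturbed action a + \xi, with E[\xi] = 0 and E[\xi \xi^\top] = \Sigma.
\begin{align*}
\delta(s, a + \xi) &\approx \delta(s, a) + \xi^\top \nabla_a \delta(s, a), \\
\nabla_\theta Q_\theta(s, a + \xi) &\approx \nabla_\theta Q_\theta(s, a)
    + \nabla^2_{\theta, a} Q_\theta(s, a)\, \xi, \\
\mathbb{E}_\xi\!\left[ \delta(s, a + \xi)\, \nabla_\theta Q_\theta(s, a + \xi) \right]
    &\approx \delta(s, a)\, \nabla_\theta Q_\theta(s, a)
    + \nabla^2_{\theta, a} Q_\theta(s, a)\, \Sigma\, \nabla_a \delta(s, a).
\end{align*}
% \nabla^2_{\theta, a} Q_\theta is the matrix of mixed partials
% \partial^2 Q_\theta / \partial \theta_i \partial a_j.
% Terms linear in \xi vanish in expectation because E[\xi] = 0.
```

Under these assumptions the expectation over \xi is available in closed form, so no sampling of the action noise is needed for that part of the update, which is one plausible source of the variance reduction the abstract describes.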

name change, taylor td-learning, td update, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Taylor TD-learning

Neural Information Processing Systems, Oct-8-2025, 00:10:26 GMT

However, TD-learning updates can be high variance.

machine learning, reinforcement learning, variance, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Bristol (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

274a10ffa06e434f2a94df765cac6bf4-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-2-2025, 10:04:03 GMT

artificial intelligence, cr map, machine learning, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.74)

Taylor TD-learning

Neural Information Processing Systems, Oct-9-2024, 08:37:58 GMT

Many reinforcement learning approaches rely on temporal-difference (TD) learning to learn a critic. However, TD-learning updates can be high variance. Here, we introduce a model-based RL framework, Taylor TD, which reduces this variance in continuous state-action settings. Taylor TD uses a first-order Taylor series expansion of TD updates. This expansion allows Taylor TD to analytically integrate over stochasticity in the action-choice, and some stochasticity in the state distribution for the initial state and action of each TD update. We include theoretical and empirical evidence that Taylor TD updates are indeed lower variance than standard TD updates. Additionally, we show Taylor TD has the same stable learning guarantees as standard TD-learning with linear function approximation under a reasonable assumption. Next, we combine Taylor TD with the TD3 algorithm, forming TaTD3. We show TaTD3 performs as well as, if not better than, several state-of-the-art model-free and model-based baseline algorithms on a set of standard benchmark tasks.

taylor td-learning, td update, variance, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reviews: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Neural Information Processing Systems, Oct-8-2024, 09:11:16 GMT

The main algorithmic idea is a weighted combination of H-step temporal differences, estimated over H steps (and rolled out by a learned model of the environment). The underlying idea is to allow the learner to trade off estimation errors in the model and the Q function in different parts of the state-action space during learning. The updated TD estimator is incorporated into the DDPG algorithm in a straightforward manner. The update is computationally more intensive, but the result is improved sample complexity. The experimental results on a variety of continuous control tasks show significant improvement over the baseline DDPG and a related method (MVE), which is the precursor to this work. Overall, the paper is well written. The empirical results are very promising. The analysis and discussion are a bit limited, but this is not a major drawback. Overall, there is much to like about the paper.
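
The weighted combination of multi-horizon TD targets described in the review can be sketched roughly as follows. This is a minimal illustration, not the reviewed paper's implementation: the function names `horizon_targets` and `combine_targets` are hypothetical, and the weights are left as a plain input because the review does not say how they are chosen.

```python
from typing import List, Sequence


def horizon_targets(
    rewards: Sequence[float],
    bootstrap_values: Sequence[float],
    gamma: float,
) -> List[float]:
    """Return one TD target per rollout horizon 0..H.

    rewards[i] is the reward predicted at step i of a model rollout (length H).
    bootstrap_values[i] is the Q estimate at the state-action pair reached
    after i rollout steps (length H + 1).
    """
    if len(bootstrap_values) != len(rewards) + 1:
        raise ValueError("need exactly one more bootstrap value than rewards")
    targets = []
    discounted_return = 0.0  # sum of discounted rewards seen so far
    discount = 1.0           # gamma ** h
    for h, value in enumerate(bootstrap_values):
        # Horizon-h target: rewards for h steps, then bootstrap from Q.
        targets.append(discounted_return + discount * value)
        if h < len(rewards):
            discounted_return += discount * rewards[h]
            discount *= gamma
    return targets


def combine_targets(targets: Sequence[float], weights: Sequence[float]) -> float:
    """Normalized weighted average of the per-horizon targets.

    The weighting scheme is an assumption of this sketch; any non-negative
    weights with a positive sum are accepted.
    """
    if len(targets) != len(weights):
        raise ValueError("targets and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * t for w, t in zip(weights, targets)) / total
```

Putting all the weight on horizon 0 recovers the plain bootstrapped Q estimate, while shifting weight toward longer horizons leans more on the learned model, which is the trade-off the review refers to.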

algorithm, sample-efficient reinforcement learning, stochastic ensemble value expansion, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models

Jafferjee, Taher, Imani, Ehsan, Talvitie, Erin, White, Martha, Bowling, Michael

arXiv.org Artificial Intelligence, Jun-8-2020

Dyna-style reinforcement learning (RL) agents improve sample efficiency over model-free RL agents by updating the value function with simulated experience generated by an environment model. However, it is often difficult to learn accurate models of environment dynamics, and even small errors may result in failure of Dyna agents. In this paper, we investigate one type of model error: hallucinated states. These are states generated by the model that are not real states of the environment. We present the Hallucinated Value Hypothesis (HVH): updating values of real states towards values of hallucinated states results in misleading state-action values, which adversely affect the control policy. We discuss and evaluate four Dyna variants: three that update real states toward simulated, and therefore potentially hallucinated, states, and one that does not. The experimental results provide evidence for the HVH, thus suggesting a fruitful direction toward developing Dyna algorithms robust to model error.
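
A Dyna-style planning step of the kind the abstract describes might look like the tabular sketch below. It assumes a Q-learning update and a dictionary-based model, neither of which is specified in the abstract; the names `q_learning_update` and `dyna_planning` are hypothetical, and this is not one of the four variants the paper evaluates.

```python
import random
from typing import Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
QTable = Dict[Tuple[State, Action], float]
# Hypothetical learned model: (state, action) -> (predicted reward, predicted next state).
Model = Dict[Tuple[State, Action], Tuple[float, State]]


def q_learning_update(
    q: QTable,
    state: State,
    action: Action,
    reward: float,
    next_state: State,
    actions: Sequence[Action],
    alpha: float,
    gamma: float,
) -> None:
    """One tabular Q-learning update; unseen entries default to 0.0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def dyna_planning(
    q: QTable,
    model: Model,
    actions: Sequence[Action],
    alpha: float,
    gamma: float,
    n_updates: int,
    rng: random.Random,
) -> None:
    """Update Q from simulated transitions drawn from the learned model."""
    pairs = list(model)
    for _ in range(n_updates):
        state, action = rng.choice(pairs)
        # The next state comes from the model, not the environment. If the
        # model is wrong, it may be a state that does not exist, and the
        # update then bootstraps from that state's value.
        reward, next_state = model[(state, action)]
        q_learning_update(q, state, action, reward, next_state, actions, alpha, gamma)
```

In this sketch, every planning update moves the value of a previously visited pair toward the value of a model-predicted next state, which is the kind of update the Hallucinated Value Hypothesis is concerned with.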

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2006.04363

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
