AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
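
The trial-and-error idea in the quotation can be illustrated with a small sketch. This is not taken from Sutton and Barto's text: it is a minimal epsilon-greedy learner on a simulated bandit, and the function name, the `arm_means` argument, and the default settings are all assumptions made for illustration.

```python
import random

def epsilon_greedy_bandit(arm_means, steps=2000, epsilon=0.1, seed=0):
    """Learn action values by trial and error on a simulated Bernoulli bandit.

    arm_means is a hypothetical list of success probabilities, one per action.
    The learner never reads it directly; it only observes sampled rewards.
    Returns the learner's value estimates and the pull count of each action.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random action.
            action = rng.randrange(n_arms)
        else:
            # Exploit: take the action with the highest current estimate.
            action = max(range(n_arms), key=lambda a: estimates[a])
        # The environment returns a numerical reward signal (1 or 0 here).
        reward = 1.0 if rng.random() < arm_means[action] else 0.0
        counts[action] += 1
        # Incremental sample-average update of the action-value estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The learner is never told which action is best; it finds out by trying actions and averaging the rewards it observes.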

CoinDICE: Off-Policy Confidence Interval Estimation

Neural Information Processing Systems | Oct-3-2025, 03:57:32 GMT

A major barrier to applying reinforcement learning (RL) is the difficulty of evaluating new policies reliably before deployment, a problem generally known as off-policy evaluation (OPE).
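
As a rough illustration of what OPE asks for, the sketch below estimates a new policy's expected reward from data logged under a different policy, using ordinary importance sampling in a one-step (bandit) setting. This is a textbook baseline, not the CoinDICE estimator, and it returns only a point estimate rather than a confidence interval. The function name and the data layout are assumptions.

```python
def importance_sampling_estimate(logged, target_probs):
    """Estimate a target policy's expected reward from logged bandit data.

    logged       : list of (action, reward, behavior_prob) tuples, where
                   behavior_prob is the logging policy's probability of
                   having chosen that action (assumed known and nonzero)
    target_probs : dict mapping action -> probability under the policy
                   being evaluated
    """
    if not logged:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for action, reward, behavior_prob in logged:
        # Reweight each reward by how much more (or less) likely the
        # target policy is to take the logged action.
        weight = target_probs.get(action, 0.0) / behavior_prob
        total += weight * reward
    return total / len(logged)

# Hypothetical log collected by a uniform-random policy over two actions.
logged = [(0, 1.0, 0.5), (1, 0.0, 0.5), (0, 1.0, 0.5), (1, 1.0, 0.5)]
estimate = importance_sampling_estimate(logged, {0: 0.8, 1: 0.2})
```

When the target policy equals the logging policy, every weight is 1 and the estimate reduces to the plain average of the logged rewards.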

arxiv preprint arxiv, machine learning, reinforcement learning, (12 more...)

Country:

North America > United States (0.28)
North America > Canada (0.28)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems | Oct-3-2025, 03:53:51 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper focuses on l_1-regularized multi-task feature RL, integrating multi-task feature learning (MTFL) with Fitted Q-learning. Clarity: The paper is mostly well written. Regarding the format of this paper, the font size is not right. A suggestion on the nuclear norm: it is usually written as ||\cdot||_*, whereas the paper writes it as ||\cdot||_1. There is a mistake in Assumption 5. Judging from the context, I think line 291 is right and line 299 is mistakenly written, and thus the formulations in Equations (5) and (6) are wrong: U should be U^{-1}.
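
The notational point in the review can be made concrete with the standard definitions. Here W is a generic m × n matrix with singular values σ_i(W), introduced only for illustration and not a symbol taken from the paper. The subscript 1 on a matrix norm is commonly read as the induced 1-norm shown below (or as the entrywise sum of absolute values), which is presumably why the reviewer prefers the ||\cdot||_* notation:

```latex
\|W\|_{*} \;=\; \sum_{i=1}^{\min(m,n)} \sigma_i(W)
          \;=\; \operatorname{tr}\!\left(\sqrt{W^{\top} W}\right),
\qquad
\|W\|_{1} \;=\; \max_{1 \le j \le n} \sum_{i=1}^{m} |W_{ij}|
          \quad \text{(induced 1-norm)}.
```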

algorithm, iteration, reviewer, (13 more...)

Country: North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Sparse Multi-Task Reinforcement Learning

Daniele Calandriello, Alessandro Lazaric, Marcello Restelli

Neural Information Processing Systems | Oct-3-2025, 03:53:49 GMT

Neural Information Processing Systems http://nips.cc/

sparse multi-task reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

873be0705c80679f2c71fbf4d872df59-Paper.pdf

Neural Information Processing Systems | Oct-3-2025, 03:53:18 GMT

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Bayesian Optimization for Iterative Learning

Vu Nguyen

Neural Information Processing Systems | Oct-3-2025, 03:52:20 GMT

The performance of deep (reinforcement) learning systems crucially depends on the choice of hyperparameters. Their tuning is notoriously expensive, typically requiring an iterative training process to run for numerous steps to convergence. Traditional tuning algorithms only consider the final performance of hyperparameters acquired after many expensive iterations and ignore intermediate information from earlier training steps. In this paper, we present a Bayesian optimization (BO) approach which exploits the iterative structure of learning algorithms for efficient hyperparameter tuning. We propose to learn an evaluation function compressing learning progress at any stage of the training process into a single numeric score according to both training success and stability. Our BO framework then balances the benefit of assessing a hyperparameter setting over additional training steps against the computational cost of those steps. We further increase model efficiency by selectively including scores from different training steps for any evaluated hyperparameter set. We demonstrate the efficiency of our algorithm by tuning hyperparameters for the training of deep reinforcement learning agents and convolutional neural networks. Our algorithm outperforms all existing baselines in identifying optimal hyperparameters in minimal time.
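
One way to picture "compressing learning progress into a single numeric score" is a weighted average of the learning curve that favors later training steps. The sketch below is only an illustration of that idea: the abstract says the evaluation function is learned, whereas here the logistic weighting and the `midpoint` and `growth` parameters are fixed by hand and are assumptions. It also does not model the stability term the abstract mentions.

```python
import math

def compress_curve(scores, midpoint=0.5, growth=10.0):
    """Collapse a learning curve into one number by logistic-weighted averaging.

    scores   : per-step performance values, earliest first
    midpoint : hypothetical parameter; relative position in [0, 1] at which
               a step receives half of the maximum weight
    growth   : hypothetical parameter; larger values concentrate the weight
               on late steps, 0 gives the plain mean
    """
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    weights = []
    for i in range(n):
        # Map the step index to a relative position in [0, 1].
        position = i / (n - 1) if n > 1 else 1.0
        weights.append(1.0 / (1.0 + math.exp(-growth * (position - midpoint))))
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return weighted_sum / sum(weights)

# A run that improves late scores higher than one that degrades late.
improving = compress_curve([0.0, 0.0, 1.0, 1.0])
degrading = compress_curve([1.0, 1.0, 0.0, 0.0])
```

A score of this kind can be computed at any intermediate training step, which is what would let a tuner compare partially trained runs instead of waiting for every run to finish.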

machine learning, optimization, reinforcement learning, (10 more...)

Technology: