 Reinforcement Learning


CoinDICE: Off-Policy Confidence Interval Estimation

Neural Information Processing Systems

One of the major barriers to applying reinforcement learning (RL) is the difficulty of evaluating new policies reliably before deployment, a problem generally known as off-policy evaluation (OPE).
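To make the OPE problem concrete, here is a minimal sketch of ordinary trajectory-wise importance sampling, a textbook baseline and not the CoinDICE estimator. It assumes logged trajectories of (state, action, reward) steps and known action probabilities for both policies. The names `importance_sampling_estimate`, `target_prob`, and `behavior_prob` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# One logged step: (state, action, reward). Illustrative types, not from the paper.
Step = Tuple[int, int, float]


def importance_sampling_estimate(
    trajectories: Sequence[Sequence[Step]],
    target_prob: Callable[[int, int], float],
    behavior_prob: Callable[[int, int], float],
    gamma: float = 1.0,
) -> float:
    """Estimate the target policy's value from data logged by the behavior policy.

    Assumes behavior_prob(s, a) > 0 for every logged (s, a) pair.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0    # cumulative likelihood ratio along the trajectory
        ret = 0.0       # discounted return of the trajectory
        discount = 1.0
        for state, action, reward in trajectory:
            weight *= target_prob(state, action) / behavior_prob(state, action)
            ret += discount * reward
            discount *= gamma
        total += weight * ret
    return total / len(trajectories)
```

This estimator returns only a point estimate. It says nothing about uncertainty, which is the gap that confidence interval methods for OPE aim to fill.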


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper focuses on l_1-regularized multi-task feature RL, integrating multi-task feature learning (MTFL) with Fitted Q-learning. Clarity: The paper is mostly well written. Regarding the format of this paper, the font size is not right. A suggestion on the nuclear norm: it is usually denoted ||\cdot||_*, whereas the paper denotes it ||\cdot||_1. There is a mistake in Assumption 5. Judging from the context, I think line 291 is right and line 299 is written incorrectly, and thus Equations (5) and (6) are wrong: U should be U^{-1}.
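For reference, the notational point about the nuclear norm can be spelled out with the standard definitions. The matrix symbol A and its dimensions m by n are assumptions introduced here for illustration; they are not taken from the reviewed paper.

```latex
% Nuclear norm: sum of singular values (the Schatten 1-norm)
\|A\|_{*} = \sum_{i=1}^{\min(m,n)} \sigma_i(A), \qquad A \in \mathbb{R}^{m \times n}

% Readings commonly attached to the subscript 1, which differ from the above:
\|A\|_{1} = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}| \quad \text{(induced 1-norm)}
\qquad
\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| \quad \text{(entrywise } \ell_1 \text{ norm)}
```

Because the subscript 1 already has these established meanings for matrices, writing the nuclear norm with a star subscript avoids ambiguity.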




Bayesian Optimization for Iterative Learning

Vu Nguyen

Neural Information Processing Systems

The performance of deep (reinforcement) learning systems crucially depends on the choice of hyperparameters. Their tuning is notoriously expensive, typically requiring an iterative training process to run for numerous steps to convergence. Traditional tuning algorithms only consider the final performance of hyperparameters acquired after many expensive iterations and ignore intermediate information from earlier training steps. In this paper, we present a Bayesian optimization (BO) approach which exploits the iterative structure of learning algorithms for efficient hyperparameter tuning. We propose to learn an evaluation function compressing learning progress at any stage of the training process into a single numeric score according to both training success and stability. Our BO framework then balances the benefit of assessing a hyperparameter setting over additional training steps against their computation cost. We further increase model efficiency by selectively including scores from different training steps for any evaluated hyperparameter set. We demonstrate the efficiency of our algorithm by tuning hyperparameters for the training of deep reinforcement learning agents and convolutional neural networks. Our algorithm outperforms all existing baselines in identifying optimal hyperparameters in minimal time.
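As a rough illustration of compressing a learning curve into one score that rewards both success and stability, the sketch below uses a fixed hand-written rule: mean performance minus a penalty on its spread. This is an assumption made for illustration only. The abstract describes a learned evaluation function, whose form is not given here, and the names `curve_score` and `stability_weight` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def curve_score(curve: Sequence[float], stability_weight: float = 1.0) -> float:
    """Collapse a partial learning curve into a single number.

    curve: performance values (e.g. rewards or accuracies) recorded so far.
    stability_weight: how strongly fluctuations are penalised (illustrative knob).
    """
    if not curve:
        raise ValueError("curve must contain at least one value")
    # Higher average performance raises the score; larger spread lowers it.
    return mean(curve) - stability_weight * pstdev(curve)
```

A score of this kind can be computed after any number of training steps, so a tuner could compare hyperparameter settings before they have been trained to convergence.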