Learning Invariances using the Marginal Likelihood

Neural Information Processing Systems

Generalising well in supervised learning tasks relies on correctly extrapolating the training data to a large region of the input space. One way to achieve this is to constrain the predictions to be invariant to transformations of the input that are known to be irrelevant.
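The excerpt stops before describing the paper's mechanism, but the core idea it opens with, constraining predictions to be invariant under irrelevant input transformations, can be shown with a toy sketch. This is a generic illustration (averaging a predictor over a sampled transformation group), not the paper's marginal-likelihood approach; `base_predict` and `transforms` are hypothetical names.

```python
# Toy sketch (not the paper's method): make a predictor invariant to a known
# transformation group by averaging its outputs over transformed inputs.
import numpy as np

def invariant_predict(x, base_predict, transforms):
    """Average a base predictor over a set of input transformations.

    The averaged predictor is exactly invariant when `transforms` covers the
    full transformation group, and approximately invariant when the group is
    only sampled.
    """
    return np.mean([base_predict(t(x)) for t in transforms], axis=0)

# Example: invariance to sign flips of a scalar input.
base_predict = lambda x: x ** 2 + x          # not sign-invariant on its own
transforms = [lambda x: x, lambda x: -x]     # the sign-flip group
print(invariant_predict(3.0, base_predict, transforms))   # 9.0
print(invariant_predict(-3.0, base_predict, transforms))  # 9.0 (same output)
```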




Reviews: Is Q-Learning Provably Efficient?

Neural Information Processing Systems

This paper studies the problem of efficient exploration in finite episodic MDPs. The authors present a variant of Q-learning with optimistic initialization and carefully tuned learning rates that recovers a UCB-style algorithm. The main contribution of this work is a polynomial regret bound for perhaps one of the most iconic "model-free" algorithms. There are several things to like about this paper:

- Q-learning is perhaps the classic introductory RL algorithm, so it is nice to see that sample-efficiency guarantees can be recovered for a variant of it. The computation time is also particularly appealing compared to existing model-free algorithms with √T *expected* (Bayesian) regret (such as RLSVI), which have much higher computational and memory requirements.
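For readers unfamiliar with the style of algorithm under review, below is a minimal sketch of tabular Q-learning with optimistic initialization, an H-dependent learning rate of the form (H + 1) / (H + t), and a Hoeffding-style UCB exploration bonus. It assumes a hypothetical episodic `env` interface; the bonus constant and log factor are illustrative choices, not the paper's exact pseudocode.

```python
# Sketch of optimistic Q-learning with a UCB-style bonus in a finite
# episodic MDP with S states, A actions, horizon H, over K episodes.
# `env` is a hypothetical interface: env.reset() -> state,
# env.step(h, s, a) -> (reward, next_state).
import numpy as np

def ucb_q_learning(env, S, A, H, K, c=1.0, p=0.05):
    Q = np.full((H, S, A), float(H))       # optimistic initialization to H
    N = np.zeros((H, S, A), dtype=int)     # per-(step, state, action) counts
    iota = np.log(S * A * H * K / p)       # log factor in the bonus

    for _ in range(K):
        s = env.reset()
        for h in range(H):
            a = int(np.argmax(Q[h, s]))    # act greedily w.r.t. optimistic Q
            r, s_next = env.step(h, s, a)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)      # carefully tuned learning rate
            bonus = c * np.sqrt(H ** 3 * iota / t)  # UCB exploration bonus
            # Value of the next step, clipped at the max possible return H.
            v_next = min(H, Q[h + 1, s_next].max()) if h + 1 < H else 0.0
            Q[h, s, a] += alpha * (r + v_next + bonus - Q[h, s, a])
            s = s_next
    return Q
```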