Pessimism for Offline Linear Contextual Bandits using \ell_p Confidence Sets

Neural Information Processing Systems

We present a family $\{\widehat{\pi}_p\}_{p\ge 1}$ of pessimistic learning rules for offline learning of linear contextual bandits, relying on confidence sets with respect to different $\ell_p$ norms, where $\widehat{\pi}_2$ corresponds to Bellman-consistent pessimism (BCP), while $\widehat{\pi}_\infty$ is a novel generalization of lower confidence bound (LCB) to the linear setting. We show that the novel $\widehat{\pi}_\infty$ learning rule is, in a sense, adaptively optimal, as it achieves the minimax performance (up to log factors) against all $\ell_q$-constrained problems, and as such it strictly dominates all other predictors in the family, including $\widehat{\pi}_2$.
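As a rough illustration of how a pessimistic learning rule of this kind can be instantiated, the sketch below implements an LCB-flavoured selection with an elliptical ($\ell_2$-style) confidence penalty in plain NumPy. The function name, the ridge estimate, and the choices of $\beta$ and $\lambda$ are illustrative assumptions, not the paper's exact $\widehat{\pi}_p$ construction.

```python
import numpy as np

def pessimistic_policy(X, y, actions, beta=1.0, lam=1.0):
    """Sketch of an l2-style pessimistic rule for offline linear bandits.

    X : (n, d) logged feature vectors, y : (n,) observed rewards,
    actions : (k, d) feature vectors of candidate actions.
    Picks the action maximizing a lower confidence bound: the ridge
    estimate of the reward minus an elliptical-norm penalty that is
    large for directions poorly covered by the logged data.
    """
    d = X.shape[1]
    Sigma = X.T @ X + lam * np.eye(d)            # regularized covariance
    theta_hat = np.linalg.solve(Sigma, X.T @ y)  # ridge estimate of theta
    Sigma_inv = np.linalg.inv(Sigma)
    # lower confidence bound per action: x^T theta_hat - beta * ||x||_{Sigma^{-1}}
    widths = np.sqrt(np.einsum("ki,ij,kj->k", actions, Sigma_inv, actions))
    lcb = actions @ theta_hat - beta * widths
    return int(np.argmax(lcb))
```

With a well-covered action direction and an unexplored one, the rule prefers the covered direction even when its point estimate is not the largest, which is exactly the pessimism behaviour the abstract describes.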


Exploiting the Replay Memory Before Exploring the Environment: Enhancing Reinforcement Learning Through Empirical MDP Iteration

Neural Information Processing Systems

Reinforcement learning (RL) algorithms are typically based on optimizing a Markov Decision Process (MDP) using the optimal Bellman equation. Recent studies have revealed that focusing the optimization of Bellman equations solely on in-sample actions tends to result in more stable optimization, especially in the presence of function approximation. Building on these findings, in this paper we propose an Empirical MDP Iteration (EMIT) framework, which iterates over empirical MDPs constructed from the replay memory. For each of these empirical MDPs, it learns an estimated Q-function denoted $\widehat{Q}$. The key strength is that, by restricting the Bellman update to in-sample bootstrapping, each empirical MDP converges to a unique optimal $\widehat{Q}$ function.
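The core idea of in-sample bootstrapping can be sketched in a few lines of tabular code. This is a hedged illustration of the mechanism the abstract describes, not the paper's exact EMIT algorithm: the max in the Bellman target ranges only over actions actually observed at the next state in the replay memory.

```python
from collections import defaultdict

def in_sample_bellman(Q, replay, gamma=0.99, alpha=0.5, iters=100):
    """Tabular sketch of an in-sample Bellman update.

    replay : list of (s, a, r, s_next, done) transitions.
    Q      : dict-like mapping (state, action) -> value.
    The bootstrap max is restricted to actions seen at s_next in the
    replay memory, so the update never queries out-of-sample actions.
    """
    seen = defaultdict(set)                   # state -> actions observed there
    for s, a, r, s2, done in replay:
        seen[s].add(a)
    for _ in range(iters):
        for s, a, r, s2, done in replay:
            if done or not seen[s2]:
                target = r                    # no in-sample action to bootstrap
            else:
                target = r + gamma * max(Q[(s2, b)] for b in seen[s2])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q
```

Because the target only ever references (state, action) pairs present in the data, the resulting empirical Bellman operator has a well-defined fixed point on the replay memory, which is the stability property the abstract highlights.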


Semi-supervised Active Linear Regression

Neural Information Processing Systems

Labeled data often comes at a high cost, as it may require recruiting human labelers or running costly experiments. At the same time, in many practical scenarios, one already has access to a partially labeled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of ``semi-supervised active learning'' through the frame of linear regression. In this paper, we introduce an instance-dependent parameter called the reduced rank, denoted $\text{R}_X$, and propose an efficient algorithm with query complexity $O(\text{R}_X/\epsilon)$. This result directly implies improved upper bounds for two important special cases: (i) active ridge regression, and (ii) active kernel ridge regression, where the reduced rank equates to the ``statistical dimension'' $\textsf{sd}_\lambda$ and ``effective dimension'' $d_\lambda$ of the problem respectively, where $\lambda \ge 0$ denotes the regularization parameter. Finally, we introduce a distributional version of the problem as a special case of the agnostic formulation we consider earlier; here, for every $X$, we prove a matching instance-wise lower bound of $\Omega(\text{R}_X/\epsilon)$ on the query complexity of any algorithm.
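To make the ridge special case concrete, the standard statistical dimension mentioned in the abstract can be computed from the singular values of the design matrix as $\textsf{sd}_\lambda = \sum_i \sigma_i^2/(\sigma_i^2 + \lambda)$, which reduces to the rank of $X$ at $\lambda = 0$. The sketch below computes this quantity; note that the paper's reduced rank $\text{R}_X$ is its own instance-dependent parameter, and this only illustrates the classical $\textsf{sd}_\lambda$ it specializes to.

```python
import numpy as np

def statistical_dimension(X, lam):
    """Compute sd_lambda = sum_i s_i^2 / (s_i^2 + lambda) for the
    singular values s_i of X. At lambda = 0 this is just rank(X)."""
    s = np.linalg.svd(X, compute_uv=False)
    if lam == 0:
        return int(np.sum(s > 1e-12))          # plain rank
    return float(np.sum(s**2 / (s**2 + lam)))
```

For instance, for the identity design in three dimensions, $\textsf{sd}_1 = 3 \cdot \tfrac{1}{2} = 1.5$, strictly below the rank of 3, which is why regularization can shrink the effective query budget.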


Teach Machine to Comprehend Text and Answer Question with Tensorflow - Part I · Han Xiao Tech Blog

#artificialintelligence

Reading comprehension is one of the fundamental skills for humans, one that we learn systematically from elementary school onward. Do you still remember what the worksheets from your reading class looked like? A worksheet usually consists of an article and a few questions about its content. To answer these questions, you first gather information by collecting answer-related sentences from the article. Sometimes you can directly copy those original sentences from the article as the final answer.
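The "collect answer-related sentences" step can be illustrated with a toy extractive baseline: rank each sentence of the article by its word overlap with the question and return the best match. This is a hedged sketch of the general idea, not the TensorFlow model the blog post goes on to build.

```python
def best_answer_sentence(article, question):
    """Return the article sentence with the largest word overlap
    with the question (a toy extractive-QA baseline)."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in article.split(".") if s.strip()]
    # score = number of question words appearing in the sentence
    return max(sentences,
               key=lambda s: len(q_words & set(s.lower().split())))
```

Real comprehension models replace this bag-of-words overlap with learned representations of the question and article, but the retrieval-then-extract structure is the same.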


Gradient Descent Learns Linear Dynamical Systems

#artificialintelligence

A linear dynamical system $(A,B,C,D)$ is equivalent to the system $(TAT^{-1}, TB, CT^{-1}, D)$ for any invertible matrix $T$ in terms of the behavior of the outputs. A little thought therefore shows that, in its unrestricted parameterization, the objective function cannot have a unique optimum. A common way of removing this redundancy is to impose a canonical form.
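The equivalence is easy to verify numerically: simulating $x_{t+1} = Ax_t + Bu_t$, $y_t = Cx_t + Du_t$ from a zero initial state gives identical outputs for $(A,B,C,D)$ and $(TAT^{-1}, TB, CT^{-1}, D)$, since the second system's state is just $Tx_t$. The dimensions and random matrices below are illustrative.

```python
import numpy as np

def simulate(A, B, C, D, inputs, x0):
    """Run y_t = C x_t + D u_t, x_{t+1} = A x_t + B u_t."""
    x, ys = x0, []
    for u in inputs:
        ys.append(C @ x + D @ u)
        x = A @ x + B @ u
    return np.array(ys)

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((3, 3))        # scaled for a stable-ish system
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 2))
T = rng.standard_normal((3, 3)) + 3 * np.eye(3)  # invertible (generic) T
Ti = np.linalg.inv(T)
u = rng.standard_normal((10, 2))

y1 = simulate(A, B, C, D, u, np.zeros(3))
y2 = simulate(T @ A @ Ti, T @ B, C @ Ti, D, u, np.zeros(3))
assert np.allclose(y1, y2)  # same input-output behavior
```

This is precisely why the unrestricted parameterization has a continuum of optima: every invertible $T$ maps one optimum to another with the same loss.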


What my deep model doesn't know... Yarin Gal - Blog Cambridge Machine Learning Group

#artificialintelligence

I come from the Cambridge machine learning group. More than once I have heard people refer to us as "the most Bayesian machine learning group in the world". I mean, we do work with probabilistic models and uncertainty on a daily basis. Maybe that's why it felt so weird playing with those deep learning models (I know, joining the party very late). You see, I spent the last several years working mostly with Gaussian processes, modelling probability distributions over functions. I'm used to uncertainty bounds for decision making, in much the same way that many biologists rely on model uncertainty to analyse their data. Working with point estimates alone felt weird to me. I couldn't tell whether the new model I was playing with was making sensible predictions or just guessing at random.

I'm certain you've come across this problem yourself, either analysing data or solving some task, where you wished you could tell whether your model is certain about its output, asking yourself "maybe I need to use more diverse data? or perhaps change the model?". Most deep learning tools operate in a very different setting to the probabilistic models which possess this invaluable uncertainty information, or so one would believe. I recently spent some time trying to understand why these deep learning models work so well, trying to relate them to new research from the last couple of years. I was quite surprised to see how close these were to my beloved Gaussian processes. I was even more surprised to see that we can get uncertainty information from these deep learning models for free, without changing a thing.

Update (29/09/2015): I spotted a typo in the calculation of $\tau$; this has been fixed below.
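The "uncertainty for free" idea the post builds towards is Monte Carlo dropout: keep dropout switched on at test time, run several stochastic forward passes, and read a predictive mean and an uncertainty estimate off the sample statistics. The sketch below uses a tiny two-layer ReLU network in NumPy as an illustrative stand-in; the architecture, dropout rate, and sample count are assumptions, not the post's exact setup.

```python
import numpy as np

def mc_dropout_predict(x, W1, W2, p=0.5, n_samples=200, rng=None):
    """Monte Carlo dropout sketch: sample n_samples stochastic forward
    passes (dropout kept on) and return the mean prediction together
    with the per-output standard deviation as an uncertainty signal."""
    if rng is None:
        rng = np.random.default_rng(0)
    preds = []
    for _ in range(n_samples):
        mask = rng.random(W1.shape[1]) > p           # drop hidden units
        h = np.maximum(0, x @ W1) * mask / (1 - p)   # inverted dropout scaling
        preds.append(h @ W2)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)
```

The spread of the sampled predictions plays the role the Gaussian-process predictive variance plays for the author: a large standard deviation flags inputs the model is effectively guessing on.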