Technicolor Labs
Matroid Bandits: Practical Large-Scale Combinatorial Bandits
Kveton, Branislav (Technicolor Labs) | Wen, Zheng (Technicolor Labs) | Ashkan, Azin (Technicolor Labs) | Eydgahi, Hoda (Technicolor Labs)
A matroid is a notion of independence that is closely related to computational efficiency in combinatorial optimization. In this work, we bring together the ideas of matroids and multi-armed bandits, and propose a new class of stochastic combinatorial bandits, matroid bandits. A key characteristic of this class is that matroid bandits can be solved efficiently, both computationally and in terms of sample complexity. We propose a practical algorithm for our problem and bound its regret. The regret scales favorably with all quantities of interest. We evaluate our approach on the problem of learning routing networks for Internet service providers. Our results clearly show that the approach is practical.
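The abstract does not spell out the algorithm, so the following is only a rough sketch of the general recipe it alludes to: compute an optimistic (UCB-style) index for each item, greedily build a maximum-weight independent set under those indices using a matroid independence oracle, and update per-item statistics from semi-bandit feedback. The interfaces `is_independent` and `pull` are assumptions for illustration, not the paper's API.

```python
import math

def matroid_ucb(num_items, horizon, is_independent, pull):
    """Sketch of a greedy UCB strategy for a stochastic matroid bandit.

    is_independent(S): assumed oracle; True iff the item set S is
        independent in the matroid.
    pull(S): assumed environment call; plays the set S and returns a
        dict {item: observed weight} (semi-bandit feedback).
    """
    counts = [0] * num_items
    means = [0.0] * num_items

    for t in range(1, horizon + 1):
        # Optimistic index per item; unplayed items get an infinite
        # index so they are explored first.
        ucb = [
            means[e] + math.sqrt(2.0 * math.log(t) / counts[e])
            if counts[e] > 0 else float("inf")
            for e in range(num_items)
        ]
        # Greedily build a basis: scan items in decreasing UCB order
        # and keep those that preserve independence.
        chosen = []
        for e in sorted(range(num_items), key=lambda i: -ucb[i]):
            if is_independent(chosen + [e]):
                chosen.append(e)
        # Play the chosen set and update per-item empirical means.
        for e, w in pull(chosen).items():
            counts[e] += 1
            means[e] += (w - means[e]) / counts[e]
    return means
```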
Structured Kernel-Based Reinforcement Learning
Kveton, Branislav (Technicolor Labs) | Theocharous, Georgios (Adobe)
Kernel-based reinforcement learning (KBRL) is a popular approach to learning non-parametric value function approximations. In this paper, we present structured KBRL, a paradigm for kernel-based RL that allows for modeling independencies in the transition and reward models of problems. Real-world problems often exhibit this structure and can be solved more efficiently when it is modeled. We make three contributions. First, we motivate our work, define a structured backup operator, and prove that it is a contraction. Second, we show how to evaluate our operator efficiently. Our analysis reveals that the fixed point of the operator is the optimal value function in a special factored MDP. Finally, we evaluate our method on a synthetic problem and compare it to two KBRL baselines. In most experiments, we learn better policies than the baselines from an order of magnitude less training data.
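The structured backup operator itself is defined in the paper; as background, the sketch below shows only the plain, unstructured KBRL backup it builds on: the value function lives on the sampled next states, and each backup is a kernel-weighted average of one-step returns. The data layout `transitions[a] = (S, R, S_next)` and the `kernel` interface are assumptions for illustration.

```python
import numpy as np

def kbrl_value_iteration(transitions, kernel, gamma=0.9, iters=200):
    """Minimal (unstructured) KBRL backup sketch.

    transitions[a] = (S, R, S_next): start states, rewards, and next
        states sampled for action a (assumed data layout).
    kernel(X, Y): (len(X), len(Y)) matrix of nonnegative similarities
        (assumed interface).
    """
    actions = list(transitions)

    # The value function is represented on all sampled next states.
    offsets, support = {}, []
    for a in actions:
        offsets[a] = len(support)
        support.extend(transitions[a][2])
    X = np.asarray(support)
    V = np.zeros(len(X))

    # Normalized kernel weights from every support point to the start
    # states of each action's sample transitions.
    W = {}
    for a in actions:
        K = kernel(X, np.asarray(transitions[a][0]))
        W[a] = K / K.sum(axis=1, keepdims=True)

    for _ in range(iters):
        Q = np.empty((len(actions), len(X)))
        for j, a in enumerate(actions):
            _, R, S_next = transitions[a]
            # Values of action a's sampled next states, read off V.
            Vn = V[offsets[a]:offsets[a] + len(S_next)]
            # Kernel-weighted average of one-step returns (the backup).
            Q[j] = W[a] @ (np.asarray(R) + gamma * Vn)
        V = Q.max(axis=0)
    return X, V
```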
Kernel-Based Reinforcement Learning on Representative States
Kveton, Branislav (Technicolor Labs) | Theocharous, Georgios (Yahoo Labs)
Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batch-mode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernel-based RL on a set of k representative states. Similarly to kernel-based RL, our solution is a fixed point of a kernelized Bellman operator and can approximate the optimal solution to an arbitrary level of granularity. Unlike kernel-based RL, our method is fast. In particular, our policies can be computed in O(n) time, where n is the number of training examples. The time complexity of kernel-based RL is Ω(n²). We introduce our method, analyze its convergence, and compare it to existing work. The method is evaluated on two existing control problems with 2 to 4 continuous variables and a new problem with 64 variables. In all cases, we outperform state-of-the-art results and offer simpler solutions.
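One plausible reading of the idea, sketched below under stated assumptions (it is not the paper's algorithm): run kernelized value iteration on k representative states only, projecting sampled next states back onto those k states with normalized kernel weights, so the value function has fixed size k no matter how many transitions are collected. The representative states (e.g., k-means centers), the `transitions[a] = (S, R, S_next)` layout, and the `kernel` interface are all assumptions for illustration.

```python
import numpy as np

def kbrl_on_representative_states(transitions, kernel, rep_states,
                                  gamma=0.9, iters=200):
    """Hedged sketch: KBRL-style value iteration restricted to k
    representative states, so each backup costs O(k * n) instead of
    scaling quadratically in the number of samples."""
    actions = list(transitions)
    Z = np.asarray(rep_states)          # k representative states
    V = np.zeros(len(Z))

    W, P = {}, {}
    for a in actions:
        S, _, S_next = transitions[a]
        # Weights from each representative state to action a's samples.
        K = kernel(Z, np.asarray(S))                  # (k, n_a)
        W[a] = K / K.sum(axis=1, keepdims=True)
        # Projection of sampled next states onto the representatives.
        K2 = kernel(np.asarray(S_next), Z)            # (n_a, k)
        P[a] = K2 / K2.sum(axis=1, keepdims=True)

    for _ in range(iters):
        Q = np.empty((len(actions), len(Z)))
        for j, a in enumerate(actions):
            _, R, _ = transitions[a]
            # One-step returns with next-state values read off Z.
            Q[j] = W[a] @ (np.asarray(R) + gamma * (P[a] @ V))
        V = Q.max(axis=0)
    return V
```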