Looking at the latest Google and Apple conventions, one thing is clear to all: if in past years the main buzzwords in the information technology field were IoT and Big Data, this year's catch-all term is without a doubt Machine Learning. What exactly does this term mean? Are we talking about Artificial Intelligence? Is somebody trying to build a Skynet to ruin the world? Will machines steal my job in the future?

Fiez, Tanner, Jain, Lalit, Jamieson, Kevin G., Ratliff, Lillian

In this paper we introduce the pure exploration transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\arg\max_{z\in \mathcal{Z}} z^\top\theta^{\ast}$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. When $\mathcal{X} = \mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. The transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$.
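To make the problem statement concrete, the following is a minimal Python sketch of the setting only, not of the paper's algorithm. It assumes the special case where $\mathcal{X}$ is the standard basis (so each measurement observes one coordinate of $\theta^{\ast}$ plus noise), Gaussian noise, and a naive uniform allocation of measurements. All names (`noisy_measurement`, `uniform_baseline`, the example vectors) are hypothetical and chosen for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def noisy_measurement(x, theta_star, rng, noise_std=1.0):
    """One noisy observation of x^T theta_star (Gaussian noise is an assumption)."""
    return dot(x, theta_star) + rng.gauss(0.0, noise_std)


def uniform_baseline(X, Z, theta_star, pulls_per_vector, rng):
    """Naive baseline: measure every x in X equally often, then pick the best z.

    Assumes X is the standard basis, so the least-squares estimate of
    theta_star reduces to a per-coordinate sample mean.
    """
    theta_hat = []
    for x in X:
        samples = [noisy_measurement(x, theta_star, rng) for _ in range(pulls_per_vector)]
        theta_hat.append(sum(samples) / pulls_per_vector)
    # Recommend the item in Z that looks best under the estimate.
    best_index = max(range(len(Z)), key=lambda j: dot(Z[j], theta_hat))
    return best_index, theta_hat


# Hypothetical instance: we may only measure basis vectors (X),
# but we want the best of three different items (Z).
rng = random.Random(0)
theta_star = [1.0, 0.0, -1.0]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Z = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.6, 0.0, 0.4]]

best_index, theta_hat = uniform_baseline(X, Z, theta_star, pulls_per_vector=2000, rng=rng)
true_best = max(range(len(Z)), key=lambda j: dot(Z[j], theta_star))
```

The uniform allocation ignores which directions actually matter for separating the items in $\mathcal{Z}$; the point of the pure exploration problem is to reach the same conclusion with far fewer, adaptively chosen measurements.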

Farina, Gabriele, Kroer, Christian, Sandholm, Tuomas

Regret minimization is a powerful tool for solving large-scale problems; it was recently used in breakthrough results for large-scale extensive-form-game solving. This was achieved by composing simplex regret minimizers into an overall regret-minimization framework for extensive-form-game strategy spaces. In this paper we study the general composability of regret minimizers. We derive a calculus for constructing regret minimizers for complex convex sets that are constructed from convexity-preserving operations on simpler convex sets. In particular, we show that local regret minimizers for the simpler sets can be composed with additional regret minimizers into an aggregate regret minimizer for the complex set. As an application of our framework we show that the CFR framework can be constructed easily from our framework. We also show how to construct a CFR variant for extensive-form games with strategy constraints. Unlike a recently proposed variant of CFR for strategy constraints by Davis, Waugh, and Bowling (2018), the algorithm resulting from our calculus does not depend on any unknown constants and thus avoids binary search.
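As a rough illustration of what a simplex regret minimizer and a composition of minimizers can look like, the sketch below uses regret matching, one common choice for the simplex (the abstract does not say which minimizer is used), and composes two of them over a Cartesian product. The product rule is chosen here only because it is the simplest convexity-preserving operation to show: with a linear loss that splits across the factors, the regret on the product is the sum of the factors' regrets. The class names and interface are hypothetical, not the paper's.

```python
class RegretMatching:
    """Regret matching over a probability simplex with n_actions vertices."""

    def __init__(self, n_actions):
        self.n = n_actions
        self.cum_regret = [0.0] * n_actions

    def next_strategy(self):
        # Play proportionally to positive cumulative regret; uniform if none.
        positive = [max(r, 0.0) for r in self.cum_regret]
        total = sum(positive)
        if total <= 0.0:
            return [1.0 / self.n] * self.n
        return [p / total for p in positive]

    def observe_loss(self, strategy, loss):
        # Regret of action a grows by (loss actually incurred) - (loss of a).
        expected = sum(s * l for s, l in zip(strategy, loss))
        for a in range(self.n):
            self.cum_regret[a] += expected - loss[a]


class CartesianProduct:
    """Illustrative composition: run one minimizer per factor, independently."""

    def __init__(self, minimizers):
        self.minimizers = minimizers

    def next_strategy(self):
        return [m.next_strategy() for m in self.minimizers]

    def observe_loss(self, strategies, losses):
        # The linear loss on the product decomposes into one loss per factor.
        for m, s, l in zip(self.minimizers, strategies, losses):
            m.observe_loss(s, l)


# Hypothetical usage: a 3-action simplex times a 2-action simplex,
# facing the same loss vectors every round.
product = CartesianProduct([RegretMatching(3), RegretMatching(2)])
losses = ([1.0, 0.0, 0.5], [0.0, 1.0])
total_loss = 0.0
for _ in range(200):
    strategies = product.next_strategy()
    total_loss += sum(
        sum(s * l for s, l in zip(strat, loss))
        for strat, loss in zip(strategies, losses)
    )
    product.observe_loss(strategies, losses)

final_strategies = product.next_strategy()
best_fixed_loss = 200 * (min(losses[0]) + min(losses[1]))
regret = total_loss - best_fixed_loss
```

In this toy run both factors lock onto their zero-loss action after the first round, so the total regret stays bounded while the number of rounds grows.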

Koren, Tomer, Livni, Roi, Mansour, Yishay

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde{O}(\max\{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}\})$, and show that this is the best possible. Our regret bound generalizes previously known regret bounds for some special cases: (i) the unit-switching cost regret $\widetilde{\Theta}(\max\{k^{1/3}T^{2/3},\sqrt{kT}\})$ where $\mathcal{C} = \Theta(k)$, and (ii) the interval metric with regret $\widetilde{\Theta}(\max\{T^{2/3},\sqrt{kT}\})$ where $\mathcal{C} = \Theta(1)$.
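The two-component loss described above can be sketched in a few lines of Python. This is only an illustration of how the learner's total loss is tallied, not of the algorithm or the complexity measure; the function names, the toy loss table, and the particular normalization of the interval metric (actions placed evenly on $[0,1]$) are assumptions made for the example.

```python
def unit_metric(i, j):
    """Unit switching cost: every change of action costs 1."""
    return 0.0 if i == j else 1.0


def interval_metric(i, j, k):
    """Actions 0..k-1 placed evenly on [0, 1]; cost is the distance moved.

    The even spacing is an assumption made for this illustration.
    """
    return abs(i - j) / (k - 1)


def total_loss(actions, losses, metric):
    """Sum of per-round action losses plus the metric cost of each switch."""
    action_loss = sum(losses[t][a] for t, a in enumerate(actions))
    switch_loss = sum(metric(prev, cur) for prev, cur in zip(actions, actions[1:]))
    return action_loss + switch_loss


# Hypothetical loss table: 5 rounds, k = 3 actions.
k = 3
losses = [
    [0.2, 0.5, 0.9],
    [0.1, 0.4, 0.8],
    [0.7, 0.3, 0.0],
    [0.6, 0.2, 0.1],
    [0.5, 0.0, 0.4],
]
actions = [0, 0, 2, 2, 1]  # the learner switches twice: 0 -> 2, then 2 -> 1

loss_unit = total_loss(actions, losses, unit_metric)
loss_interval = total_loss(actions, losses, lambda i, j: interval_metric(i, j, k))
```

The same action sequence is cheaper under the interval metric than under unit costs because a move to a nearby action costs less than 1, which matches the intuition that a metric with small covering numbers makes switching less punishing.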