Valko, Michal
Stochastic bandits with arm-dependent delays
Manegueu, Anne Gael, Vernade, Claire, Carpentier, Alexandra, Valko, Michal
Significant work has recently been dedicated to the stochastic delayed bandit setting because of its relevance in applications. The applicability of existing algorithms is, however, restricted by the fact that strong assumptions are often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken these assumptions significantly and only assume that there is a bound on the tail of the delay. In particular, we cover the important case where the delay distributions vary across arms, and the case where the delays are heavy-tailed. To address these difficulties, we propose a simple but efficient UCB-based algorithm called PatientBandits. We provide both problem-dependent and problem-independent bounds on the regret, as well as performance lower bounds.
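The abstract does not spell out the index, so the following is only a minimal sketch of the underlying idea: a UCB-style rule that builds its optimistic index from the rewards whose delays have already elapsed. The simulation scaffolding (delay_sampler, Bernoulli rewards) is a hypothetical illustration, not the PatientBandits algorithm itself.

```python
import math
import random

def delayed_ucb(n_arms, horizon, means, delay_sampler):
    """Illustrative UCB that ignores rewards still 'in flight'.

    means[i]        : true Bernoulli mean of arm i (simulation only)
    delay_sampler(i): draws a random delay for arm i (arm-dependent)
    """
    pending = []                              # (arrival_time, arm, reward)
    observed = [[] for _ in range(n_arms)]    # delivered rewards per arm
    for t in range(1, horizon + 1):
        # Deliver rewards whose delay has elapsed by time t.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, arm, r in arrived:
            observed[arm].append(r)
        # Optimistic index computed on *observed* samples only.
        def index(i):
            if not observed[i]:
                return float("inf")
            n = len(observed[i])
            return sum(observed[i]) / n + math.sqrt(2 * math.log(t) / n)
        arm = max(range(n_arms), key=index)
        reward = float(random.random() < means[arm])
        pending.append((t + delay_sampler(arm), arm, reward))
    return [sum(o) / len(o) if o else None for o in observed]
```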
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
Perrault, Pierre, Boursier, Etienne, Perchet, Vianney, Valko, Michal
We investigate stochastic combinatorial multi-armed bandits with semi-bandit feedback (CMAB). In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor polylogarithmic in the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family. We propose to answer the above question for these two families by analyzing variants of the Combinatorial Thompson Sampling policy (CTS). For mutually independent outcomes in $[0,1]$, we propose a tight analysis of CTS using Beta priors. We then look at the more general setting of multivariate sub-Gaussian outcomes and propose a tight analysis of CTS using Gaussian priors. This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB), which, although optimal, is not computationally efficient.
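CTS with Beta priors, as described above, is straightforward to sketch: draw one Beta sample per base arm, let an offline oracle pick the best feasible action under the sampled parameters, and update the posteriors of the arms that were played. The top-$m$ oracle below is a hypothetical stand-in for the general combinatorial oracle.

```python
import random

def cts_beta(n_arms, m, horizon, means):
    """Combinatorial Thompson Sampling with Beta priors (sketch).

    Actions are size-m subsets of arms (a simple combinatorial
    structure standing in for a general oracle); outcomes are
    Bernoulli in [0, 1], as in the paper's first setting.
    """
    alpha = [1] * n_arms  # Beta(1, 1) priors on each arm's mean
    beta = [1] * n_arms
    for _ in range(horizon):
        theta = [random.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # Offline oracle: best feasible action under sampled parameters.
        action = sorted(range(n_arms), key=lambda i: theta[i])[-m:]
        for i in action:  # semi-bandit feedback: observe each played arm
            x = random.random() < means[i]
            alpha[i] += x
            beta[i] += 1 - x
    return alpha, beta
```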
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Jonsson, Anders, Kaufmann, Emilie, Ménard, Pierre, Domingues, Omar Darwiche, Leurent, Edouard, Valko, Michal
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, which remain mostly theoretical.
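MDP-GapE propagates confidence bounds along sampled trajectories; the sketch below only conveys the flavor of a gap-dependent, fixed-confidence stopping rule in a one-step analogue: sample returns per root action until one action's lower confidence bound dominates every other action's upper bound. The sample count then scales with the inverse squared gaps.

```python
import math

def gap_stopping(sample_return, n_actions, delta, max_samples=100000):
    """Flavor of a gap-dependent fixed-confidence rule (not MDP-GapE
    itself): sample Monte-Carlo returns per root action until one
    action's lower confidence bound beats every other upper bound.

    sample_return(a) : one stochastic return for root action a, in [0, 1]
    """
    stats = [[0.0, 0] for _ in range(n_actions)]  # [sum, count] per action
    for t in range(1, max_samples + 1):
        a = (t - 1) % n_actions  # round-robin allocation for simplicity
        s = sample_return(a)
        stats[a][0] += s
        stats[a][1] += 1
        if t < 2 * n_actions:
            continue  # wait until every action has a couple of samples
        def bounds(i):
            total, n = stats[i]
            rad = math.sqrt(math.log(2 * n_actions * t * t / delta) / (2 * n))
            return total / n - rad, total / n + rad
        lcb_ucb = [bounds(i) for i in range(n_actions)]
        best = max(range(n_actions), key=lambda i: lcb_ucb[i][0])
        if all(lcb_ucb[best][0] >= lcb_ucb[i][1]
               for i in range(n_actions) if i != best):
            return best, t  # near-optimal action found, with sample count
    return None, max_samples
```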
No-Regret Exploration in Goal-Oriented Reinforcement Learning
Tarbouriech, Jean, Garcelon, Evrard, Valko, Michal, Pirotta, Matteo, Lazaric, Alessandro
Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the so-called episodic setting or stochastic shortest path (SSP) problem, where an agent has to achieve a predefined goal state (e.g., the top of the hill) while maximizing the cumulative reward or minimizing the cumulative cost. Despite its popularity, most of the literature studying the exploration-exploitation dilemma either focused on different problems (i.e., fixed-horizon and infinite-horizon) or made the restrictive loop-free assumption (which implies that no state can be visited twice during any episode). In this paper, we study the general SSP setting and introduce the algorithm UC-SSP, whose regret scales as $\widetilde{O}(c_{\max}^{3/2} c_{\min}^{-1/2} D S \sqrt{A D K})$ after $K$ episodes for any unknown SSP with $S$ non-terminal states, $A$ actions, an SSP-diameter of $D$, and positive costs in $[c_{\min}, c_{\max}]$. UC-SSP is thus the first learning algorithm with vanishing regret in the theoretically challenging setting of episodic RL.
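To make the setting concrete, here is the interaction protocol the regret bound above refers to: episodes terminate only when the goal is reached (so their lengths are random and states may repeat), and regret compares the cumulative cost to $K$ times the optimal expected cost-to-go from the start state. The env and learner interfaces below are hypothetical placeholders.

```python
def ssp_regret(env, learner, k_episodes, optimal_value):
    """Cumulative SSP regret over K goal-reaching episodes (sketch).

    env.reset() -> state; env.step(a) -> (state, cost, reached_goal)
    learner.act(s) -> a; learner.update(s, a, c, s2)
    optimal_value : V*(s0), optimal expected cost-to-go from the start
    """
    total_cost = 0.0
    for _ in range(k_episodes):
        s = env.reset()
        done = False
        while not done:  # episode length is random: loops are allowed
            a = learner.act(s)
            s2, c, done = env.step(a)
            learner.update(s, a, c, s2)
            total_cost += c
            s = s2
    return total_cost - k_episodes * optimal_value
```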
Fixed-Confidence Guarantees for Bayesian Best-Arm Identification
Shang, Xuedong, de Heide, Rianne, Kaufmann, Emilie, Ménard, Pierre, Valko, Michal
We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification. We further propose a variant of TTTS called Top-Two Transportation Cost (T3C), which disposes of the computational burden of TTTS. As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards, solving one of the open questions raised by Russo (2016). We also provide new posterior convergence results for TTTS under two models that are commonly used in practice: bandits with Gaussian and Bernoulli rewards and conjugate priors.
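The TTTS sampling rule of Russo (2016) is simple to state; a minimal sketch under Gaussian posteriors with unit observation variance follows. The resampling loop that finds the challenger is precisely the computational burden that T3C replaces with a cheaper transportation-cost criterion.

```python
import math
import random

def ttts_step(sums, counts, beta=0.5):
    """One TTTS arm choice under Gaussian posteriors (unit variance).

    sums[i], counts[i] : sufficient statistics of arm i (counts[i] >= 1)
    Posterior of mean i is N(sums[i] / counts[i], 1 / counts[i]).
    """
    def sample_best():
        theta = [random.gauss(s / n, 1 / math.sqrt(n))
                 for s, n in zip(sums, counts)]
        return max(range(len(sums)), key=lambda i: theta[i])
    leader = sample_best()
    if random.random() < beta:
        return leader
    challenger = sample_best()
    while challenger == leader:     # resample until a different arm wins;
        challenger = sample_best()  # T3C avoids this potentially slow loop
    return challenger
```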
Derivative-Free & Order-Robust Optimisation
Gabillon, Victor, Tutunov, Rasul, Valko, Michal, Ammar, Haitham Bou
In this paper, we formalise order-robust optimisation as an instance of online learning minimising simple regret, and propose VROOM, a zeroth-order optimisation algorithm capable of achieving vanishing regret in non-stationary environments, while recovering favourable rates under stochastic reward-generating processes. Our results are the first to target simple regret definitions in adversarial scenarios, unveiling a challenge that has rarely been considered in prior work.
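VROOM itself relies on a hierarchical decomposition of the search space (see the paper); the sketch below only illustrates the simple-regret objective it targets, using plain random search as a baseline: only the quality of the final recommendation matters, not the reward accumulated while querying.

```python
import random

def random_search(f, dim, budget):
    """Baseline zeroth-order optimiser tracking *simple* regret:
    only the final recommendation counts, not intermediate queries.
    """
    best_x, best_val = None, float("-inf")
    for _ in range(budget):
        x = [random.uniform(-1, 1) for _ in range(dim)]
        val = f(x)  # possibly noisy, derivative-free evaluation
        if val > best_val:
            best_x, best_val = x, val
    return best_x  # simple regret = max f - f(best_x)
```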
Multiagent Evaluation under Incomplete Information
Rowland, Mark, Omidshafiei, Shayegan, Tuyls, Karl, Perolat, Julien, Valko, Michal, Piliouras, Georgios, Munos, Remi
This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other techniques are restricted to zero-sum, two-player settings or are limited by the fact that the Nash equilibrium is intractable to compute. Recently, a ranking method called α-Rank, relying on a new graph-based game-theoretic solution concept, was shown to tractably apply to general games. However, evaluations based on Elo or α-Rank typically assume noise-free game outcomes, despite the data often being collected from noisy simulations, making this assumption unrealistic in practice. This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We derive sample complexity guarantees required to confidently rank agents in this setting. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker.
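The estimation step can be made concrete with a non-adaptive baseline: simulate each match-up often enough that every payoff entry is accurate to $\epsilon$ with probability $1-\delta$ (Hoeffding plus a union bound), then run the ranking method of choice on the estimated matrix. The adaptive algorithms in the paper improve on this uniform allocation.

```python
import math

def estimate_payoffs(play_match, n_agents, delta, eps):
    """Estimate a pairwise win-rate matrix from noisy simulations until
    every entry is known to accuracy eps with probability 1 - delta
    (Hoeffding + union bound). Ranking methods such as alpha-Rank are
    run on the estimated matrix; this sketch covers estimation only.

    play_match(i, j) -> 1 if agent i beats agent j, else 0
    """
    n = math.ceil(math.log(2 * n_agents * n_agents / delta) / (2 * eps ** 2))
    wins = [[0.0] * n_agents for _ in range(n_agents)]
    for i in range(n_agents):
        for j in range(n_agents):
            if i == j:
                continue
            wins[i][j] = sum(play_match(i, j) for _ in range(n)) / n
    return wins  # each entry is within eps of its true mean, w.h.p.
```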
Exact sampling of determinantal point processes with sublinear time preprocessing
Dereziński, Michał, Calandriello, Daniele, Valko, Michal
We study the complexity of sampling from a distribution over all index subsets of the set $\{1,...,n\}$ with the probability of a subset $S$ proportional to the determinant of the submatrix $\mathbf{L}_S$ of some $n\times n$ p.s.d. matrix $\mathbf{L}$, where $\mathbf{L}_S$ corresponds to the entries of $\mathbf{L}$ indexed by $S$. Known as a determinantal point process, this distribution is used in machine learning to induce diversity in subset selection. In practice, we often wish to sample multiple subsets $S$ with small expected size $k = E[|S|] \ll n$ from a very large matrix $\mathbf{L}$, so it is important to minimize the preprocessing cost of the procedure (performed once) as well as the sampling cost (performed repeatedly). For this purpose, we propose a new algorithm which, given access to $\mathbf{L}$, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is $n \cdot \text{poly}(k)$, i.e., sublinear in the size of $\mathbf{L}$, and (2) its sampling cost is $\text{poly}(k)$, i.e., independent of the size of $\mathbf{L}$. Prior to our results, state-of-the-art exact samplers required $O(n^3)$ preprocessing time and sampling time linear in $n$ or dependent on the spectral properties of $\mathbf{L}$. We also give a reduction which allows using our algorithm for exact sampling from cardinality constrained determinantal point processes with $n\cdot\text{poly}(k)$ time preprocessing.
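For very small $n$, the defining distribution can be sampled by brute force, which makes the object concrete: the normalizer of the unnormalized weights $\det(\mathbf{L}_S)$ is $\det(\mathbf{L}+\mathbf{I})$. The point of the paper is to achieve exactness with $n \cdot \text{poly}(k)$ preprocessing rather than anything resembling the enumeration below.

```python
import itertools
import random

import numpy as np

def dpp_sample_bruteforce(L):
    """Exact DPP sample by enumerating all 2^n subsets (tiny n only).

    P(S) = det(L_S) / det(L + I), with det(L_emptyset) = 1.
    """
    n = L.shape[0]
    subsets, weights = [], []
    for r in range(n + 1):
        for s in itertools.combinations(range(n), r):
            sub = L[np.ix_(s, s)]
            subsets.append(s)
            # Clip tiny negative determinants from floating-point error.
            weights.append(max(np.linalg.det(sub), 0.0) if s else 1.0)
    z = np.linalg.det(L + np.eye(n))  # equals the sum of all det(L_S)
    return random.choices(subsets, weights=[w / z for w in weights])[0]
```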
Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
Calandriello, Daniele, Carratino, Luigi, Lazaric, Alessandro, Valko, Michal, Rosasco, Lorenzo
Gaussian processes (GP) are a popular Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to complex high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3A)$ quickly becomes prohibitive. In this paper, we introduce BKB (budgeted kernelized bandit), a novel approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and no assumption on the input space or covariance of the GP. Combining a kernelized linear bandit algorithm (GP-UCB) with a randomized matrix sketching technique (i.e., leverage score sampling), we prove that selecting inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to a $O(TAd_{eff}^2)$ runtime and $O(A d_{eff})$ space complexity.
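For reference, here is the exact (unsketched) GP-UCB selection step that BKB approximates; the cubic cost of the matrix inverse is what the leverage-score sketching removes. The kernel and the exploration parameter $\beta$ are left as assumptions.

```python
import numpy as np

def gp_ucb_step(X, y, candidates, kernel, beta=2.0, noise=0.1):
    """One exact GP-UCB selection (the O(t^3) step that BKB sketches).

    X, y       : the t observed points and their values
    candidates : the A alternatives to choose from
    kernel     : covariance function k(x, x')
    """
    K = np.array([[kernel(a, b) for b in X] for a in X])
    K_inv = np.linalg.inv(K + noise * np.eye(len(X)))  # cubic in t
    best, best_ucb = None, -np.inf
    for x in candidates:
        k_x = np.array([kernel(x, a) for a in X])
        mean = k_x @ K_inv @ y                      # posterior mean
        var = kernel(x, x) - k_x @ K_inv @ k_x      # posterior variance
        ucb = mean + beta * np.sqrt(max(var, 0.0))  # optimism in the
        if ucb > best_ucb:                          # face of uncertainty
            best, best_ucb = x, ucb
    return best
```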
Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits
Perrault, Pierre, Perchet, Vianney, Valko, Michal
We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as \emph{independence}. However, while being minimax optimal in terms of regret, these algorithms are intractable. In our paper, we first reduce their implementation to a specific \emph{submodular maximization}. Then, in the case of \emph{matroid} constraints, we design adapted approximation routines, thereby providing the first efficient algorithms that exploit the reward structure. In particular, we improve the state-of-the-art efficient gap-free regret bound by a factor $\sqrt{k}$, where $k$ is the maximum action size. Finally, we show how our improvement translates to more general \emph{budgeted combinatorial semi-bandits}.
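The reduction's target, monotone submodular maximization, admits the classical greedy routine; the sketch below specializes to a cardinality constraint (a uniform matroid), where greedy attains a $(1-1/e)$ approximation, whereas the paper designs routines adapted to general matroid constraints and its specific objective.

```python
def greedy_submodular(f, ground_set, k):
    """Classical greedy for a monotone submodular f under a cardinality
    constraint |S| <= k (a uniform matroid); achieves a (1 - 1/e)
    approximation. f maps a frozenset of items to a real value.
    """
    s = frozenset()
    for _ in range(k):
        candidates = ground_set - s
        if not candidates:
            break
        gains = {e: f(s | {e}) - f(s) for e in candidates}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # no marginal gain left
        s = s | {best}
    return s
```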