Mortal Multi-Armed Bandits

Neural Information Processing Systems

We formulate and study a new variant of the $k$-armed bandit problem, motivated by e-commerce applications. In our model, arms have (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard $k$-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with near-certainty. The main motivation for our setting is online-advertising, where ads have limited lifetime due to, for example, the nature of their content and their campaign budget. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the ads' lifetime. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-world ad serving application, show that the proposed algorithms significantly outperform the standard multi-armed bandit approaches applied to these settings.

An Abstraction Technique for the Verification of Multi-Agent Systems Against ATL Specifications

AAAI Conferences

We introduce an abstraction methodology for the verification ofmulti-agent systems against specifications expressed in alternating-timetemporal logic (ATL). Inspired by methodologies such as predicateabstraction, we define a three-valued semantics for the interpretationof ATL formulas on concurrent game structures and compare it to thestandard two-valued semantics. We define abstract models and establishpreservation results on the three-valued semantics between abstractmodels and their concrete counterparts. We illustrate the methodologyon the large state spaces resulting from a card game.

Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits Machine Learning

Recent work on follow the perturbed leader (FTPL) algorithms for the adversarial multi-armed bandit problem has highlighted the role of the hazard rate of the distribution generating the perturbations. Assuming that the hazard rate is bounded, it is possible to provide regret analyses for a variety of FTPL algorithms for the multi-armed bandit problem. This paper pushes the inquiry into regret bounds for FTPL algorithms beyond the bounded hazard rate condition. There are good reasons to do so: natural distributions such as the uniform and Gaussian violate the condition. We give regret bounds for both bounded support and unbounded support distributions without assuming the hazard rate condition. We also disprove a conjecture that the Gaussian distribution cannot lead to a low-regret algorithm. In fact, it turns out that it leads to near optimal regret, up to logarithmic factors. A key ingredient in our approach is the introduction of a new notion called the generalized hazard rate.

How to Increase Email Course Open Rate with Machine Learning


I think we can all agree on the fact that split testing is an effective method to find out what works best and get more out of your existing traffic. It's extremely useful since it can be applied to a number of different things: subject lines and content of emails, landing pages, home pages, creatives for ads and the list goes on. Also, you can find articles, case studies on split testing for almost anything, except drip campaigns. It's because experimenting with automated emails takes a lot of time and preparation. It is nearly impossible and too technical with most marketing automation tools.

Bounded regret in stochastic multi-armed bandits Machine Learning

We study the stochastic multi-armed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as a well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $\Delta$, and bounded regret of order $1/\Delta$ is not possible if one only knows $\mu^{(\star)}$