Further Optimal Regret Bounds for Thompson Sampling

arXiv.org Machine Learning

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance than state-of-the-art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that simultaneously proves both the optimal problem-dependent bound of $(1+\epsilon)\sum_i \frac{\ln T}{\Delta_i}+O(\frac{N}{\epsilon^2})$ and the first near-optimal problem-independent bound of $O(\sqrt{NT\ln T})$ on the expected regret of this algorithm. Our near-optimal problem-independent bound solves a COLT 2012 open problem of Chapelle and Li. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [ALT 2012]. Our novel martingale-based analysis techniques are conceptually simple, easily extend to distributions other than the Beta distribution, and also extend to the more general contextual bandits setting [Manuscript, Agrawal and Goyal, 2012].
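The Beta-Bernoulli case the abstract alludes to can be sketched in a few lines. The following is a minimal illustration (not the paper's analysis, just the standard algorithm it studies): each arm keeps a Beta posterior over its success probability, and each round the arm with the largest posterior sample is played. The `true_means` argument is only a stand-in simulator for the unknown environment.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit.

    true_means: unknown success probabilities per arm (used only to
    simulate rewards); each arm starts from a uniform Beta(1, 1) prior.
    Returns the total reward plus per-arm success/failure counts.
    """
    rng = random.Random(seed)
    n = len(true_means)
    successes = [0] * n  # alpha - 1 for each arm's Beta posterior
    failures = [0] * n   # beta - 1 for each arm's Beta posterior
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean estimate from each posterior; play the argmax.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total_reward += reward
    return total_reward, successes, failures
```

Because the posterior of a clearly suboptimal arm concentrates below the best arm's, its sample rarely wins, which is the mechanism behind the $\sum_i \ln T / \Delta_i$ problem-dependent regret scaling.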


Novel Exploration Techniques (NETs) for Malaria Policy Interventions

AAAI Conferences

The task of decision-making under uncertainty is daunting, especially for problems of significant complexity. Healthcare policy makers across the globe are facing problems under challenging constraints, with limited tools to help them make data-driven decisions. In this work we frame the process of finding an optimal malaria policy as a stochastic multi-armed bandit problem, and implement three agent-based strategies to explore the policy space. We apply Gaussian Process regression to the findings of each agent, both for comparison and to account for stochastic results from simulating the spread of malaria in a fixed population. The generated policy spaces are compared with published results to give a direct reference with human expert decisions for the same simulated population. Our novel approach provides a powerful resource for policy makers, and a platform which can be readily extended to capture future, more nuanced policy spaces.
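One of the agent strategies described above can be sketched generically. The code below is a hedged illustration, not the paper's implementation: an epsilon-greedy agent treats each candidate policy as an arm, the `simulate` callback stands in for one stochastic run of a malaria transmission simulator, and the agent's running mean estimates are what a Gaussian Process would then smooth for comparison.

```python
import random

def explore_policy_space(policies, simulate, rounds, eps=0.1, seed=0):
    """Epsilon-greedy agent over a discrete policy space (a sketch).

    policies: list of candidate interventions (the bandit arms).
    simulate: callable returning a stochastic reward for a policy,
              standing in for one run of a disease simulator.
    Returns per-policy mean-reward estimates and pull counts.
    """
    rng = random.Random(seed)
    counts = [0] * len(policies)
    means = [0.0] * len(policies)
    for _ in range(rounds):
        if rng.random() < eps or all(c == 0 for c in counts):
            arm = rng.randrange(len(policies))   # explore
        else:
            arm = max(range(len(policies)), key=lambda i: means[i])  # exploit
        r = simulate(policies[arm], rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return means, counts
```

Averaging repeated noisy simulator runs per policy is what makes the stochastic-bandit framing appropriate here; the regression step then interpolates between the sampled policies.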


Scientists Use AI To Turn Brain Signals Into Speech

#artificialintelligence

A recent research study could give a voice to those who no longer have one. Scientists used electrodes and artificial intelligence to create a device that can translate brain signals into speech. This technology could help restore the ability to speak in people with brain injuries or those with neurological disorders such as epilepsy, Alzheimer's disease, multiple sclerosis, Parkinson's disease and more. The new system being developed in the laboratory of Edward Chang, MD, shows that it is possible to create a synthesized version of a person's voice that can be controlled by the activity of their brain's speech centers. In the future, this approach could not only restore fluent communication to individuals with a severe speech disability, the authors say, but could also reproduce some of the musicality of the human voice that conveys the speaker's emotions and personality.


Researchers use AI, big data and machine learning to find best place in the world to live

#artificialintelligence

Researchers at analytics firm SAS claim to have created an artificial intelligence (AI) program that can rank the best places to live in the world using a range of publicly available data sources.


Context Attentive Bandits: Contextual Bandit with Restricted Context

arXiv.org Machine Learning

We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. This novel formulation is motivated by different online problems arising in clinical trials, recommender systems and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of our restricted context setting, and propose two novel algorithms, called the Thompson Sampling with Restricted Context (TSRC) and the Windows Thompson Sampling with Restricted Context (WTSRC), for handling stationary and nonstationary environments, respectively. Our empirical results demonstrate advantages of the proposed approaches on several real-life datasets.
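The two-level idea behind the restricted-context setting can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact TSRC algorithm: one Thompson Sampling layer (`feat_post`, Beta posteriors over feature relevance) decides which k features to reveal, and a second Beta-Bernoulli layer over arms (`arm_post`) picks the action using only the observed context entries. All variable names here are hypothetical.

```python
import random

def restricted_context_step(feat_post, arm_post, context, k, rng):
    """One round of a simplified restricted-context TS policy (a sketch).

    feat_post: list of (alpha, beta) Beta parameters, one per feature,
               encoding how useful each feature has looked so far.
    arm_post:  list of (alpha, beta) Beta parameters, one per arm.
    Only k of the context features are revealed to the learner.
    Returns (indices of observed features, chosen arm).
    """
    # Level 1: sample feature relevance, reveal only the top-k features.
    scores = [rng.betavariate(a, b) for a, b in feat_post]
    observed = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    # Level 2: score each arm with a posterior sample weighted by the
    # observed part of the context; play the highest-scoring arm.
    best_arm, best_val = 0, float("-inf")
    for arm, (a, b) in enumerate(arm_post):
        theta = rng.betavariate(a, b)
        val = theta * sum(context[i] for i in observed)
        if val > best_val:
            best_arm, best_val = arm, val
    return observed, best_arm
```

After the reward arrives, both posterior layers would be updated; a windowed variant that discards old observations (in the spirit of WTSRC) is what handles nonstationary environments.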