 grub



Maximizing and Satisficing in Multi-armed Bandits with Graph Information

Neural Information Processing Systems

Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options, and even obtaining one observation per option may be too costly, rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to similarity relationships amongst the options that can be leveraged. In this paper, we consider the pure exploration problem in stochastic multi-armed bandits where the similarities between the arms are captured by a graph and the rewards may be represented as a smooth signal on this graph. In particular, we consider the problem of finding the arm with the maximum reward (i.e., the maximizing problem) or one that has sufficiently high reward (i.e., the satisficing problem) under this model. We propose novel algorithms GRUB (GRaph based UcB) and ζ-GRUB for these problems and provide a theoretical characterization of their performance that specifically elicits the benefit of the graph side information. We also prove a lower bound on the data requirement that shows a large class of problems where these algorithms are near-optimal. We complement our theory with experimental results that show the benefit of capitalizing on such side information.
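To make "a smooth signal on this graph" concrete, the sketch below measures smoothness with the graph Laplacian quadratic form and uses it as a penalty when estimating arm means. The abstract does not spell out the estimator, so the Laplacian-regularized least-squares form, the function names, and the parameter `rho` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def graph_laplacian(n_arms, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    L = np.zeros((n_arms, n_arms))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


def smoothness(mu, L):
    """Quadratic form mu^T L mu, i.e. the sum over edges of (mu_i - mu_j)^2."""
    return float(mu @ L @ mu)


def regularized_means(pull_counts, reward_sums, L, rho=1.0):
    """Assumed estimator: minimize the squared error on observed rewards
    plus rho * m^T L m. Setting the gradient to zero gives the linear
    system (diag(pull_counts) + rho * L) m = reward_sums, which has a
    unique solution when the graph is connected and some arm was pulled.
    """
    V = np.diag(pull_counts) + rho * L
    return np.linalg.solve(V, reward_sums)
```

On a path graph where only the two end arms have been pulled, the solve still returns an estimate for every interior arm, interpolated along the edges. This is the sense in which similarity information can stand in for observations of arms that were never sampled.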



Appendix

Neural Information Processing Systems

The appendix is organized as follows. [...] While this is where one uses domain expertise, this could be hard to estimate in certain real-world problems. Our methods can be used for various applications such as drug discovery, advertising, and recommendation systems. [...] Using a variant of Azuma's inequality [47, 51], for any κ > 0 the following inequality holds: [...] I(·, G) indicates the minimum influence factor for arms. Using Lemma I.5, we have the following bound on [...] A modified version of Definition 4.3 of the competitive set and non-competitive set is as follows: Definition E.1. [...]


Pure Exploration in Multi-armed Bandits with Graph Side Information

arXiv.org Machine Learning

The multi-armed bandit has emerged as an important paradigm for modeling sequential decision making and learning under uncertainty, with multiple practical applications such as designing policies for sequential experiments [30], combinatorial online learning tasks [6], collaborative learning on social media networks [21, 2], latency reduction in cloud systems [18], and many others [5, 41, 36]. In the traditional multi-armed bandit problem, the goal of the agent is to sequentially choose among a set of actions (or arms) to maximize a desired performance criterion (or reward). This objective demands a delicate tradeoff between exploration (of new arms) and exploitation (of promising arms). An important variation of the reward maximization problem is the identification of arms with the highest (or near-highest) expected reward. This best arm identification [28, 8] problem, which is one of pure exploration, has a wide range of important applications, such as identifying molecules and drugs to treat infectious diseases like COVID-19, finding relevant users to run targeted ad campaigns, hyperparameter optimization in neural networks, and recommendation systems. The broad range of applications of this paradigm is unsurprising given its ability to model essentially any optimization problem of black-box functions on discrete (or discretizable) domains with noisy observations. While bandit pure exploration problems harbor considerable promise, there is a significant catch. In modern applications, one is often faced with a tremendously large number of options (sometimes in the millions) that need to be considered rapidly before making a decision. Pulling each bandit arm even once could be intractable.
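For reference, best arm identification without any side information can be sketched with classical successive elimination. This is a textbook baseline, not the method of the paper above; the function name, the `pull` callback, and the Hoeffding-style confidence radius for rewards in [0, 1] are assumptions made for illustration.

```python
import math


def successive_elimination(pull, n_arms, delta=0.05, max_rounds=10_000):
    """Return (arm, rounds_used). `pull(i)` is a hypothetical callback that
    returns one reward in [0, 1] from arm i."""
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    for t in range(1, max_rounds + 1):
        # Every surviving arm is pulled once per round, so each has t samples.
        for i in active:
            sums[i] += pull(i)
        # Hoeffding-style radius with a union bound over arms and rounds.
        radius = math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * t))
        means = {i: sums[i] / t for i in active}
        best = max(means.values())
        # Keep an arm only while its upper bound reaches the best lower bound.
        active = [i for i in active if means[i] + radius >= best - radius]
        if len(active) == 1:
            return active[0], t
    # Budget exhausted: fall back to the best empirical mean among survivors.
    return max(active, key=lambda i: sums[i]), max_rounds
```

The first round alone costs one pull per arm, which is exactly the cost the passage describes as potentially intractable when the number of arms runs into the millions.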


Crows figure out how to make their own tools from pieces of a syringe

Daily Mail - Science & tech

Clever crows can assemble tools from two or more components without any help, a feat previously seen only in humans and great apes. The birds were filmed slotting together rod pieces to create a tool long enough to extract a morsel of food which scientists had hidden away. In one experiment, they were presented with disassembled syringes, and created the right length of tool without any prompt or demonstration. The birds' ability to anticipate what an unseen object will be able to do matches the intelligence of a human toddler, Oxford University researchers said. The animals in the experiment were New Caledonian crows, a species native to New Caledonia, a large Pacific island east of Australia.