AITopics | cab problem

cab problem

From Finite to Countable-Armed Bandits

Neural Information Processing Systems, Feb-8-2026, 14:15:49 GMT

In addition, there is a fixed distribution over types which sets the proportion of each type in the population of arms. The decision maker is oblivious to the type of any arm and to the aforementioned distribution over types, but perfectly knows the total number of types occurring in the population of arms.

artificial intelligence, bandit, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.06)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

From Finite to Countable-Armed Bandits

Kalvit, Anand, Zeevi, Assaf

arXiv.org Machine Learning, May-22-2021

We consider a stochastic bandit problem with countably many arms that belong to a finite set of types, each characterized by a unique mean reward. In addition, there is a fixed distribution over types which sets the proportion of each type in the population of arms. The decision maker is oblivious to the type of any arm and to the aforementioned distribution over types, but perfectly knows the total number of types occurring in the population of arms. We propose a fully adaptive online learning algorithm that achieves O(log n) distribution-dependent expected cumulative regret after any number of plays n, and show that this order of regret is best possible. The analysis of our algorithm relies on newly discovered concentration and convergence properties of optimism-based policies like UCB in finite-armed bandit problems with "zero gap," which may be of independent interest.
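The setting in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' algorithm: the class `CountableArmedBandit`, the Bernoulli reward model, and the helper `ucb1` are hypothetical names chosen here. The sketch draws a handful of arms from the infinite reservoir and runs plain UCB1 (a standard optimism-based policy) on that finite subset.

```python
import math
import random


class CountableArmedBandit:
    """Hypothetical simulator: each fresh arm gets a hidden type drawn from a fixed distribution."""

    def __init__(self, type_means, type_probs, seed=0):
        self.type_means = type_means  # one distinct mean reward per type
        self.type_probs = type_probs  # proportion of each type among the arms
        self.rng = random.Random(seed)
        self.arm_types = []           # hidden types of the arms drawn so far

    def new_arm(self):
        """Draw a fresh arm from the reservoir and return its index."""
        types = range(len(self.type_means))
        t = self.rng.choices(types, weights=self.type_probs)[0]
        self.arm_types.append(t)
        return len(self.arm_types) - 1

    def pull(self, arm):
        """Bernoulli reward (an assumption) with the mean of the arm's hidden type."""
        mean = self.type_means[self.arm_types[arm]]
        return 1.0 if self.rng.random() < mean else 0.0


def ucb1(env, arms, horizon):
    """Plain UCB1 on a fixed finite list of arms; returns the pull count per arm."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            i = t - 1  # initial round: play every arm once
        else:
            # Optimistic index: empirical mean plus a confidence bonus.
            i = max(
                range(len(arms)),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2 * math.log(t) / counts[k]),
            )
        sums[i] += env.pull(arms[i])
        counts[i] += 1
    return counts
```

For example, `env = CountableArmedBandit([0.9, 0.1], [0.5, 0.5])` followed by `ucb1(env, [env.new_arm() for _ in range(4)], 500)` returns how often each of the four sampled arms was played. If two sampled arms share the best type, they have identical means, which is the "zero gap" situation the abstract mentions.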

algorithm, cab problem, denote, (15 more...)

arXiv.org Machine Learning

2105.10721

Country:

North America > United States > New York (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.35)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Exploring Offline Policy Evaluation for the Continuous-Armed Bandit Problem

Kruijswijk, Jules, Parvinen, Petri, Kaptein, Maurits

arXiv.org Machine Learning, Aug-21-2019

In the canonical multi-armed bandit (MAB) problem a gambler stands in front of a row of slot machines, each with a (potentially) different payoff. It is up to the gambler to decide in sequence which machine to play and, during the course of sequentially playing the machines, she aims to make as much profit as possible by simultaneously learning from the previous observations and using the gained knowledge to steer future actions (Berry and Fristedt, 1985; Whittle, 1980). The gambler needs to pick a strategy that dictates which arm to play next given the previous observations. The problem of finding such a strategy is complicated since at each interaction the gambler only observes the outcomes of the machine she played, and she will never know the outcomes of the other possible courses of action at that moment in time. This so-called omission of counterfactuals (Li, Chu, Langford, and Wang, 2011) - not being able to gain knowledge about all the possible outcomes - gives rise to the exploration versus exploitation tradeoff (Berry and Fristedt, 1985): at each time point an action can either be geared at gaining more knowledge regarding the machines she is uncertain about (exploration), or it can be geared at using the knowledge gained in earlier interactions by playing machines with a high expected payoff (exploitation).
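The exploration versus exploitation tradeoff described above can be made concrete with a minimal epsilon-greedy sketch. This is a textbook strategy used here purely for illustration, not the method studied in the paper; the function name `epsilon_greedy`, the Bernoulli payoffs, and the default `epsilon=0.1` are assumptions.

```python
import random


def epsilon_greedy(true_means, horizon, epsilon=0.1, seed=0):
    """Play `horizon` rounds on Bernoulli machines with the given (hidden) means."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # number of plays per machine
    estimates = [0.0] * n_arms  # running mean payoff per machine
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            # Exploration: try a machine uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: play the machine with the best estimate so far.
            arm = max(range(n_arms), key=lambda k: estimates[k])
        # Only the payoff of the played machine is observed (no counterfactuals).
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, estimates, total
```

Calling `epsilon_greedy([0.2, 0.8], 1000)` returns the play counts, the estimated payoffs, and the total profit. A larger `epsilon` spends more plays on learning, while a smaller one leans on what has already been learned.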

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

1908.07808

Genre: Research Report > Experimental Study (0.46)

Technology: