AITopics | cmab problem

Collaborating Authors

cmab problem

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

e0688d13958a19e087e123148555e4b4-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 13:36:32 GMT

algorithm, base arm, oracle, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Add feedback

The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle Fang Kong 1 Y ueran Y ang 1 Wei Chen 2 Shuai Li1

Neural Information Processing SystemsAug-18-2025, 02:08:42 GMT

Thompson sampling (TS) has attracted a lot of interest in the bandit area.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Add feedback

Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network

Kotoku, Shun, Mihana, Takatomo, Röhm, André, Horisaki, Ryoichi

arXiv.org Artificial IntelligenceJul-12-2024

Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, called the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, along with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicitly sharing information among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.

ns 1, probability, synchronization, (16 more...)

arXiv.org Artificial Intelligence

2407.09124

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits

Wakayama, Shohei, Ahmed, Nisar

arXiv.org Artificial IntelligenceFeb-24-2023

In autonomous robotic decision-making under uncertainty, the tradeoff between exploitation and exploration of available options must be considered. If secondary information associated with options can be utilized, such decision-making problems can often be formulated as contextual multi-armed bandits (CMABs). In this study, we apply active inference, which has been actively studied in the field of neuroscience in recent years, as an alternative action selection strategy for CMABs. Unlike conventional action selection strategies, it is possible to rigorously evaluate the uncertainty of each option when calculating the expected free energy (EFE) associated with the decision agent's probabilistic model, as derived from the free-energy principle. We specifically address the case where a categorical observation likelihood function is used, such that EFE values are analytically intractable. We introduce new approximation methods for computing the EFE based on variational and Laplace approximations. Extensive simulation study results demonstrate that, compared to other strategies, active inference generally requires far fewer iterations to identify optimal options and generally achieves superior cumulative regret, for relatively low extra computational cost.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2209.09185

Country:

Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.05)
North America > United States > Colorado > Boulder County > Boulder (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Bayesian Linear Bandits for Large-Scale Recommender Systems

Ghoorchian, Saeed, Maghsudi, Setareh

arXiv.org Machine LearningFeb-7-2022

--Potentially, taking advantage of available side information boosts the performance of recommender systems; nevertheless, with the rise of big data, the side information has often several dimensions. Hence, it is imperative to develop decision-making algorithms that can cope with such a high-dimensional context in real-time. That is especially challenging when the decision-maker has a variety of items to recommend. In this paper, we build upon the linear contextual multi-armed bandit framework to address this problem. We develop a decision-making policy for a linear bandit problem with high-dimensional context vectors and several arms. Our policy employs Thompson sampling and feeds it with reduced context vectors, where the dimensionality reduction follows by random projection. Our proposed recommender system follows this policy to learn online the item preferences of users while keeping its runtime as low as possible. We prove a regret bound that scales as a factor of the reduced dimension instead of the original one. For numerical evaluation, we use our algorithm to build a recommender system and apply it to real-world datasets. The theoretical and numerical results demonstrate the effectiveness of our proposed algorithm compared to the state-of-the-art in terms of computational complexity and regret performance. Over the past decade, recommender systems have flourished to improve the economy by guiding decision-makers in different roles such as service providers, consumers, and producers, toward cost-effective and time-saving actions while retaining the constraints such as safety, privacy, and quality-of-service satisfaction.

algorithm, context vector, recommender system, (15 more...)

arXiv.org Machine Learning

2202.03167

Country:

North America > United States (0.04)
Europe > Germany > Berlin (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle

Kong, Fang, Yang, Yueran, Chen, Wei, Li, Shuai

arXiv.org Machine LearningNov-8-2021

Thompson sampling (TS) has attracted a lot of interest in the bandit area. It was introduced in the 1930s but has not been theoretically proven until recent years. All of its analysis in the combinatorial multi-armed bandit (CMAB) setting requires an exact oracle to provide optimal solutions with any input. However, such an oracle is usually not feasible since many combinatorial optimization problems are NP-hard and only approximation oracles are available. An example (Wang and Chen, 2018) has shown the failure of TS to learn with an approximation oracle. However, this oracle is uncommon and is designed only for a specific problem instance. It is still an open question whether the convergence analysis of TS can be extended beyond the exact oracle in CMAB. In this paper, we study this question under the greedy oracle, which is a common (approximation) oracle with theoretical guarantees to solve many (offline) combinatorial optimization problems. We provide a problem-dependent regret lower bound of order $\Omega(\log T/\Delta^2)$ to quantify the hardness of TS to solve CMAB problems with greedy oracle, where $T$ is the time horizon and $\Delta$ is some reward gap. We also provide an almost matching regret upper bound. These are the first theoretical results for TS to solve CMAB with a common approximation oracle and break the misconception that TS cannot work with approximation oracles.

algorithm, base arm, oracle, (13 more...)

arXiv.org Machine Learning

2111.04295

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Instructional Material (0.45)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.66)

Add feedback

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

Merlis, Nadav, Mannor, Shie

arXiv.org Machine LearningFeb-13-2020

The Combinatorial Multi-Armed Bandit problem is a sequential decision-making problem in which an agent selects a set of arms on each round, observes feedback for each of these arms and aims to maximize a known reward function of the arms it chose. While previous work proved regret upper bounds in this setting for general reward functions, only a few works provided matching lower bounds, all for specific reward functions. In this work, we prove regret lower bounds for combinatorial bandits that hold under mild assumptions for all smooth reward functions. We derive both problem-dependent and problem-independent bounds and show that the recently proposed Giniweighted smoothness parameter (Merlis and Mannor, 2019) also determines the lower bounds for monotone reward functions. Notably, this implies that our lower bounds are tight up to log-factors.

cmab problem, reward function, tight lower bound, (11 more...)

arXiv.org Machine Learning

2002.05392

Country:

Asia > Middle East > Israel (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Adapting to Concept Drift in Credit Card Transaction Data Streams Using Contextual Bandits and Decision Trees

Soemers, Dennis J. N. J. (Vrije Universiteit Brussel) | Brys, Tim (Vrije Universiteit Brussel) | Driessens, Kurt (Maastricht University) | Winands, Mark H. M. (Maastricht University) | Nowé, Ann (Vrije Universiteit Brussel)

AAAI ConferencesFeb-8-2018

Credit card transactions predicted to be fraudulent by automated detection systems are typically handed over to human experts for verification. To limit costs, it is standard practice to select only the most suspicious transactions for investigation. We claim that a trade-off between exploration and exploitation is imperative to enable adaptation to changes in behavior (concept drift). Exploration consists of the selection and investigation of transactions with the purpose of improving predictive models, and exploitation consists of investigating transactions detected to be suspicious. Modeling the detection of fraudulent transactions as rewarding, we use an incremental Regression Tree learner to create clusters of transactions with similar expected rewards. This enables the use of a Contextual Multi-Armed Bandit (CMAB) algorithm to provide the exploration/exploitation trade-off. We introduce a novel variant of a CMAB algorithm that makes use of the structure of this tree, and use Semi-Supervised Learning to grow the tree using unlabeled data. The approach is evaluated on a real dataset and data generated by a simulator that adds concept drift by adapting the behavior of fraudsters to avoid detection. It outperforms frequently used offline models in terms of cumulative rewards, in particular in the presence of concept drift.

law enforcement, transaction, upstream oil & gas, (22 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > Slovenia (0.14)

Industry:

Banking & Finance > Credit (1.00)
Law Enforcement & Public Safety > Fraud (0.91)
Energy > Oil & Gas > Upstream (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret

Sarıtaç, A. Ömer, Tekin, Cem

arXiv.org Machine LearningJul-24-2017

In this paper, we study the combinatorial multi-armed bandit problem (CMAB) with probabilistically triggered arms (PTAs). Under the assumption that the arm triggering probabilities (ATPs) are positive for all arms, we prove that a class of upper confidence bound (UCB) policies, named Combinatorial UCB with exploration rate $\kappa$ (CUCB-$\kappa$), and Combinatorial Thompson Sampling (CTS), which estimates the expected states of the arms via Thompson sampling, achieve bounded regret. In addition, we prove that CUCB-$0$ and CTS incur $O(\sqrt{T})$ gap-independent regret. These results improve the results in previous works, which show $O(\log T)$ gap-dependent and $O(\sqrt{T\log T})$ gap-independent regrets, respectively, under no assumptions on the ATPs. Then, we numerically evaluate the performance of CUCB-$\kappa$ and CTS in a real-world movie recommendation problem, where the actions correspond to recommending a set of movies, the arms correspond to the edges between the movies and the users, and the goal is to maximize the total number of users that are attracted by at least one movie. Our numerical results complement our theoretical findings on bounded regret. Apart from this problem, our results also directly apply to the online influence maximization (OIM) problem studied in numerous prior works.

artificial intelligence, big data, data mining, (16 more...)

arXiv.org Machine Learning

1707.07443

Genre: Research Report > New Finding (0.66)

Industry:

Media > Film (0.48)
Leisure & Entertainment (0.48)
Information Technology > Services (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Add feedback

Combinatorial Multi-Armed Bandits with Filtered Feedback

Grant, James A., Leslie, David S., Glazebrook, Kevin, Szechtman, Roberto

arXiv.org Machine LearningMay-26-2017

Motivated by problems in search and detection we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set $\{1,...,k\}$ in each round, generating random outcomes from probability distributions associated with these arms and receiving an overall reward. Under semibandit feedback it is assumed that the random outcomes generated are all observed. Filtered semibandit feedback allows the outcomes that are observed to be sampled from a second distribution conditioned on the initial random outcomes. This feedback mechanism is valuable as it allows CMAB methods to be applied to sequential search and detection problems where combinatorial actions are made, but the true rewards (number of objects of interest appearing in the round) are not observed, rather a filtered reward (the number of objects the searcher successfully finds, which must by definition be less than the number that appear). We present an upper confidence bound type algorithm, Robust-F-CUCB, and associated regret bound of order $\mathcal{O}(\ln(n))$ to balance exploration and exploitation in the face of both filtering of reward and heavy tailed reward distributions.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Machine Learning

1705.09605

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning (0.89)
Information Technology > Data Science > Data Mining > Big Data (0.71)

Add feedback