AITopics | cmab

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.62)

Neural Information Processing SystemsDec-24-2025, 07:35:18 GMT

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

In this paper, we study the combinatorial semi-bandits (CMAB) and focus on reducing the dependency of the batch-size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct regret analysis which reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret bound, significantly improving the regret bounds for the above applications. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages on the non-triggering version of the TPVM condition and completely removes the dependency on $K$ in the leading regret. As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of $O(\log K)$. Finally, experimental evaluations show our superior performance compared with benchmark algorithms in different applications.

batch-size independent regret bound, combinatorial semi-bandit, probabilistically triggered arm, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.88)

arXiv.org Artificial IntelligenceOct-13-2025

Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback

Levy, Orin, Erez, Liad, Cohen, Alon, Mansour, Yishay

We present regret minimization algorithms for the contextual multi-armed bandit (CMAB) problem over $K$ actions in the presence of delayed feedback, a scenario where loss observations arrive with delays chosen by an adversary. As a preliminary result, assuming direct access to a finite policy class $Π$ we establish an optimal expected regret bound of $ O (\sqrt{KT \log |Π|} + \sqrt{D \log |Π|)} $ where $D$ is the sum of delays. For our main contribution, we study the general function approximation setting over a (possibly infinite) contextual loss function class $ \mathcal{F} $ with access to an online least-square regression oracle $\mathcal{O}$ over $\mathcal{F}$. In this setting, we achieve an expected regret bound of $O(\sqrt{KT\mathcal{R}_T(\mathcal{O})} + \sqrt{ d_{\max} D β})$ assuming FIFO order, where $d_{\max}$ is the maximal delay, $\mathcal{R}_T(\mathcal{O})$ is an upper bound on the oracle's regret and $β$ is a stability parameter associated with the oracle. We complement this general result by presenting a novel stability analysis of a Hedge-based version of Vovk's aggregating forecaster as an oracle implementation for least-square regression over a finite function class $\mathcal{F}$ and show that its stability parameter $β$ is bounded by $\log |\mathcal{F}|$, resulting in an expected regret bound of $O(\sqrt{KT \log |\mathcal{F}|} + \sqrt{d_{\max} D \log |\mathcal{F}|})$ which is a $\sqrt{d_{\max}}$ factor away from the lower bound of $Ω(\sqrt{KT \log |\mathcal{F}|} + \sqrt{D \log |\mathcal{F}|})$ that we also present.

artificial intelligence, machine learning, theorem 4, (16 more...)

2510.09127

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.63)

Dyer, Joel, Bishop, Nicholas, Calinescu, Anisoara, Wooldridge, Michael, Zennaro, Fabio Massimo

Using causal abstractions to accelerate decision-making in complex bandit problems

arXiv.org Artificial IntelligenceSep-5-2025

Although real-world decision-making problems can often be encoded as causal multi-armed bandits (CMABs) at different levels of abstraction, a general methodology exploiting the information and computational advantages of each abstraction level is missing. In this paper, we propose AT-UCB, an algorithm which efficiently exploits shared information between CMAB problem instances defined at different levels of abstraction. More specifically, AT-UCB leverages causal abstraction (CA) theory to explore within a cheap-to-simulate and coarse-grained CMAB instance, before employing the traditional upper confidence bound (UCB) algorithm on a restricted set of potentially optimal actions in the CMAB of interest, leading to significant reductions in cumulative regret when compared to the classical UCB algorithm. We illustrate the advantages of AT-UCB theoretically, through a novel upper bound on the cumulative regret, and empirically, by applying AT-UCB to epidemiological simulators with varying resolution and computational cost.

artificial intelligence, data mining, machine learning, (19 more...)

2509.04296

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Data Science > Data Mining > Big Data (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Aggarwal, Vaneet, Jain, Shweta, Pokhriyal, Subham, Quinn, Christopher John

Bi-Criteria Optimization for Combinatorial Bandits: Sublinear Regret and Constraint Violation under Bandit Feedback

arXiv.org Machine LearningMar-15-2025

In this paper, we study bi-criteria optimization for combinatorial multi-armed bandits (CMAB) with bandit feedback. We propose a general framework that transforms discrete bi-criteria offline approximation algorithms into online algorithms with sublinear regret and cumulative constraint violation (CCV) guarantees. Our framework requires the offline algorithm to provide an $(\alpha, \beta)$-bi-criteria approximation ratio with $\delta$-resilience and utilize $\texttt{N}$ oracle calls to evaluate the objective and constraint functions. We prove that the proposed framework achieves sub-linear regret and CCV, with both bounds scaling as ${O}\left(\delta^{2/3} \texttt{N}^{1/3}T^{2/3}\log^{1/3}(T)\right)$. Crucially, the framework treats the offline algorithm with $\delta$-resilience as a black box, enabling flexible integration of existing approximation algorithms into the CMAB setting. To demonstrate its versatility, we apply our framework to several combinatorial problems, including submodular cover, submodular cost covering, and fair submodular maximization. These applications highlight the framework's broad utility in adapting offline guarantees to online bi-criteria optimization under bandit feedback.

algorithm, artificial intelligence, constraint-based reasoning, (14 more...)

arXiv.org Machine Learning

2503.12285

Country:

North America > United States > Iowa (0.04)
Asia > India (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.71)

Neural Information Processing SystemsJan-19-2025, 10:26:27 GMT

The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle

approximation oracle, combinatorial semi-bandit, oracle, (7 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Neural Information Processing SystemsOct-11-2024, 07:08:09 GMT

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

In this paper, we study the combinatorial semi-bandits (CMAB) and focus on reducing the dependency of the batch-size K in the regret bound, where K is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct regret analysis which reduces the O(K) factor to O(\log K) or O(\log 2 K) in the regret bound, significantly improving the regret bounds for the above applications. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages on the non-triggering version of the TPVM condition and completely removes the dependency on K in the leading regret. As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of O(\log K) .

batch-size independent regret bound, combinatorial semi-bandit, probabilistically triggered arm, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hoveyda, Mohanna, de Vries, Arjen P., de Rijke, Maarten, Oosterhuis, Harrie, Hasibi, Faegheh

AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit

arXiv.org Artificial IntelligenceSep-23-2024

In question answering (QA), different questions can be effectively addressed with different answering strategies. Some require a simple lookup, while others need complex, multi-step reasoning to be answered adequately. This observation motivates the development of a dynamic method that adaptively selects the most suitable QA strategy for each question, enabling more efficient and effective systems capable of addressing a broader range of question types. To this aim, we build on recent advances in the orchestration of multiple large language models (LLMs) and formulate adaptive QA as a dynamic orchestration challenge. We define this as a contextual multi-armed bandit problem, where the context is defined by the characteristics of the incoming question and the action space consists of potential communication graph configurations among the LLM agents. We then train a linear upper confidence bound model to learn an optimal mapping between different question types and their corresponding optimal multi-LLM communication graph representation. Our experiments show that the proposed solution is viable for adaptive orchestration of a QA system with multiple modules, as it combines the superior performance of more complex strategies while avoiding their costs when simpler strategies suffice.

agent, graph, orchestration, (15 more...)

2409.13447

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(7 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

J, Pavamana K, Singh, Chandramani Kishore

Recommenadation aided Caching using Combinatorial Multi-armed Bandits

arXiv.org Artificial IntelligenceMay-3-2024

We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We assume a fixed set of contents with unknown user preferences and content popularities. We can recommend a subset of the contents to the users which encourages the users to request these contents. Recommendation can thus be used to increase cache hits. We formulate the cache hit optimization problem as a combinatorial multi-armed bandit (CMAB). We propose a UCB-based algorithm to decide which contents to cache and recommend. We provide an upper bound on the regret of our algorithm. We numerically demonstrate the performance of our algorithm and compare it to state-of-the-art algorithms.

algorithm, rec, recommendation, (16 more...)

2405.0008

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.64)

Industry: Telecommunications (0.66)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.60)

Zennaro, Fabio Massimo, Bishop, Nicholas, Dyer, Joel, Felekis, Yorgos, Calinescu, Anisoara, Wooldridge, Michael, Damoulas, Theodoros

Causally Abstracted Multi-armed Bandits

arXiv.org Artificial IntelligenceApr-26-2024

Multi-armed bandits (MAB) and causal MABs (CMAB) are established frameworks for decision-making problems. The majority of prior work typically studies and solves individual MAB and CMAB in isolation for a given problem and associated data. However, decision-makers are often faced with multiple related problems and multi-scale observations where joint formulations are needed in order to efficiently exploit the problem structures and data dependencies. Transfer learning for CMABs addresses the situation where models are defined on identical variables, although causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs) by relying on the theory of causal abstraction in order to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB, and study their regret. We illustrate the limitations and the strengths of our algorithms on a real-world scenario related to online advertising.

abstracted cmab, abstraction, cmab, (17 more...)

2404.17493

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.63)

Industry: Marketing (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)