AITopics | base arm

base arm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Combinatorial Multi-Armed Bandit with General Reward Functions

Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu

Neural Information Processing Systems, Apr-21-2026, 15:27:42 GMT

In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may depend not only on the means of the input random variables but possibly on their entire distributions. Our framework enables a much larger class of reward functions, such as the max() function and nonlinear utility functions. Existing techniques that rely on accurate estimation of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of the underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve O(log T) distribution-dependent regret and O(√T) distribution-independent regret, where T is the time horizon. We apply our results to the K-MAX problem and to expected utility maximization problems. In particular, for K-MAX, we provide the first polynomial-time approximation scheme (PTAS) for its offline problem, and give the first O(√T) bound on the (1−ε)-approximation regret of its online problem, for any ε > 0.
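
As a rough illustration of the SDCB idea, the sketch below builds an empirical CDF for each base arm whose rewards lie on a finite grid in [0, 1], lowers it by a confidence radius so that the resulting distribution stochastically dominates the empirical one, and then evaluates the K-MAX reward E[max] under those optimistic distributions with a brute-force oracle. The shared grid support, the radius sqrt(3 ln t / (2n)), and the function names are assumptions made for illustration; this is not the authors' code, and the exact radius should be checked against the paper.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

def empirical_cdf(samples: Sequence[float], support: Sequence[float]) -> List[float]:
    """Fraction of observed samples that are <= each (sorted) support point."""
    n = len(samples)
    return [sum(1 for s in samples if s <= x) / n for x in support]

def dominant_lower_cdf(samples: Sequence[float], support: Sequence[float], t: int) -> List[float]:
    """Lower the empirical CDF by a confidence radius (clipped at 0).

    A pointwise-lower CDF moves mass toward large values, so the resulting
    distribution stochastically dominates the empirical one (optimism).
    Assumes rewards in [0, 1] and support[-1] == 1.0; the radius is an assumption.
    """
    radius = math.sqrt(3 * math.log(t) / (2 * len(samples)))
    lower = [max(f - radius, 0.0) for f in empirical_cdf(samples, support)]
    lower[-1] = 1.0  # all probability mass lies at or below the top of the support
    return lower

def expected_max(cdfs: Sequence[Sequence[float]], support: Sequence[float]) -> float:
    """E[max_i X_i] for independent X_i on a shared finite support.

    The CDF of the maximum is the product of the individual CDFs.
    """
    expectation, prev = 0.0, 0.0
    for k, x in enumerate(support):
        f = math.prod(c[k] for c in cdfs)
        expectation += x * (f - prev)  # x times P(max == x)
        prev = f
    return expectation

def best_k_max(cdfs_by_arm: Sequence[Sequence[float]], support: Sequence[float], k: int) -> Tuple[int, ...]:
    """Brute-force K-MAX oracle: the size-k subset of arms with the largest E[max]."""
    return max(combinations(range(len(cdfs_by_arm)), k),
               key=lambda s: expected_max([cdfs_by_arm[i] for i in s], support))
```

In an online loop one would, at round t, compute `dominant_lower_cdf` for every arm and pass the results to the oracle. The brute-force `best_k_max` is exponential in k and only stands in for the PTAS mentioned in the abstract.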

artificial intelligence, data mining, machine learning, (21 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.86)
Information Technology > Game Theory (0.71)

Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits

Özyıldırım, Emre, Yaycı, Barış, Akturk, Umut Eren, Tekin, Cem

arXiv.org Machine Learning, Apr-17-2026

We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $τ_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $τ_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with satisficing objective: when $τ_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $τ_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $τ_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.
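
To make the satisficing idea concrete, here is a heavily simplified sketch for a single UE that picks one beam-rate tuple per round from ACK/NACK feedback. It is not SAT-CTS: the combinatorial constraint across UEs and BSs and the restart logic are omitted, and the Hoeffding lower bound, the Beta(1 + ACKs, 1 + NACKs) posteriors, the commit rule, and the names `lcb_throughput`, `choose_tuple`, and `record` are all hypothetical. It only shows how a conservative estimate that already clears the threshold τ_r (`tau` in the code) lets the learner stop exploring, while posterior sampling takes over otherwise.

```python
import math
import random
from typing import Dict, Tuple

BeamRate = Tuple[int, float]             # hypothetical (beam index, rate) tuple
Stats = Dict[BeamRate, Tuple[int, int]]  # (ACK count, NACK count) per tuple

def lcb_throughput(rate: float, acks: int, nacks: int, t: int) -> float:
    """Conservative throughput: rate times a Hoeffding lower bound on P(ACK) (assumption)."""
    n = acks + nacks
    if n == 0:
        return 0.0
    radius = math.sqrt(math.log(max(t, 2)) / (2 * n))
    return rate * max(acks / n - radius, 0.0)

def choose_tuple(stats: Stats, tau: float, t: int, rng: random.Random) -> BeamRate:
    """Commit to a confidently satisficing tuple if one exists, else posterior-sample."""
    lcbs = {arm: lcb_throughput(arm[1], a, f, t) for arm, (a, f) in stats.items()}
    best = max(lcbs, key=lcbs.get)
    if lcbs[best] >= tau:
        return best  # the conservative estimate already meets the threshold
    # Thompson sampling on expected throughput with Beta(1 + ACKs, 1 + NACKs) posteriors
    draws = {arm: arm[1] * rng.betavariate(1 + a, 1 + f) for arm, (a, f) in stats.items()}
    return max(draws, key=draws.get)

def record(stats: Stats, arm: BeamRate, ack: bool) -> None:
    """Update the ACK/NACK counts after one transmission on `arm`."""
    a, f = stats[arm]
    stats[arm] = (a + 1, f) if ack else (a, f + 1)
```

With 990 ACKs out of 1000 transmissions at rate 1.0 and `tau=0.9`, the lower bound is about 0.93, so `choose_tuple` keeps that tuple instead of exploring an untried higher-rate one.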

artificial intelligence, assignment, machine learning, (19 more...)

arXiv.org Machine Learning

2604.14908

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)

Genre: Research Report (0.64)

Industry: Telecommunications (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.88)

e17184bcb70dcf3942c54e0b537ffc6d-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 13:57:22 GMT

algorithm, bandit, impact function, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Industry:

Education (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.96)
Banking & Finance (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

e0688d13958a19e087e123148555e4b4-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 13:36:32 GMT

algorithm, base arm, oracle, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

c92a10324374fac681719d63979d00fe-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 03:48:43 GMT

base arm, minw, super arm, (15 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Air (0.67)
Consumer Products & Services > Travel (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Networks (0.68)

Combinatorial Pure Exploration with Bottleneck Reward Function

Neural Information Processing Systems, Feb-11-2026, 03:48:38 GMT

In this paper, we study the Combinatorial Pure Exploration problem with the Bottleneck reward function (CPE-B) under the fixed-confidence (FC) and fixed-budget (FB) settings.
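
The bottleneck reward of a super arm is the smallest weight among its base arms. The sketch below, a generic illustration rather than the CPE-B algorithms, evaluates that reward, finds the best super arm by brute force over an explicit list, and applies an LUCB-style fixed-confidence stopping check. Representing super arms as tuples of string ids, the per-arm confidence radii, and the function names are assumptions.

```python
from typing import Dict, Sequence, Tuple

SuperArm = Tuple[str, ...]  # hypothetical: a super arm as a tuple of base-arm ids

def bottleneck(super_arm: SuperArm, weights: Dict[str, float]) -> float:
    """Bottleneck reward: the smallest base-arm weight in the super arm."""
    return min(weights[e] for e in super_arm)

def best_super_arm(super_arms: Sequence[SuperArm], weights: Dict[str, float]) -> SuperArm:
    """Brute-force offline oracle over an explicit list of feasible super arms."""
    return max(super_arms, key=lambda s: bottleneck(s, weights))

def can_stop(super_arms: Sequence[SuperArm], means: Dict[str, float],
             radius: Dict[str, float]) -> bool:
    """Illustrative fixed-confidence check, not the paper's stopping rule.

    Because min() is monotone, the bottleneck of per-arm lower (upper) bounds
    is a lower (upper) bound on the super arm's bottleneck reward.
    """
    lcb = {e: means[e] - radius[e] for e in means}
    ucb = {e: means[e] + radius[e] for e in means}
    leader = best_super_arm(super_arms, means)
    return all(bottleneck(leader, lcb) >= bottleneck(s, ucb)
               for s in super_arms if s != leader)
```

For example, with estimated weights a = 0.95, b = 0.2, c = 0.6, d = 0.5, the super arm (a, b) has the larger sum (1.15 vs 1.1) but the smaller bottleneck (0.2 vs 0.5), so a sum objective and the bottleneck objective pick different super arms.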

artificial intelligence, base arm, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.46)

Industry:

Transportation > Air (0.68)
Consumer Products & Services > Travel (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

Neural Information Processing Systems, Feb-9-2026, 08:44:21 GMT

As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of O(log K).

artificial intelligence, machine learning, mini, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > China > Beijing > Beijing (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Oracle-Efficient Combinatorial Semi-Bandits

Kim, Jung-hun, Vojnović, Milan, Oh, Min-hwan

arXiv.org Machine Learning, Oct-27-2025

We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For the worst-case linear reward setting, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.
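
One common way to get a doubly logarithmic number of oracle calls is to re-run the oracle only at the ends of batches whose lengths grow very quickly. The sketch below uses the grid t_k = ⌈T^(1 − 2^(−k))⌉ for k = 1, …, M − 1 followed by T, with M = ⌈log2 log2 T⌉; this grid is of the kind used in the batched-bandit literature, and whether the paper uses exactly this schedule is an assumption, as is the name `oracle_call_rounds`.

```python
import math
from typing import List

def oracle_call_rounds(T: int) -> List[int]:
    """Hypothetical batch grid: rounds at which the oracle is re-invoked.

    t_k = ceil(T^(1 - 2^-k)) for k = 1..M-1, then T, with M = ceil(log2(log2(T))),
    so the number of oracle calls grows like log log T.
    """
    if T < 4:
        return [T]  # too short for more than one batch
    M = math.ceil(math.log2(math.log2(T)))
    grid: List[int] = []
    for k in range(1, M):
        t_k = min(math.ceil(T ** (1 - 2.0 ** -k)), T)
        if not grid or t_k > grid[-1]:  # keep the grid strictly increasing
            grid.append(t_k)
    if not grid or grid[-1] < T:
        grid.append(T)
    return grid
```

Why this grid keeps regret near √T: if the per-round regret in batch k scales like 1/√t_{k−1}, the batch costs about t_k / √t_{k−1} = T^(1 − 2^(−k)) / T^((1 − 2^(−(k−1)))/2) = T^(1/2). The last batch costs at most T^(1/2 + 2^(−M)) ≤ 2√T because 2^(−M) ≤ 1/log2 T, so the total is about M√T. For T = 10^6 this schedule makes 5 oracle calls, and for T = 10^12 only 6.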

data mining, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2510.21431

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States (0.04)
Europe > United Kingdom (0.04)
Europe > France (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Data Science > Data Mining > Big Data (0.86)
Information Technology > Artificial Intelligence > Natural Language (0.70)

Continuous Mean-Covariance Bandits

Neural Information Processing Systems, Oct-1-2025, 23:43:44 GMT

Specifically, in continuous mean-covariance bandits (CMCB), a learner sequentially chooses weight vectors over the given options and observes random feedback according to its decisions. The learner's objective is to achieve the best trade-off between reward and risk, where risk is measured by the covariance of the options.
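
A reward-risk trade-off of this kind is often written as a mean-variance objective f(w) = wᵀθ − ρ wᵀΣw, where θ holds the option means, Σ their covariance, and ρ ≥ 0 the risk aversion. The sketch below assumes that form and restricts w to the two-option simplex so that a simple grid search can show how the optimum shifts as ρ grows; the functional form, the simplex constraint, and the names are assumptions rather than the paper's definitions.

```python
from typing import Sequence, Tuple

def mean_cov_objective(w: Sequence[float], theta: Sequence[float],
                       sigma: Sequence[Sequence[float]], rho: float) -> float:
    """Assumed objective f(w) = w^T theta - rho * w^T Sigma w."""
    d = len(w)
    reward = sum(w[i] * theta[i] for i in range(d))
    risk = sum(w[i] * sigma[i][j] * w[j] for i in range(d) for j in range(d))
    return reward - rho * risk

def best_two_option_weights(theta: Sequence[float], sigma: Sequence[Sequence[float]],
                            rho: float, steps: int = 100) -> Tuple[Tuple[float, float], float]:
    """Grid search over w = (a, 1 - a) on the two-option simplex."""
    best_w, best_f = (0.0, 1.0), float("-inf")
    for i in range(steps + 1):
        a = i / steps
        w = (a, 1 - a)
        f = mean_cov_objective(w, theta, sigma, rho)
        if f > best_f:  # keep the first grid point attaining the maximum
            best_w, best_f = w, f
    return best_w, best_f
```

With θ = (1.0, 0.8) and Σ = diag(1.0, 0.1), a risk-neutral learner (ρ = 0) puts all weight on the first option, while ρ = 1 moves the optimum to roughly a = 0.18, favouring the low-variance option.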

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.47)

Industry: Banking & Finance > Trading (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.31)

Multi-Play Combinatorial Semi-Bandit Problem

Nakamura, Shintaro, Kuroki, Yuko, Chen, Wei

arXiv.org Artificial Intelligence, Sep-15-2025

In the combinatorial semi-bandit (CSB) problem, a player selects an action from a combinatorial action set and observes feedback from the base arms included in the action. While CSB is widely applicable to combinatorial optimization problems, its restriction to binary decision spaces excludes important cases involving non-negative integer flows or allocations, such as the optimal transport and knapsack problems. To overcome this limitation, we propose the multi-play combinatorial semi-bandit (MP-CSB), where a player can select a non-negative integer action and observe multiple feedback samples from a single arm in each round. We propose two algorithms for the MP-CSB. One is a Thompson-sampling-based algorithm that is computationally feasible even when the action space is exponentially large with respect to the number of arms, and attains $O(\log T)$ distribution-dependent regret in the stochastic regime, where $T$ is the time horizon. The other is a best-of-both-worlds algorithm, which achieves $O(\log T)$ variance-dependent regret in the stochastic regime and the worst-case $\tilde{O}(\sqrt{T})$ regret in the adversarial regime. Moreover, its regret in the adversarial regime is data-dependent, adapting to the cumulative loss of the optimal action, the total quadratic variation, and the path-length of the loss sequence. Finally, we numerically show that the proposed algorithms outperform existing methods in the CSB literature.
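
A toy version of multi-play Thompson sampling helps show what "multiple feedback samples from a single arm" means: playing arm i a_i times in a round yields a_i observations of that arm. The sketch below is not the authors' algorithm. It assumes Bernoulli rewards, Gaussian posteriors N(sum/(n+1), 1/(n+1)) (an N(0, 1) prior with a unit-variance noise model), per-arm integer capacities, and a fixed total of `budget` plays per round; the names `allocate` and `run_thompson` are hypothetical.

```python
import math
import random
from typing import List, Sequence

def allocate(scores: Sequence[float], capacities: Sequence[int], budget: int) -> List[int]:
    """Integer oracle: maximize sum_i scores[i] * a_i subject to
    0 <= a_i <= capacities[i] and sum_i a_i == budget.

    With a linear objective and unit cost per play, filling arms greedily
    in descending score order is optimal.
    """
    assert sum(capacities) >= budget, "budget must be feasible"
    alloc = [0] * len(scores)
    remaining = budget
    for i in sorted(range(len(scores)), key=lambda i: scores[i], reverse=True):
        alloc[i] = min(capacities[i], remaining)
        remaining -= alloc[i]
    return alloc

def run_thompson(true_means: Sequence[float], capacities: Sequence[int],
                 budget: int, rounds: int, seed: int = 0) -> List[int]:
    """Toy multi-play Thompson sampling; returns feedback samples per arm."""
    rng = random.Random(seed)
    sums = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(rounds):
        # One posterior draw per arm (Gaussian approximation, an assumption)
        scores = [rng.gauss(sums[i] / (counts[i] + 1), 1 / math.sqrt(counts[i] + 1))
                  for i in range(len(true_means))]
        for i, plays in enumerate(allocate(scores, capacities, budget)):
            for _ in range(plays):  # a_i plays of arm i give a_i Bernoulli samples
                sums[i] += 1.0 if rng.random() < true_means[i] else 0.0
                counts[i] += 1
    return counts
```

For instance, `allocate([0.9, 0.2, 0.5], [2, 5, 3], 4)` fills the best arm to its capacity of 2 and gives the remaining 2 plays to the third arm, returning `[2, 0, 2]`.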

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.09933

Country: