base arm
Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits
Özyıldırım, Emre, Yaycı, Barış, Akturk, Umut Eren, Tekin, Cem
We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $\tau_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $\tau_r$ rather than merely maximizing throughput. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with a satisficing objective: when $\tau_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $\tau_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate performance via cumulative satisficing regret to $\tau_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.
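The abstract distinguishes satisficing regret (shortfall relative to the threshold) from standard regret (shortfall relative to the best achievable reward) but does not state the paper's exact definitions. The sketch below assumes one common form, where each round contributes the positive part of the gap to the threshold. The function names and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def satisficing_regret(expected_rewards: Sequence[float], tau: float) -> float:
    """Sum of per-round shortfalls below the threshold tau.

    Assumed definition: a round that meets or exceeds tau contributes zero.
    """
    return sum(max(tau - mu, 0.0) for mu in expected_rewards)


def standard_regret(expected_rewards: Sequence[float], best: float) -> float:
    """Sum of per-round gaps to the best achievable expected reward."""
    return sum(best - mu for mu in expected_rewards)


# Hypothetical expected throughputs of the actions played in four rounds.
mu_played = [0.25, 0.5, 0.75, 1.0]

print(satisficing_regret(mu_played, tau=0.5))  # 0.25
print(standard_regret(mu_played, best=1.0))    # 1.5
```

Under this assumed definition, a policy that settles on any action meeting the threshold stops accumulating satisficing regret even if it is not optimal, which is consistent with the time-independent bound the abstract reports for the realizable case.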
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Education (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.96)
- Banking & Finance (0.93)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Transportation > Air (0.67)
- Consumer Products & Services > Travel (0.46)
- Transportation > Air (0.68)
- Consumer Products & Services > Travel (0.46)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Asia > China > Beijing > Beijing (0.04)
Combinatorial Multi-Armed Bandit with General Reward Functions
Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu
In this paper, unless otherwise specified, we use MAB to refer to stochastic MAB. The MAB problem demonstrates the key tradeoff between exploration and exploitation: whether the player should stick to the choice that has performed best so far, or try some less explored alternatives that may provide better rewards.
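The exploration-exploitation tradeoff described above can be illustrated with an epsilon-greedy player on Bernoulli arms. This is a minimal sketch of a textbook strategy, not the algorithm from the paper; the arm means, the horizon, and the value of `epsilon` are assumptions made for the example.

```python
import random


def epsilon_greedy(means, horizon, epsilon=0.1, seed=0):
    """Play a stochastic Bernoulli bandit and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(horizon):
        if t < k:
            # Pull every arm once so each empirical mean is defined.
            arm = t
        elif rng.random() < epsilon:
            # Explore: try an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: stick to the arm with the best empirical mean so far.
            arm = max(range(k), key=lambda i: totals[i] / pulls[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


pulls = epsilon_greedy(means=[0.2, 0.5, 0.8], horizon=5000)
print(pulls)
```

With a fixed `epsilon`, a constant fraction of rounds is spent exploring regardless of how much has been learned, which is why confidence-bound and posterior-sampling methods are usually preferred when regret guarantees matter.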
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Oracle-Efficient Combinatorial Semi-Bandits
Kim, Jung-hun, Vojnović, Milan, Oh, Min-hwan
We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For the worst-case linear reward setting, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.
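To make the setting concrete, here is a minimal sketch of a combinatorial semi-bandit loop that calls an optimization oracle in every round, which is the per-round cost the abstract describes as the scalability bottleneck. It is not the paper's oracle-efficient method. It assumes Bernoulli base arms, a linear reward, a top-k oracle, and Beta-posterior sampling; `top_k_oracle` and `combinatorial_ts` are hypothetical names.

```python
import random


def top_k_oracle(scores, k):
    """Return the indices of the k highest-scoring base arms (linear reward)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def combinatorial_ts(means, k, horizon, seed=0):
    """Thompson sampling over base arms with one oracle call per round."""
    rng = random.Random(seed)
    n = len(means)
    alpha = [1.0] * n  # Beta posterior: 1 + observed successes
    beta = [1.0] * n   # Beta posterior: 1 + observed failures
    plays = [0] * n
    oracle_calls = 0
    for _ in range(horizon):
        # Sample a plausible mean for every base arm from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # The oracle picks the super arm that is best for the sampled means.
        super_arm = top_k_oracle(samples, k)
        oracle_calls += 1
        # Semi-bandit feedback: observe each selected base arm individually.
        for i in super_arm:
            outcome = 1 if rng.random() < means[i] else 0
            alpha[i] += outcome
            beta[i] += 1 - outcome
            plays[i] += 1
    return plays, oracle_calls


plays, calls = combinatorial_ts(means=[0.1, 0.3, 0.7, 0.9], k=2, horizon=2000)
print(plays, calls)
```

In this baseline the number of oracle calls equals the horizon. A top-k oracle is cheap, but for harder combinatorial constraints each call can dominate the running time, which motivates reducing how often the oracle is queried.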
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States (0.04)
- Europe > United Kingdom (0.04)
- Europe > France (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
- Information Technology > Data Science > Data Mining > Big Data (0.86)
- Information Technology > Artificial Intelligence > Natural Language (0.70)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.31)