Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget
Neural Information Processing Systems
After each decision to choose a particular arm, the learner receives some form of feedback, typically a numerical reward, determined by the chosen arm's feedback mechanism.
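The interaction described above can be sketched as a short loop. This is only an illustrative sketch, not the paper's algorithm: the names `play`, `choose`, and `rewards` are hypothetical. The code assumes that the learner picks a set of arms per round (the combinatorial setting named in the title), that it observes the reward of each chosen arm individually (semi-bandit feedback), that rewards are fixed in advance rather than sampled (non-stochastic), and that the number of rounds is capped by a finite budget.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Feedback = Dict[int, float]
History = List[Tuple[FrozenSet[int], Feedback]]


def play(
    rewards: List[List[float]],
    choose: Callable[[int, History], FrozenSet[int]],
    budget: int,
) -> History:
    """Run `budget` rounds of learner/environment interaction.

    rewards[t][arm] is the (assumed, pre-fixed) reward of `arm` in round t.
    `choose` is a hypothetical learner policy mapping the round index and
    the past history to the set of arms to play.
    """
    history: History = []
    for t in range(budget):
        chosen = choose(t, history)
        # Semi-bandit feedback: one reward per chosen arm, nothing for the rest.
        feedback = {arm: rewards[t][arm] for arm in chosen}
        history.append((chosen, feedback))
    return history


# Usage with made-up numbers: three arms, a budget of two rounds.
rewards = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.4]]
history = play(
    rewards,
    choose=lambda t, h: frozenset({0, 1}) if t == 0 else frozenset({1, 2}),
    budget=2,
)
```

After the budget is spent, a learner would use `history` to recommend the set of arms it believes is best; how that recommendation is made is not specified here.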
Aug-16-2025, 12:40:52 GMT
- Country:
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Germany > North Rhine-Westphalia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States (0.04)
- Genre:
- Research Report (0.46)
- Technology: