Doubly-Robust Lasso Bandit
–Neural Information Processing Systems
While the reward compensation mechanism is unknown, the learner can adapt his or her decision to the past reward feedback so as to maximize the sum of rewards.
Feb-14-2026, 10:07:11 GMT