Doubly-Robust Lasso Bandit

Gi-Soo Kim, Myunghee Cho Paik

Neural Information Processing Systems 

While the reward compensation mechanism is unknown, the learner can adapt his (her) decision to the past reward feedback so as to maximize the sum of rewards.
