Reinforcement Learning
Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning
We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap.
BCORLE(ฮป): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market Y ang Zhang
The online e-commerce environment is complicated and ever changing, so it requires the coupons allocation policy learning can quickly adapt to the changes of the company's business strategy. Unfortunately, existing studies with a huge computation overhead can hardly satisfy the requirements of real-time and fast-response in the real world.