Cold-Start Reinforcement Learning with Softmax Policy Gradient

Nan Ding, Radu Soricut

Neural Information Processing Systems 
