Exponential Family Model-Based Reinforcement Learning via Score Matching

Neural Information Processing Systems 

SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression.
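To illustrate why score matching can reduce to ridge regression, the following is a minimal sketch for a toy case, not the SMRL estimator itself. It assumes a one-dimensional unnormalized density p(x) proportional to exp(theta_1 x + theta_2 x^2), i.e. sufficient statistics psi(x) = (x, x^2) and a constant base measure. For this family the Hyvärinen score matching objective is quadratic in theta, so adding an L2 penalty gives a closed-form, ridge-style solution. The function name `score_matching_ridge` and the parameter `lam` are hypothetical, chosen only for this example.

```python
import numpy as np


def score_matching_ridge(x, lam=1e-6):
    """Ridge-regularized score matching for p(x) ~ exp(theta_1*x + theta_2*x^2).

    Toy illustration only (assumed setup, not the paper's estimator).
    The empirical objective is
        0.5 * theta^T V theta + theta^T b + 0.5 * lam * ||theta||^2,
    with V = mean of J J^T and b = mean of the Laplacian of psi,
    where J is the derivative of psi(x) = (x, x^2) with respect to x.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # d/dx of psi(x) = (x, x^2) is (1, 2x); one row per sample.
    J = np.stack([np.ones_like(x), 2.0 * x], axis=1)  # shape (n, 2)
    # d^2/dx^2 of psi(x) is (0, 2) for every sample.
    lap = np.stack([np.zeros_like(x), np.full_like(x, 2.0)], axis=1)
    V = J.T @ J / n          # (2, 2) empirical second-moment matrix
    b = lap.mean(axis=0)     # (2,) linear term
    # Minimizer of the quadratic objective: a ridge-style linear solve.
    return -np.linalg.solve(V + lam * np.eye(2), b)


rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7
samples = rng.normal(mu, sigma, size=20000)

theta_hat = score_matching_ridge(samples)
# Natural parameters of a Gaussian: (mu / sigma^2, -1 / (2 sigma^2)).
theta_true = np.array([mu / sigma**2, -0.5 / sigma**2])
print("estimate:", theta_hat)
print("truth:   ", theta_true)
```

The estimate never requires the normalizing constant of the density, which is the property that makes score matching attractive for unnormalized models. In this Gaussian toy case, the solution with a vanishing penalty coincides with the moment-based estimate (sample mean over sample variance, and minus one over twice the sample variance).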