Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning