Maximum entropy exploration in contextual bandits with neural networks and energy based models