Surprise Minimizing Multi-Agent Learning with Energy-based Models