Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models