Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach