Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting