Online Learning of Non-Markovian Reward Models