Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization