Making RL tractable by learning more informative reward functions: example-based control, meta-learning, and normalized maximum likelihood