Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning