RL without TD learning