Temporal difference learning and TD-Gammon