Backgammon is a two-player game of perfect information that combines skill and luck. The board has two tracks running in opposite directions, and players take turns rolling dice to move their checkers from one end of their track to the other, called "home". The game's large branching factor (the number of distinct positions the pieces can occupy after each turn) means it cannot be solved by simply reasoning through all possible moves, and the uncertainty of dice rolls means that probabilities and contingencies must be factored into any strategy.
In 1992, Gerald Tesauro published a paper describing TD-Gammon, a backgammon program whose evaluation function is a neural network trained with reinforcement learning. TD-Gammon consists of a simple three-layer neural network trained with TD(λ), temporal-difference learning with a trace-decay parameter lambda (λ). Rather than keeping a complete history of gradients, TD(λ) maintains a decaying eligibility trace: when the end-of-game signal is backpropagated, each weight update automatically folds in the (decayed) gradients of earlier states in the game, at the cost of storing only a single trace per weight.
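To make the eligibility-trace idea concrete, here is a minimal sketch of one TD(λ) update for a *linear* value function. This is not TD-Gammon's actual network (which was a multi-layer perceptron) and the function name and default parameters are illustrative assumptions; the point is how the trace `z` accumulates decayed gradients so no history needs to be stored.

```python
import numpy as np

def td_lambda_update(w, z, x, x_next, reward, alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) step for a linear value function v(x) = w . x.

    Hypothetical illustration, not Tesauro's code.
    w      : weight vector of the value function
    z      : eligibility trace, a decaying sum of past gradients
             (for a linear model, the gradient of v w.r.t. w is just x)
    x, x_next : feature vectors of the current and next state
    """
    delta = reward + gamma * (w @ x_next) - (w @ x)  # TD error
    z = gamma * lam * z + x        # decay old gradients, add the current one
    w = w + alpha * delta * z      # credit earlier states via the trace
    return w, z
```

Because `z` is updated multiplicatively each step, a state visited k steps ago contributes to the current update with weight `(gamma * lam) ** k`, which is exactly the "gradients from earlier states without a complete history" property described above.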