Sutton, R. S.
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming (DP) methods. Results are shown on a navigation task for a simple Dyna-PI system that simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. The theory of Dyna is based on the theory of DP (e.g., Ross, 1983) and on DP's relationship to reinforcement learning (Watkins, 1989; Barto, Sutton & Watkins, 1989, 1990), to temporal-difference learning (Sutton, 1988), and to AI methods for planning and search (Korf, 1990). The same algorithm is applied both to real experience (resulting in learning) and to hypothetical experience generated by the world model (resulting in relaxation planning).
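The central idea above, that a single update rule serves both learning (from real experience) and planning (from hypothetical experience produced by a learned world model), can be illustrated with a minimal sketch. It assumes a hypothetical 4x4 deterministic grid navigation task and substitutes one-step Q-learning (Watkins, 1989) for the policy-iteration machinery of Dyna-PI, so it is closer to a Dyna-Q-style agent than to the system evaluated in the paper. All names (`step`, `q_update`, `dyna_q`, `greedy_path`) and parameter values are illustrative assumptions.

```python
import random

# Hypothetical navigation task (an assumption, not the paper's maze):
# a 4x4 deterministic grid, start in one corner, goal in the opposite corner.
ROWS, COLS = 4, 4
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, a):
    """Real environment: move one cell (clipped at the border); reward 1 on reaching the goal."""
    r, c = state
    dr, dc = ACTIONS[a]
    nxt = (min(max(r + dr, 0), ROWS - 1), min(max(c + dc, 0), COLS - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0)


def q_update(Q, s, a, r, s2, alpha, gamma):
    """One-step Q-learning backup, applied identically to real and simulated transitions."""
    target = r if s2 == GOAL else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])


def dyna_q(episodes=50, n_planning=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    model = {}  # learned world model: (state, action) -> (reward, next_state)
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = START, 0
        while s != GOAL:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            s2, r = step(s, a)
            q_update(Q, s, a, r, s2, alpha, gamma)  # learning from real experience
            model[(s, a)] = (r, s2)                # world-model learning
            for _ in range(n_planning):            # planning from hypothetical experience
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                q_update(Q, ps, pa, pr, ps2, alpha, gamma)
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return Q, steps_per_episode


def greedy_path(Q, max_steps=2 * ROWS * COLS):
    """Follow the greedy policy from START, stopping at GOAL or after max_steps moves."""
    s, path = START, [START]
    while s != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        s, _ = step(s, a)
        path.append(s)
    return path


if __name__ == "__main__":
    Q, steps = dyna_q()
    print("steps in first/last episodes:", steps[:5], steps[-5:])
    print("greedy route:", greedy_path(Q))
```

Raising `n_planning` lets the agent extract more value from each real step by replaying its model, which is the sense in which planning and learning share one mechanism here; setting it to 0 reduces the sketch to plain Q-learning.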
The associative search network (ASN) is a reinforcement learning associative memory: for each input key, it conducts a search for the output pattern that optimizes an external payoff or reinforcement signal. The ASN combines pattern recognition and function optimization capabilities in a simple and effective way. We define the associative search problem, discuss conditions under which the ASN is capable of solving it, and present results from computer simulations. The synthesis of sensory-motor control surfaces is discussed as an example of the associative search problem.
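The associative search problem (find, for each input key, the output pattern that maximizes a scalar payoff, with no teacher supplying the correct output) can be illustrated with a small sketch. It does not reproduce the ASN learning rule from the paper; instead it swaps in a REINFORCE-style update for stochastic binary output units with a per-key reinforcement-comparison baseline. The keys, the hidden target patterns used to compute payoff, and all function names and learning rates are hypothetical assumptions.

```python
import math
import random

# Hypothetical task (an assumption): 3 one-hot input keys, 4-bit output patterns.
# The environment scores an output by the fraction of bits matching a hidden
# target for that key; the learner only ever observes this scalar payoff.
KEYS = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
HIDDEN_TARGETS = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]


def payoff(key_index, output):
    target = HIDDEN_TARGETS[key_index]
    return sum(o == t for o, t in zip(output, target)) / len(target)


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def train(trials=3000, lr=0.5, baseline_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(KEYS[0]), len(HIDDEN_TARGETS[0])
    W = [[0.0] * n_in for _ in range(n_out)]  # one weight row per stochastic output unit
    baseline = [0.0] * len(KEYS)              # running payoff estimate for each key
    for _ in range(trials):
        k = rng.randrange(len(KEYS))
        x = KEYS[k]
        probs = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]
        y = [1 if rng.random() < p else 0 for p in probs]  # stochastic search step
        z = payoff(k, y)
        advantage = z - baseline[k]  # better or worse than usual for this key?
        for j, row in enumerate(W):
            for i in range(n_in):
                # Push each unit toward the output it just emitted if payoff beat the baseline.
                row[i] += lr * advantage * (y[j] - probs[j]) * x[i]
        baseline[k] += baseline_rate * (z - baseline[k])
    return W


def recall(W, key_index):
    """Deterministic recall: emit 1 wherever a unit's firing probability exceeds 0.5."""
    x = KEYS[key_index]
    return [1 if sum(w * xi for w, xi in zip(row, x)) > 0 else 0 for row in W]


if __name__ == "__main__":
    W = train()
    for k in range(len(KEYS)):
        print(k, recall(W, k))
```

The per-key baseline plays the role of a comparison signal: an output pattern is reinforced only when it earns more payoff than that key has recently been earning, so the network searches output space separately for each key while sharing one weight matrix.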