On-Line Estimation of the Optimal Value Function: HJB-Estimators

Neural Information Processing Systems 

In this paper, we discuss on-line estimation strategies that model the optimal value function of a typical optimal control problem. We present a general strategy that uses local corridor solutions obtained via dynamic programming to provide local optimal control sequence training data for a neural architecture model of the optimal value function.

In this paper, the problems of adaptive control using neural architectures are explored in the setting of general on-line estimators. We will try to pay close attention to the underlying mathematical structure that arises in the on-line estimation process. The complete effect of a control action $u_k$ at a given time step $t_k$ is clouded by the fact that the state history depends on the control actions taken after time step $t_k$. So the effect of a control action over all future time must be monitored.
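To make the last point concrete, the following is a minimal sketch of finite-horizon dynamic programming on a toy scalar problem. It is not the HJB-estimator of this paper: the grid, the dynamics `step`, the quadratic costs, and the function name `backward_dp` are all assumptions chosen for illustration. It only shows how a value function summarizes the effect of a control action over all remaining time steps, so that a control can be scored by its immediate cost plus the optimal cost-to-go from the state it leads to.

```python
import numpy as np

def backward_dp(states, controls, step, stage_cost, terminal_cost, horizon):
    """Backward recursion for a finite-horizon problem on a state grid (toy sketch).

    Returns the optimal cost-to-go at time 0 for every grid state and the
    index of the minimizing control for every (time step, state) pair.
    """
    V = np.array([terminal_cost(x) for x in states], dtype=float)
    policy = np.zeros((horizon, len(states)), dtype=int)
    for k in reversed(range(horizon)):
        V_next = V.copy()  # optimal cost-to-go from time step k + 1
        for i, x in enumerate(states):
            costs = []
            for u in controls:
                x_next = step(x, u)
                # Snap the successor state to the nearest grid point.
                j = int(np.argmin(np.abs(states - x_next)))
                # Immediate cost plus the cost of everything that follows.
                costs.append(stage_cost(x, u) + V_next[j])
            policy[k, i] = int(np.argmin(costs))
            V[i] = min(costs)
    return V, policy

# Hypothetical toy problem: integer states in [-3, 3], controls in {-1, 0, 1}.
states = np.arange(-3, 4)
controls = np.array([-1, 0, 1])
V, policy = backward_dp(
    states,
    controls,
    step=lambda x, u: np.clip(x + u, -3, 3),  # assumed dynamics
    stage_cost=lambda x, u: x**2 + u**2,      # assumed quadratic cost
    terminal_cost=lambda x: x**2,
    horizon=5,
)
```

Here `V[i]` is the optimal cost accumulated from `states[i]` over the whole horizon, and `policy[k, i]` indexes the control in `controls` that attains it at time step `k`. An exhaustive grid like this is only practical in very low dimensions.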