Local Bandit Approximation for Optimal Learning Problems

Duff, Michael O., Barto, Andrew G.

Neural Information Processing Systems 

A Bayesian formulation of the problem leads to a clear concept of a solution whose computation, however, appears to entail an examination of an intractably-large number of hyperstates. This paper hassuggested extending the Gittins index approach (which applies with great power and elegance to the special class of multi-armed bandit processes) to general adaptive MDP's. The hope has been that if certain salient features of the value of information could be captured, even approximately, then one could be led to a reasonable method for avoiding certain defects of certainty-equivalence approaches (problems with identifiability, "metastability"). Obviously, positive evidence, in the form of empirical results from simulation experiments, would lend support to these ideas-work along these lines is underway. Local bandit approximation is but one approximate computational approach for problems of optimal learning and dual control. Most prominent in the literature of control theory is the "wide-sense" approach of [Bar-Shalom & Tse, 1976], which utilizes localquadratic approximations about nominal state/control trajectories. For certain problems, this method has demonstrated superior performance compared to a certainty-equivalence approach, but it is computationally very intensive and unwieldy, particularly for problems with controller dimension greater than one. One could revert to the view of the bandit problem, or general adaptive MDP, as simply a very large MDP defined over hyperstates, and then consider a some- Local Bandit Approximationfor Optimal Learning Problems 1025 what direct approach in which one performs approximate dynamic programming with function approximation over this domain-details of function-approximation, feature-selection, and "training" all become important design issues.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found