Möller, Knut
Active Exploration in Dynamic Environments
Thrun, Sebastian B., Möller, Knut
Many real-valued connectionist approaches to learning control realize exploration by randomness inaction selection. This might be disadvantageous when costs are assigned to "negative experiences" . The basic idea presented in this paper is to make an agent explore unknown regions in a more directed manner. This is achieved by a so-called competence map, which is trained to predict the controller's accuracy, and is used for guiding exploration. Based on this, a bistable system enables smoothly switching attention between two behaviors - exploration and exploitation - depending on expected costsand knowledge gain. The appropriateness of this method is demonstrated by a simple robot navigation task.
Active Exploration in Dynamic Environments
Thrun, Sebastian B., Möller, Knut
Many real-valued connectionist approaches to learning control realize exploration by randomness in action selection. This might be disadvantageous when costs are assigned to "negative experiences". The basic idea presented in this paper is to make an agent explore unknown regions in a more directed manner. This is achieved by a so-called competence map, which is trained to predict the controller's accuracy, and is used for guiding exploration. Based on this, a bistable system enables smoothly switching attention between two behaviors - exploration and exploitation - depending on expected costs and knowledge gain. The appropriateness of this method is demonstrated by a simple robot navigation task.
Planning with an Adaptive World Model
Thrun, Sebastian, Möller, Knut, Linden, Alexander
We present a new connectionist planning method [TML90]. By interaction with an unknown environment, a world model is progressively constructed using gradient descent. For deriving optimal actions with respect to future reinforcement, planning is applied in two steps: an experience network proposes a plan which is subsequently optimized by gradient descent with a chain of world models, so that an optimal reinforcement may be obtained when it is actually run. The appropriateness of this method is demonstrated by a robotics application and a pole balancing task.
Planning with an Adaptive World Model
Thrun, Sebastian, Möller, Knut, Linden, Alexander
We present a new connectionist planning method [TML90]. By interaction with an unknown environment, a world model is progressively constructed usinggradient descent. For deriving optimal actions with respect to future reinforcement, planning is applied in two steps: an experience network proposesa plan which is subsequently optimized by gradient descent with a chain of world models, so that an optimal reinforcement may be obtained when it is actually run. The appropriateness of this method is demonstrated by a robotics application and a pole balancing task.