Goto

Collaborating Authors

 Optimization


Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming

Neural Information Processing Systems

Dynamic programming provides a methodology to plan trajectories and design controllers andestimators for nonlinear systems. However, general dynamic programming is computationally intractable. We have developed procedures that allow more complex planning problems to be solved. We have modified the State Increment Dynamic Programming approach of Larson (1968) in several ways: 1. In State Increment DP, a constant action is integrated to form a trajectory segment from the center of a cell to its boundary. We use second order local trajectory optimization (Differential Dynamic Programming) to generate an optimal trajectory and form an optimal policy in a tube surrounding the optimal trajectory within a cell. The trajectory segment and local policy are globally optimal, up to the resolution of the representation of the value function on the boundary of the cell.


Transition Point Dynamic Programming

Neural Information Processing Systems

Transition point dynamic programming (TPDP) is a memorybased, reinforcementlearning, direct dynamic programming approach toadaptive optimal control that can reduce the learning time and memory usage required for the control of continuous stochastic dynamic systems. TPDP does so by determining an ideal set of transition points (TPs) which specify only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-Iearning to assess the merits ofadding, swapping and removing TPs from states throughout the state space. When applied to a race track problem, TPDP learned the optimal control policy much sooner than conventional Q-Iearning, and was able to do so using less memory. 1 INTRODUCTION Dynamic programming (DP) approaches can be utilized to determine optimal control policiesfor continuous stochastic dynamic systems when the state spaces of those systems have been quantized with a resolution suitable for control (Barto et al., 1991). DP controllers, in lheir simplest form, are memory-based controllers that operate by repeatedly updating cost values associated with every state in the discretized state space (Barto et al., 1991).



When will a Genetic Algorithm Outperform Hill Climbing

Neural Information Processing Systems

HoUand Dept. of Psychology University of Michigan Ann Arbor, MI 48109 StephanieForrest Dept. of Computer Science University of New Mexico Albuquerque, NM 87131 Abstract We analyze a simple hill-climbing algorithm (RMHC) that was previously shownto outperform a genetic algorithm (GA) on a simple "Royal Road" function. We then analyze an "idealized" genetic algorithm (IGA) that is significantly faster than RMHC and that gives a lower bound for GA speed. We identify the features of the IGA that give rise to this speedup, and discuss how these features can be incorporated into a real GA. 1 INTRODUCTION Our goal is to understand the class of problems for which genetic algorithms (GA) are most suited, and in particular, for which they will outperform other search algorithms. Several studies have empirically compared GAs with other search and optimization methods such as simple hill-climbing (e.g., Davis, 1991), simulated annealing (e.g., Ingber & Rosen, 1992), linear, nonlinear, and integer programming techniques, and other traditional optimization techniques (e.g., De Jong, 1975). However, such comparisons typically compare one version of the GA with a second algorithm on a single problem or set of problems, often using performance criteria which may not be appropriate.


Learning Spatio-Temporal Planning from a Dynamic Programming Teacher: Feed-Forward Neurocontrol for Moving Obstacle Avoidance

Neural Information Processing Systems

Within a simple test-bed, application of feed-forward neurocontrol for short-term planning of robot trajectories in a dynamic environment is studied. The action network is embedded in a sensorymotoric system architecture that contains a separate world model. It is continuously fed with short-term predicted spatiotemporal obstacle trajectories, and receives robot state feedback. The action net allows for external switching between alternative planning tasks. It generates goal-directed motor actions - subject to the robot's kinematic and dynamic constraints - such that collisions with moving obstacles are avoided.


Learning Spatio-Temporal Planning from a Dynamic Programming Teacher: Feed-Forward Neurocontrol for Moving Obstacle Avoidance

Neural Information Processing Systems

Within a simple test-bed, application of feed-forward neurocontrol for short-term planning of robot trajectories in a dynamic environment is studied. The action network is embedded in a sensorymotoric system architecture that contains a separate world model. It is continuously fed with short-term predicted spatiotemporal obstacle trajectories, and receives robot state feedback. The action net allows for external switching between alternative planning tasks. It generates goal-directed motor actions - subject to the robot's kinematic and dynamic constraints - such that collisions with moving obstacles are avoided.


Learning Spatio-Temporal Planning from a Dynamic Programming Teacher: Feed-Forward Neurocontrol for Moving Obstacle Avoidance

Neural Information Processing Systems

The action network is embedded in a sensorymotoric systemarchitecture that contains a separate world model. It is continuously fed with short-term predicted spatiotemporal obstacle trajectories, and receives robot state feedback. The action netallows for external switching between alternative planning tasks.It generates goal-directed motor actions - subject to the robot's kinematic and dynamic constraints - such that collisions withmoving obstacles are avoided. Using supervised learning, we distribute examples of the optimal planner mapping over a structure-level adapted parsimonious higher order network. The training database is generated by a Dynamic Programming algorithm. Extensivesimulations reveal, that the local planner mapping is highly nonlinear, but can be effectively and sparsely represented bythe chosen powerful net model. Excellent generalization occurs for unseen obstacle configurations. We also discuss the limitations offeed-forward neurocontrol for growing planning horizons.


Merging Constrained Optimisation with Deterministic Annealing to "Solve" Combinatorially Hard Problems

Neural Information Processing Systems

Several parallel analogue algorithms, based upon mean field theory (MFT) approximations to an underlying statistical mechanics formulation, and requiring an externally prescribed annealing schedule, now exist for finding approximate solutions to difficult combinatorial optimisation problems. They have been applied to the Travelling Salesman Problem (TSP), as well as to various issues in computational vision and cluster analysis. I show here that any given MFT algorithm can be combined in a natural way with notions from the areas of constrained optimisation and adaptive simulated annealing to yield a single homogenous and efficient parallel relaxation technique, for which an externally prescribed annealing schedule is no longer required. The results of numerical simulations on 50-city and 100-city TSP problems are presented, which show that the ensuing algorithms are typically an order of magnitude faster than the MFT algorithms alone, and which also show, on occasion, superior solutions as well. 1 INTRODUCTION Several promising parallel analogue algorithms, which can be loosely described by the term "deterministic annealing", or "mean field theory (MFT) annealing", have *also at Theoretical Division and Center for Nonlinear Studies, MSB213, Los Alamos National Laboratory, Los Alamos, NM 87545.


Segmentation Circuits Using Constrained Optimization

Neural Information Processing Systems

Analog hardware has obvious advantages in terms of its size, speed, cost, and power consumption. Analog chip designers, however, should not feel constrained to mapping existing digital algorithms to silicon. Many times, new algorithms must be adapted or invented to ensure efficient implementation in analog hardware. Novel analog algorithms embedded in the hardware must be simple and obey the natural constraints of physics. Much algorithm intuition can be gained from experimenting with these continuous-time nonlinear systems. For example, the algorithm described in this paper arose from experimentation with existing analog segmentation hardware. Surprisingly, many of these "analog" algorithms may prove useful even if a computer vision researcher is limited to simulating the analog hardware on a digital computer [7].


Constrained Optimization Applied to the Parameter Setting Problem for Analog Circuits

Neural Information Processing Systems

We use constrained optimization to select operating parameters for two circuits: a simple 3-transistor square root circuit, and an analog VLSI artificial cochlea. This automated method uses computer controlled measurement and test equipment to choose chip parameters which minimize the difference between the actual circuit's behavior and a specified goal behavior. Choosing the proper circuit parameters is important to compensate for manufacturing deviations or adjust circuit performance within a certain range. As biologically-motivated analog VLSI circuits become increasingly complex, implying more parameters, setting these parameters by hand will become more cumbersome. Thus an automated parameter setting method can be of great value [Fleischer 90].