Mathematical & Statistical Methods
Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
Atkeson, Christopher G., Morimoto, Jun
A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.
On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems
Our algorithm exploits the special structure of the sum of squared error measure in Equation (1); hence, other objective functions are outside the scope of this paper. The gradient vector and Hessian matrix are given by $g \equiv g(\theta) = J^T r$ and $H \equiv H(\theta) = J^T J + S$, where $J$ is the $m \times n$ Jacobian matrix of $r$, and $S$ denotes the matrix of second-derivative terms. If $S$ is simply omitted based on the "small residual" assumption, then the Hessian matrix reduces to the Gauss-Newton model Hessian, i.e., $J^T J$. Furthermore, a family of quasi-Newton methods can be applied to approximate the term $S$ alone, leading to the augmented Gauss-Newton model Hessian (see, for example, Mizutani [2] and references therein).
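To make the roles of the gradient, the Gauss-Newton term, and the second-derivative term concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes the objective is E = (1/2) r^T r and uses a hypothetical two-parameter model r_i = a exp(b x_i) - y_i; the model, the data, and all function names are illustrative choices that do not come from the source.

```python
import math

def residuals(theta, xs, ys):
    # Hypothetical model: r_i = a * exp(b * x_i) - y_i
    a, b = theta
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]

def jacobian(theta, xs):
    # Row i holds [d r_i / d a, d r_i / d b]; J is m x n with n = 2
    a, b = theta
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]

def gradient(J, r):
    # g = J^T r
    n = len(J[0])
    return [sum(J[i][k] * r[i] for i in range(len(r))) for k in range(n)]

def gauss_newton_hessian(J):
    # Gauss-Newton model Hessian: J^T J
    n = len(J[0])
    return [[sum(row[k] * row[l] for row in J) for l in range(n)]
            for k in range(n)]

def second_order_term(theta, xs, r):
    # S = sum_i r_i * (Hessian of r_i with respect to theta)
    a, b = theta
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x, ri in zip(xs, r):
        e = math.exp(b * x)
        # Hessian of r_i is [[0, x*e], [x*e, a*x*x*e]]
        S[0][1] += ri * x * e
        S[1][0] += ri * x * e
        S[1][1] += ri * a * x * x * e
    return S

# Illustrative data generated from theta_true, so residuals vanish there
xs = [0.0, 0.5, 1.0]
theta_true = (2.0, -1.0)
ys = [theta_true[0] * math.exp(theta_true[1] * x) for x in xs]

theta = (1.5, -0.7)  # a point away from the exact fit
r = residuals(theta, xs, ys)
J = jacobian(theta, xs)
g = gradient(J, r)
H_gn = gauss_newton_hessian(J)
S = second_order_term(theta, xs, r)
H_full = [[H_gn[k][l] + S[k][l] for l in range(2)] for k in range(2)]
```

Because every entry of S is weighted by a residual, S vanishes at an exact fit, which is the situation the "small residual" assumption approximates. Away from the fit, H_full and H_gn differ by exactly S.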
Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
Wainwright, Martin J., Sudderth, Erik B., Willsky, Alan S.
We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1 Introduction Graphical models are an invaluable tool for defining and manipulating probability distributions. In modeling stochastic processes with graphical models, two basic problems arise: (i) specifying a class of graphs with which to model or approximate the process; and (ii) determining efficient techniques for statistical inference.
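The idea of repeatedly solving a modified problem on a spanning tree can be illustrated with a small sketch. It rests on several assumptions that the abstract does not state: the Gaussian model is written in information form, J x = h; the matrix is split as J = J_tree - K, where J_tree drops the cut edges; and the same tree is reused at every iteration. A generic dense solver stands in for the efficient tree solver, and the 4-node cycle, its numbers, and all function names are hypothetical. Only the conditional means are computed here, not the error covariances.

```python
def solve(A, b):
    # Dense Gaussian elimination with partial pivoting.
    # Stand-in for an efficient tree-structured solver.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def embedded_tree_means(J, h, cut_edges, iters=100):
    # Remove the cut edges to obtain a tree-structured matrix J_tree,
    # so that J = J_tree - K with K holding the removed couplings.
    n = len(h)
    J_tree = [row[:] for row in J]
    for i, j in cut_edges:
        J_tree[i][j] = 0.0
        J_tree[j][i] = 0.0
    K = [[J_tree[i][j] - J[i][j] for j in range(n)] for i in range(n)]
    x = [0.0] * n
    for _ in range(iters):
        # Each step solves a tree problem exactly: J_tree x_new = K x + h
        Kx = matvec(K, x)
        x = solve(J_tree, [Kx[i] + h[i] for i in range(n)])
    return x

# Hypothetical 4-node cycle 0-1-2-3-0; cutting edge (0, 3) leaves a chain
J = [[3.0, -1.0, 0.0, -1.0],
     [-1.0, 3.0, -1.0, 0.0],
     [0.0, -1.0, 3.0, -1.0],
     [-1.0, 0.0, -1.0, 3.0]]
h = [1.0, 0.0, 2.0, -1.0]
x_hat = embedded_tree_means(J, h, cut_edges=[(0, 3)])
```

A fixed point of the update satisfies J_tree x = K x + h, which is the same as J x = h, so the iterates approach the exact means whenever the iteration converges. It does for this diagonally dominant example, but convergence is not guaranteed for an arbitrary choice of J and cut edges.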
Neural Control for Nonlinear Dynamic Systems
Yu, Ssu-Hsin, Annaswamy, Anuradha M.
A neural network based approach is presented for controlling two distinct types of nonlinear systems. The first corresponds to nonlinear systems with parametric uncertainties where the parameters occur nonlinearly. The second corresponds to systems for which stabilizing control structures cannot be determined. The proposed neural controllers are shown to result in closed-loop system stability under certain conditions.