Goto

Collaborating Authors

 Optimization



Interior Point Implementations of Alternating Minimization Training

Neural Information Processing Systems

AM techniques were first introduced in soft-competitive learning algorithms[l]. Thistraining procedure was later shown to be closely related to Expectation-Maximization algorithms used by the statistical estimation community[2].


AUTOCELL: An Intelligent Cellular Mobile Network Management System

AI Magazine

AUTOCELL is a system developed to assist in the operation and management of cellular mobile networks operated by Singapore Telecom. Its deployment is in line with the company's strategic move to introduce intelligent software into its operations. With the help of AI concepts and techniques, the system has enhanced the operational efficiency and network capacity and increased customer satisfaction with the network.


Provably Bounded-Optimal Agents

Journal of Artificial Intelligence Research

Since its inception, artificial intelligence has relied upon a theoretical foundation centered around perfect rationality as the desired property of intelligent systems. We argue, as others have done, that this foundation is inadequate because it imposes fundamentally unsatisfiable requirements. As a result, there has arisen a wide gap between theory and practice in AI, hindering progress in the field. We propose instead a property called bounded optimality. Roughly speaking, an agent is bounded-optimal if its program is a solution to the constrained optimization problem presented by its architecture and the task environment. We show how to construct agents with this property for a simple class of machine architectures in a broad class of real-time environments. We illustrate these results using a simple model of an automated mail sorting facility. We also define a weaker property, asymptotic bounded optimality (ABO), that generalizes the notion of optimality in classical complexity theory. We then construct universal ABO programs, i.e., programs that are ABO no matter what real-time constraints are applied. Universal ABO programs can be used as building blocks for more complex systems. We conclude with a discussion of the prospects for bounded optimality as a theoretical basis for AI, and relate it to similar trends in philosophy, economics, and game theory.


Transition Point Dynamic Programming

Neural Information Processing Systems

Transition point dynamic programming (TPDP) is a memorybased, reinforcement learning, direct dynamic programming approach to adaptive optimal control that can reduce the learning time and memory usage required for the control of continuous stochastic dynamic systems. TPDP does so by determining an ideal set of transition points (TPs) which specify only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-Iearning to assess the merits of adding, swapping and removing TPs from states throughout the state space. When applied to a race track problem, TPDP learned the optimal control policy much sooner than conventional Q-Iearning, and was able to do so using less memory. 1 INTRODUCTION Dynamic programming (DP) approaches can be utilized to determine optimal control policies for continuous stochastic dynamic systems when the state spaces of those systems have been quantized with a resolution suitable for control (Barto et al., 1991). DP controllers, in lheir simplest form, are memory-based controllers that operate by repeatedly updating cost values associated with every state in the discretized state space (Barto et al., 1991).



The "Softmax" Nonlinearity: Derivation Using Statistical Mechanics and Useful Properties as a Multiterminal Analog Circuit Element

Neural Information Processing Systems

In this paper, we show a reciprocal implementation of the "softmax" nonlinearity that is usually used to enforce local competition between neurons [Peterson, 1989]. We show that the circuit is passive and incrementally passive, and we explicitly compute its content and co-content functions. This circuit adds a new element to the library of the analog circuit designer that can be combined with reciprocal constraint boxes [Harris, 1988] and nonlinear resistive fuses [Harris, 1989] to form fast, analog VLSI optimization networks.


Convergence of Stochastic Iterative Dynamic Programming Algorithms

Neural Information Processing Systems

Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DPbased learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD("\) and Q-Iearning belong. 1 INTRODUCTION Learning to predict the future and to find an optimal way of controlling it are the basic goals of learning systems that interact with their environment. A variety of algorithms are currently being studied for the purposes of prediction and control in incompletely specified, stochastic environments. Here we consider learning algorithms defined in Markov environments. There are actions or controls (u) available for the learner that affect both the state transition probabilities, and the probability distribution for the immediate, state dependent costs (Ci(u)) incurred by the learner.


Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming

Neural Information Processing Systems

Dynamic programming provides a methodology to plan trajectories and design controllers and estimators for nonlinear systems. However, general dynamic programming is computationally intractable. We have developed procedures that allow more complex planning problems to be solved. We have modified the State Increment Dynamic Programming approach of Larson (1968) in several ways: 1. In State Increment DP, a constant action is integrated to form a trajectory segment from the center of a cell to its boundary. We use second order local trajectory optimization (Differential Dynamic Programming) to generate an optimal trajectory and form an optimal policy in a tube surrounding the optimal trajectory within a cell. The trajectory segment and local policy are globally optimal, up to the resolution of the representation of the value function on the boundary of the cell.


Transition Point Dynamic Programming

Neural Information Processing Systems

Transition point dynamic programming (TPDP) is a memorybased, reinforcement learning, direct dynamic programming approach to adaptive optimal control that can reduce the learning time and memory usage required for the control of continuous stochastic dynamic systems. TPDP does so by determining an ideal set of transition points (TPs) which specify only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-Iearning to assess the merits of adding, swapping and removing TPs from states throughout the state space. When applied to a race track problem, TPDP learned the optimal control policy much sooner than conventional Q-Iearning, and was able to do so using less memory. 1 INTRODUCTION Dynamic programming (DP) approaches can be utilized to determine optimal control policies for continuous stochastic dynamic systems when the state spaces of those systems have been quantized with a resolution suitable for control (Barto et al., 1991). DP controllers, in lheir simplest form, are memory-based controllers that operate by repeatedly updating cost values associated with every state in the discretized state space (Barto et al., 1991).