AITopics

Dynamic programming provides a methodology to plan trajectories and design controllers and estimators for nonlinear systems. However, general dynamic programming is computationally intractable. We have developed procedures that allow more complex planning problems to be solved. We have modified the State Increment Dynamic Programming approach of Larson (1968) in several ways: 1. In State Increment DP, a constant action is integrated to form a trajectory segment from the center of a cell to its boundary. We use second order local trajectory optimization (Differential Dynamic Programming) to generate an optimal trajectory and form an optimal policy in a tube surrounding the optimal trajectory within a cell. The trajectory segment and local policy are globally optimal, up to the resolution of the representation of the value function on the boundary of the cell.

optimal trajectory, trajectory, value function, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Singh, Satinder P., Barto, Andrew G., Grupen, Roderic, Connolly, Christopher

Robust Reinforcement Learning in Motion Planning

While exploring to find better solutions, an agent performing online reinforcement learning (RL) can perform worse than is acceptable. In some cases, exploration might have unsafe, or even catastrophic, results, often modeled in terms of reaching 'failure' states of the agent's environment. This paper presents a method that uses domain knowledge to reduce the number of failures during exploration. This method formulates the set of actions from which the RL agent composes a control policy to ensure that exploration is conducted in a policy space that excludes most of the unacceptable policies. The resulting action set has a more abstract relationship to the task being solved than is common in many applications of RL. Although the cost of this added safety is that learning may result in a suboptimal solution, we argue that this is an appropriate tradeoff in many problems. We illustrate this method in the domain of motion planning. This work was done while the first author was finishing his Ph.D. in computer science at the University of Massachusetts, Amherst.

robot, robust reinforcement learning, trajectory, (14 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.34)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Flake, Gary W., Sun, Guo-Zhen, Lee, Yee-Chun

Exploiting Chaos to Control the Future

Recently, Ott, Grebogi and Yorke (OGY) [6] found an effective method to control chaotic systems to unstable fixed points by using only small control forces; however, OGY's method is based on and limited to a linear theory and requires considerable knowledge of the dynamics of the system to be controlled. In this paper we use two radial basis function networks: one as a model of an unknown plant and the other as the controller. The controller is trained with a recurrent learning algorithm to minimize a novel objective function such that the controller can locate an unstable fixed point and drive the system into the fixed point with no a priori knowledge of the system dynamics. Our results indicate that the neural controller offers many advantages over OGY's technique.
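The linear idea behind small-force control of a chaotic system can be illustrated on a toy problem. The sketch below is not the paper's neural controller and not the full OGY procedure: it assumes a one-dimensional logistic map whose parameter `r` can be nudged, and it assumes the map is known exactly. The function name `controlled_logistic` and the parameters `r0` and `max_dr` are hypothetical choices made for this illustration.

```python
def controlled_logistic(x0, steps, r0=3.9, max_dr=0.1):
    """Iterate x -> r*x*(1-x), nudging r only when x is near the fixed point.

    Illustrative assumption: the map and its derivatives are known exactly.
    """
    x_star = 1.0 - 1.0 / r0          # unstable fixed point of the unperturbed map
    f_x = 2.0 - r0                   # d/dx of r*x*(1-x) at (x_star, r0)
    f_r = x_star * (1.0 - x_star)    # d/dr of r*x*(1-x) at (x_star, r0)
    x = x0
    for _ in range(steps):
        # Parameter nudge that cancels the linearised deviation in one step.
        dr = -f_x * (x - x_star) / f_r
        if abs(dr) > max_dr:
            dr = 0.0                 # too far from x_star: apply no control
        x = (r0 + dr) * x * (1.0 - x)
    return x, x_star


# Start slightly off the fixed point and let the small perturbations act.
x_final, x_star = controlled_logistic(1.0 - 1.0 / 3.9 + 0.005, steps=10)
print(abs(x_final - x_star))
```

Because the correction is derived from a linearisation, it only works inside a small window around the fixed point, and it needs the derivatives of the map. Those are the limitations the abstract attributes to the linear approach.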

algorithm, controller, exploiting chaos, (14 more...)

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Buckland, Kenneth M., Lawrence, Peter D.

Transition Point Dynamic Programming

Transition point dynamic programming (TPDP) is a memory-based, reinforcement learning, direct dynamic programming approach to adaptive optimal control that can reduce the learning time and memory usage required for the control of continuous stochastic dynamic systems. TPDP does so by determining an ideal set of transition points (TPs) which specify only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-learning to assess the merits of adding, swapping and removing TPs from states throughout the state space. When applied to a race track problem, TPDP learned the optimal control policy much sooner than conventional Q-learning, and was able to do so using less memory. 1 INTRODUCTION Dynamic programming (DP) approaches can be utilized to determine optimal control policies for continuous stochastic dynamic systems when the state spaces of those systems have been quantized with a resolution suitable for control (Barto et al., 1991). DP controllers, in their simplest form, are memory-based controllers that operate by repeatedly updating cost values associated with every state in the discretized state space (Barto et al., 1991).
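The "repeatedly updating cost values associated with every state" description can be made concrete with a small sketch. This is a generic tabular dynamic programming sweep, not TPDP itself. It assumes a deterministic six-state chain with unit step cost and an absorbing goal, whereas the abstract concerns stochastic systems. The names `value_iteration` and `chain_step` are hypothetical.

```python
def value_iteration(n_states, actions, step, cost, goal, sweeps=100):
    """Repeatedly update a cost-to-go table over a discretised state space."""
    values = [0.0] * n_states
    for _ in range(sweeps):
        for s in range(n_states):
            if s == goal:
                continue  # the goal is absorbing and costs nothing
            # Bellman backup: cheapest immediate cost plus cost-to-go.
            values[s] = min(cost(s, a) + values[step(s, a)] for a in actions)
    return values


def chain_step(s, a, n_states=6):
    """Hypothetical dynamics: move left or right on a chain, clipped at the ends."""
    return max(0, min(n_states - 1, s + a))


values = value_iteration(6, (-1, +1), chain_step, lambda s, a: 1.0, goal=5)
print(values)
```

Every state holds a table entry and every sweep touches all of them. That memory and update cost is what the abstract says TPDP reduces, by storing only the states where the control action changes.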

buckland, q-learning, tpdp, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

An Analog VLSI Model of Central Pattern Generation in the Leech

Siegel, Micah S.

The biological network is small and relatively well understood, and the silicon model can therefore span three levels of organization in the leech nervous system (neuron, ganglion, system); it represents one of the first comprehensive models of leech swimming operating in real time. The circuit employs biophysically motivated analog neurons networked to form multiple biologically inspired silicon ganglia. These ganglia are coupled using known interganglionic connections. Thus the model retains the flavor of its biological counterpart, and though simplified, the output of the silicon circuit is similar to the output of the leech swim central pattern generator. The model operates on the same time and spatial scales as the leech nervous system and will provide an excellent platform with which to explore real-time adaptive locomotion in the leech and other "simple" invertebrate nervous systems.

ganglia, leech, silicon model, (12 more...)

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
(2 more...)

Industry: Semiconductors & Electronics (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.89)

Optimal Unsupervised Motor Learning Predicts the Internal Representation of Barn Owl Head Movements

Sanger, Terence D.

This implies the existence of a set of orthogonal internal coordinates that are related to meaningful coordinates of the external world. No coherent computational theory has yet been proposed to explain this finding. I have proposed a simple model which provides a framework for a theory of low-level motor learning. I show that the theory predicts the observed microstimulation results in the barn owl. The model rests on the concept of "Optimal Unsupervised Motor Learning", which provides a set of criteria that predict optimal internal representations. I describe two iterative Neural Network algorithms which find the optimal solution and demonstrate possible mechanisms for the development of internal representations in animals. 1 INTRODUCTION In the sensory domain, many algorithms for unsupervised learning have been proposed. These algorithms learn depending on statistical properties of the input data, and often can be used to find useful "intermediate" sensory representations

internal representation, masino and knudsen 1990, representation, (9 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
North America > Canada > Ontario > Toronto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.37)

Rosen, Daniel J., Rumelhart, David E., Knudsen, Eric I.

A Connectionist Model of the Owl's Sound Localization System

Sound localization by the barn owl (Tyto alba) measured with the search coil technique.

knudsen, localization system, owl, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Santa Clara County > Stanford (0.05)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Industry: Health & Medicine (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Montague, P. Read, Dayan, Peter, Sejnowski, Terrence J.

Foraging in an Uncertain Environment Using Predictive Hebbian Learning

Survival is enhanced by an ability to predict the availability of food, the likelihood of predators, and the presence of mates. We present a concrete model that uses diffuse neurotransmitter systems to implement a predictive version of a Hebb learning rule embedded in a neural architecture based on anatomical and physiological studies on bees. The model captured the strategies seen in the behavior of bees and a number of other animals when foraging in an uncertain environment. The predictive model suggests a unified way in which neuromodulatory influences can be used to bias actions and control synaptic plasticity. Successful predictions enhance adaptive behavior by allowing organisms to prepare for future actions, rewards, or punishments. Moreover, it is possible to improve upon behavioral choices if the consequences of executing different actions can be reliably predicted. Although classical and instrumental conditioning results from the psychological literature [1] demonstrate that the vertebrate brain is capable of reliable prediction, how these predictions are computed in brains is not yet known. The brains of vertebrates and invertebrates possess small nuclei which project axons throughout large expanses of target tissue and deliver various neurotransmitters such as dopamine, norepinephrine, and acetylcholine [4]. The activity in these systems may report on reinforcing stimuli in the world or may reflect an expectation of future reward [5, 6, 7, 8].
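One way to picture a learning rule driven by reward prediction is the generic error-correcting update below. It is only an illustration and is not the model described in the abstract: it assumes a single scalar prediction per stimulus and a fixed learning rate, and the function name `learn_reward_predictions` and the stimulus labels are hypothetical.

```python
def learn_reward_predictions(trials, learning_rate=0.1):
    """Update one reward prediction per stimulus from a sequence of trials.

    Each trial is a (stimulus, reward) pair. The prediction for the observed
    stimulus moves toward the received reward by a fraction of the error.
    """
    weights = {}
    for stimulus, reward in trials:
        prediction = weights.get(stimulus, 0.0)
        error = reward - prediction          # prediction error for this trial
        weights[stimulus] = prediction + learning_rate * error
    return weights


# Hypothetical foraging data: one flower colour always pays off, the other never.
trials = [("blue", 1.0), ("yellow", 0.0)] * 200
weights = learn_reward_predictions(trials)
print(weights)
```

A forager could then bias its choices toward the stimulus with the larger learned prediction, which is the kind of action bias the abstract attributes to neuromodulatory signals.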

nectar, prediction, uncertain environment, (16 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.15)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(9 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Bayesian Modeling and Classification of Neural Signals

Lewicki, Michael S.

Signal processing and classification algorithms often have limited applicability resulting from an inaccurate model of the signal's underlying structure. We present here an efficient, Bayesian algorithm for modeling a signal composed of the superposition of brief, Poisson-distributed functions. This methodology is applied to the specific problem of modeling and classifying extracellular neural waveforms which are composed of a superposition of an unknown number of action potentials (APs). Previous approaches have had limited success due largely to the problems of determining the spike shapes, deciding how many shapes are distinct, and decomposing overlapping APs. A Bayesian solution to each of these problems is obtained by inferring a probabilistic model of the waveform. This approach quantifies the uncertainty of the form and number of the inferred AP shapes and is used to obtain an efficient method for decomposing complex overlaps. This algorithm can extract many times more information than previous methods and facilitates the extracellular investigation of neuronal classes and of interactions within neuronal circuits.
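The signal model in the abstract, a superposition of brief functions occurring at Poisson-distributed times, can be sketched as a forward simulation. This shows only how such a waveform might be generated, not the Bayesian inference itself. The template shapes, firing rates, and the names `poisson_event_times` and `superpose` are assumptions made for illustration, and no noise is added.

```python
import random


def poisson_event_times(rate_hz, duration_s, rng):
    """Draw event times from a homogeneous Poisson process via exponential gaps."""
    times = []
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times


def superpose(templates_with_rates, duration_s, sample_rate_hz, seed=0):
    """Sum spike templates at Poisson event times into a single waveform."""
    rng = random.Random(seed)
    n = int(duration_s * sample_rate_hz)
    waveform = [0.0] * n
    events = []
    for label, (template, rate_hz) in templates_with_rates.items():
        for t in poisson_event_times(rate_hz, duration_s, rng):
            start = int(t * sample_rate_hz)
            events.append((t, label))
            for k, value in enumerate(template):
                if start + k < n:            # drop samples past the recording end
                    waveform[start + k] += value  # overlapping spikes simply add
    return waveform, sorted(events)


# Two hypothetical spike shapes with different firing rates.
units = {
    "unit_a": ([0.2, 1.0, -0.6, -0.1], 20.0),
    "unit_b": ([0.1, 0.5, 0.3, -0.4, -0.1], 10.0),
}
waveform, events = superpose(units, duration_s=1.0, sample_rate_hz=1000)
print(len(waveform), len(events))
```

When two events fall within a template length of each other their shapes add, which is the overlap case the abstract identifies as hard to decompose.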

bayesian modeling and classification, spike, spike model, (14 more...)

Country: North America > United States > California > Los Angeles County > Pasadena (0.04)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Horiuchi, Timothy K., Bishofberger, Brooks, Koch, Christof

An Analog VLSI Saccadic Eye Movement System

In an effort to understand saccadic eye movements and their relation to visual attention and other forms of eye movements, we, in collaboration with a number of other laboratories, are carrying out a large-scale effort to design and build a complete primate oculomotor system using analog CMOS VLSI technology. Using this technology, a low-power, compact, multi-chip system has been built which works in real time using real-world visual inputs. We describe in this paper the performance of an early version of such a system including a 1-D array of photoreceptors mimicking the retina, a circuit computing the mean location of activity representing the superior colliculus, a saccadic burst generator, and a one degree-of-freedom rotational platform which models the dynamic properties of the primate oculomotor plant. 1 Introduction When we look around our environment, we move our eyes to center and stabilize objects of interest onto our fovea. In order to achieve this, our eyes move in quick jumps with short pauses in between. These quick jumps (up to 750 deg/sec in humans) are known as saccades and are seen in both exploratory eye movements and as reflexive eye movements in response to sudden visual, auditory, or somatosensory stimuli. Since the intent of the saccade is to bring new objects of interest onto the fovea, it can be considered a primitive attentional mechanism.
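The "mean location of activity" over a 1-D array can be read as an activity-weighted centroid. The sketch below computes that quantity digitally, whereas the system in the abstract does so in analog circuitry. The function name `mean_location` and the convention of returning `None` for an inactive array are assumptions made for this illustration.

```python
def mean_location(activity):
    """Return the activity-weighted mean index of a 1-D array, or None if silent."""
    total = sum(activity)
    if total == 0:
        return None  # no activity, so no meaningful target location
    return sum(index * a for index, a in enumerate(activity)) / total


# Hypothetical photoreceptor activity with a blob spanning pixels 2 and 3.
print(mean_location([0, 0, 1, 1, 0]))
```

The offset between this location and the array centre could then serve as a motor error for the saccade generator, though how the chip encodes that signal is not stated in the abstract.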

motor error, photoreceptor array, saccade, (11 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Semiconductors & Electronics (0.64)

Technology: Information Technology > Artificial Intelligence > Vision (0.35)