AITopics

Here we study the continuous-time, continuous state-space stochastic case, which covers a wide variety of control problems including target, viability, and optimization problems (see [FS93], [KP95]), for which a formalism is the following.

algorithm, equation, reinforcement learning, (11 more...)

Country:

Europe > France (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Monaco, Jeffrey F., Ward, David G., Barto, Andrew G.

Automated Aircraft Recovery via Reinforcement Learning: Initial Experiments

An emerging use of reinforcement learning (RL) is to approximate optimal policies for large-scale control problems through extensive simulated control experience. Described here are initial experiments directed toward the development of an automated recovery system (ARS) for high-agility aircraft. An ARS is an outer-loop flight control system designed to bring the aircraft from a range of initial states to straight, level, and non-inverted flight in minimum time while satisfying constraints such as maintaining altitude and accelerations within acceptable limits. Here we describe the problem and present initial results involving only single-axis (pitch) recoveries. Through extensive simulated control experience using a medium-fidelity simulation of an F-16, the RL system approximated an optimal policy for longitudinal-stick inputs to produce near-minimum-time transitions to straight and level flight in unconstrained cases, as well as while meeting a pilot-station acceleration constraint.

aircraft, initial condition, rl system, (13 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Monaco (0.06)
(3 more...)

Industry:

Transportation > Air (0.38)
Aerospace & Defense (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

An Improved Policy Iteration Algorithm for Partially Observable MDPs

Hansen, Eric A.

A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971,1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
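To illustrate why a finite-state controller makes policy evaluation straightforward, here is a minimal sketch. It is not taken from the paper: the function name, argument layout, and the use of fixed-point iteration (instead of a direct linear solve) are assumptions made for illustration. Each controller node fixes an action, each observation selects a successor node, and the value of every (node, state) pair then satisfies a linear recursion.

```python
def evaluate_controller(n_states, action_of, successor, T, O, R,
                        gamma=0.95, tol=1e-10, max_iters=100_000):
    """Evaluate a finite-state controller on a POMDP (hypothetical layout).

    action_of[n]     -> action taken in controller node n
    successor[n][o]  -> next controller node after observing o in node n
    T[a][s][s2]      -> P(s2 | s, a)
    O[a][s2][o]      -> P(o | s2, a)
    R[a][s]          -> expected immediate reward for action a in state s
    Returns V where V[n][s] is the value of starting in node n and state s.
    """
    nodes = range(len(action_of))
    V = [[0.0] * n_states for _ in nodes]
    for _ in range(max_iters):
        delta = 0.0
        new_V = [[0.0] * n_states for _ in nodes]
        for n in nodes:
            a = action_of[n]
            for s in range(n_states):
                # Expected discounted value over next states and observations.
                future = 0.0
                for s2 in range(n_states):
                    for o, n2 in enumerate(successor[n]):
                        future += T[a][s][s2] * O[a][s2][o] * V[n2][s2]
                new_V[n][s] = R[a][s] + gamma * future
                delta = max(delta, abs(new_V[n][s] - V[n][s]))
        V = new_V
        if delta < tol:  # stop once the values have stopped changing
            break
    return V
```

Because the recursion is linear in V, the same values could also be obtained by solving one linear system with a row per (node, state) pair; the iteration above simply avoids a linear-algebra dependency.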

finite-state controller, iteration, machine state, (12 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Asia > Middle East > Jordan (0.05)

Industry: Government > Regional Government > North America Government > United States Government (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Nonparametric Model-Based Reinforcement Learning

Atkeson, Christopher G.

This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned nonparametric models. We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model.

dynamic programming, trajectory, value function, (13 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)

Andre, David, Friedman, Nir, Parr, Ronald

Generalized Prioritized Sweeping

Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent's limited computational resources to achieve a good estimate of the value of environment states. To choose effectively where to spend a costly planning step, classic prioritized sweeping uses a simple heuristic to focus computation on the states that are likely to have the largest errors. In this paper, we introduce generalized prioritized sweeping, a principled method for generating such estimates in a representation-specific manner. This allows us to extend prioritized sweeping beyond an explicit, state-based representation to deal with compact representations that are necessary for dealing with large state spaces. We apply this method for generalized model approximators (such as Bayesian networks), and describe preliminary experiments that compare our approach with classical prioritized sweeping.
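For orientation, the classic heuristic mentioned above can be sketched as follows. This is a simplified illustration, not the generalized method of the paper: it assumes a known, deterministic, tabular model, and the names `prioritized_sweeping`, `model`, and `theta` are hypothetical. States are backed up in order of their current Bellman error, and after each backup the predecessors of the updated state are re-queued.

```python
import heapq


def prioritized_sweeping(states, actions, model, gamma=0.9, theta=1e-6,
                         max_updates=10_000):
    """Tabular prioritized sweeping on a deterministic model (assumed layout).

    model[(s, a)] -> (next_state, reward). States with no entry are terminal.
    """
    V = {s: 0.0 for s in states}

    # predecessors[s2] holds every state that can reach s2 in one step.
    predecessors = {s: set() for s in states}
    for (s, _a), (s2, _r) in model.items():
        predecessors[s2].add(s)

    def backup(s):
        # One-step Bellman backup; terminal states keep value 0.
        values = [model[(s, a)][1] + gamma * V[model[(s, a)][0]]
                  for a in actions if (s, a) in model]
        return max(values) if values else 0.0

    # Seed the queue with each state's initial error (max-heap via negation).
    queue = []
    for s in states:
        error = abs(backup(s) - V[s])
        if error > theta:
            heapq.heappush(queue, (-error, s))

    updates = 0
    while queue and updates < max_updates:
        _, s = heapq.heappop(queue)
        V[s] = backup(s)
        updates += 1
        # A change in V[s] can only affect the states that lead into s.
        for p in predecessors[s]:
            error = abs(backup(p) - V[p])
            if error > theta:
                heapq.heappush(queue, (-error, p))
    return V
```

On a four-state chain where only the final transition is rewarded, the queue propagates value backward from the goal one predecessor at a time, so each state is backed up once instead of being swept repeatedly.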

procedure, representation, value function, (16 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Zimmermann, Hans-Georg, Neuneier, Ralph

The Observer-Observation Dilemma in Neuro-Forecasting

We explain how the training data can be separated into clean information and unexplainable noise. Analogous to the data, the neural network is separated into a time invariant structure used for forecasting, and a noisy part. We propose a unified theory connecting the optimization algorithms for cleaning and learning together with algorithms that control the data noise and the parameter noise. The combined algorithm allows a data-driven local control of the liability of the network parameters and therefore an improvement in generalization. The approach is proven to be very useful at the task of forecasting the German bond market.

algorithm, neural network, noise, (15 more...)

Country: Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.05)

Industry: Banking & Finance > Trading (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Tresp, Volker, Briegel, Thomas

A Solution for Missing Data in Recurrent Neural Networks with an Application to Blood Glucose Prediction

We consider neural network models for stochastic nonlinear dynamical systems where measurements of the variable of interest are only available at irregular intervals, i.e., most realizations are missing. Difficulties arise since the solutions for prediction and maximum likelihood learning with missing data lead to complex integrals, which even for simple cases cannot be solved analytically. In this paper we propose a specific combination of a nonlinear recurrent neural predictive model and a linear error model which leads to tractable prediction and maximum likelihood adaptation rules. In particular, the recurrent neural network can be trained using the real-time recurrent learning rule and the linear error model can be trained by an EM adaptation rule, implemented using forward-backward Kalman filter equations. The model is applied to predict the glucose/insulin metabolism of a diabetic patient where blood glucose measurements are only available a few times a day at irregular intervals.
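As a rough illustration of how a linear model can cope with missing measurements, the sketch below runs the forward pass of a scalar Kalman filter and skips the measurement update whenever an observation is absent. It is not the authors' combined model: the recurrent network, the backward pass, and the EM step are all omitted, and the function name and the parameters `a`, `q`, and `r` are assumptions chosen for illustration.

```python
def kalman_filter_with_gaps(observations, a=0.9, q=0.1, r=0.5, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a * x_{t-1} + w_t, y_t = x_t + v_t.

    observations: list of floats, with None marking a missing measurement.
    q, r: assumed process and measurement noise variances.
    Returns the filtered means and variances at every time step.
    """
    x, p = x0, p0
    means, variances = [], []
    for y in observations:
        # Prediction step: always applied, observed or not.
        x = a * x
        p = a * a * p + q
        if y is not None:
            # Measurement update: only when a measurement is available.
            k = p / (p + r)          # Kalman gain
            x = x + k * (y - x)
            p = (1.0 - k) * p
        means.append(x)
        variances.append(p)
    return means, variances
```

During a gap the estimate simply follows the model dynamics while its variance grows, which is the behaviour one would want between sparse, irregularly timed measurements.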

error model, linear error model, neural network, (13 more...)

Country: Europe > Germany (0.04)

Genre: Research Report > Promising Solution (0.40)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Spangler, Randall R., Goodman, Rodney M., Hawkins, Jim

Bach in a Box - Real-Time Harmony

These algorithms would take as input a melody such [...]

algorithm, rule base, rulebase, (15 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Greater London > London > Hackney (0.04)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Song, Xubo B., Abu-Mostafa, Yaser S., Sill, Joseph, Kasdan, Harvey

Incorporating Contextual Information in White Blood Cell Identification

In this paper we propose a technique to incorporate contextual information into object classification. In the real world there are cases where the identity of an object is ambiguous due to noise in the measurements on which the classification is based. It is helpful to reduce the ambiguity by utilizing extra information referred to as context, which in our case is the identities of the accompanying objects. This technique is applied to white blood cell classification. Comparisons against a "no context" approach demonstrate the superior classification performance achieved by using context. In our particular application, it significantly reduces the false alarm rate and thus greatly reduces the cost due to expensive clinical tests.

classification, specimen, standard deviation, (9 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Chatsworth (0.04)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.98)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ryan, Jake, Lin, Meng-Jang, Miikkulainen, Risto

Intrusion Detection with Neural Networks

Intrusion detection schemes can be classified into two categories: misuse and anomaly intrusion detection. Misuse refers to known attacks that exploit the known vulnerabilities of the system. Anomaly means unusual activity in general that could indicate an intrusion.

detection, intrusion detection, vector, (14 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.16)
North America > United States > Colorado > Boulder County > Boulder (0.14)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)