AITopics

Recent experimental data indicate that the strengthening or weakening of synaptic connections between neurons depends on the relative timing of pre- and postsynaptic action potentials. A Hebbian synaptic modification rule based on these data leads to a stable state in which the excitatory and inhibitory inputs to a neuron are balanced, producing an irregular pattern of firing. It has been proposed that neurons in vivo operate in such a mode.
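As a rough illustration of a timing-dependent modification rule, the sketch below uses an exponential window, which is one common textbook form and not necessarily the rule used in the paper. The function name, amplitudes, and time constants are all assumptions chosen for illustration.

```python
import math

def stdp_weight_change(dt_ms, a_plus=0.005, a_minus=0.00525,
                       tau_plus=20.0, tau_minus=20.0):
    """Return a weight change for one spike pair (illustrative values only).

    dt_ms is t_post - t_pre in milliseconds. A presynaptic spike that
    precedes the postsynaptic spike (dt_ms > 0) strengthens the synapse;
    the reverse order (dt_ms < 0) weakens it.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0
```

Under these assumed parameters the effect shrinks as the two spikes move further apart in time, in either direction.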

action potential, firing mode, neuron, (13 more...)

Country:

North America > United States > New York (0.05)
Asia > Brunei (0.05)
North America > United States > Massachusetts > Middlesex County > Waltham (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Williams, John K., Singh, Satinder P.

Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes

Partially Observable Markov Decision Processes (POMDPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-learning may no longer be effective, and memory-based methods which remove partial observability via state-estimation are notoriously expensive. An alternative approach is to seek a stochastic memoryless policy which for each observation of the environment prescribes a probability distribution over available actions that maximizes the average reward per timestep. A reinforcement learning algorithm which learns a locally optimal stochastic memoryless policy has been proposed by Jaakkola, Singh and Jordan, but not empirically verified. We present a variation of this algorithm, discuss its implementation, and demonstrate its viability using four test problems.
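To make the idea of a stochastic memoryless policy concrete, here is a minimal sketch of the data structure only: a table mapping each observation to a probability distribution over actions, with no memory of earlier observations. The observation names, action names, and probabilities are hypothetical, and the sketch does not implement the learning algorithm described in the abstract.

```python
import random

# Hypothetical policy: one action distribution per observation.
policy = {
    "obs_a": {"left": 0.5, "right": 0.5},
    "obs_b": {"left": 0.1, "right": 0.9},
}

def is_valid_policy(policy, tol=1e-9):
    """Check that every observation maps to a proper probability distribution."""
    return all(
        abs(sum(dist.values()) - 1.0) <= tol and all(p >= 0 for p in dist.values())
        for dist in policy.values()
    )

def sample_action(policy, observation, rng):
    """Draw an action using only the current observation (no history)."""
    dist = policy[observation]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

A call such as `sample_action(policy, "obs_b", random.Random(0))` returns one of the listed actions, drawn according to the weights for that observation.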

algorithm, learner, memoryless policy, (12 more...)

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > New York (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Suematsu, Nobuo, Hayashi, Akira

A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

We have proved that the model learned by BLHT converges to the optimal model in a given hypothesis space, H, which provides the most accurate predictions of percepts and rewards, given short-term memory. We believe this fact provides a solid basis for BLHT, and BLHT can be compared favorably with other methods using short-term memory.

blht, history tree, short-term memory, (14 more...)

Country:

Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.05)
Asia > Middle East > Jordan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Sato, Masa-aki, Ishii, Shin

Reinforcement Learning Based on On-Line EM Algorithm

On the other hand, applications to continuous state/action problems (Werbos, 1990; Doya, 1996; Sofge & White, 1992) are much more difficult than the finite state/action cases. Good function approximation methods and fast learning algorithms are crucial for successful applications. In this article, we propose a new RL method that has the above-mentioned two features. This method is based on an actor-critic architecture (Barto et al., 1983), although the detailed implementations of the actor and the critic are quite different from those in the original actor-critic model. The actor and the critic in our method estimate a policy and a Q-function, respectively, and are approximated by Normalized Gaussian Networks (NGnet) (Moody & Darken, 1989).

algorithm, pendulum, rl method, (14 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)

Oyama, Eimei, Tachi, Susumu

Coordinate Transformation Learning of Hand Position Feedback Controller by Using Change of Position Error Norm

"goal-directed"; i.e., there is no direct way

controller, equation, feedback controller, (7 more...)

Country:

Asia > Middle East > Jordan (0.05)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Neuneier, Ralph, Mihatsch, Oliver

Risk Sensitive Reinforcement Learning

A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower dimensional manifold. In particular for PCA, this manifold is described by a linear hyperplane whose characteristic directions are given by the eigenvectors of the correlation matrix with the largest eigenvalues. The success of PCA and closely related techniques such as Factor Analysis (FA) and PCA mixtures clearly indicates that much real world data exhibit the low dimensional manifold structure assumed by these models [2, 3]. However, the linear manifold structure of PCA is not appropriate for data with binary valued variables.

algorithm, eigenvalue, generative model, (11 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Industry: Banking & Finance > Trading (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Kearns, Michael J., Singh, Satinder P.

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

In this paper, we address two issues of longstanding interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed.
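The two families compared in the abstract can be sketched on a small tabular problem as follows. This is a generic textbook illustration under assumed names and default parameters (`alpha`, `gamma`, `sweeps`), not the exact procedures analyzed in the paper: the first function applies a direct Q-learning update from one observed transition, while the other two estimate next-state distributions from experience and then run off-line value iteration on that estimate.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Direct approach: update Q in place from a single observed transition."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def estimate_model(transitions):
    """Indirect approach, step 1: empirical next-state distributions and mean rewards."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for sa, nxt in counts.items():
        n = sum(nxt.values())
        P[sa] = {s2: c / n for s2, c in nxt.items()}
        R[sa] = reward_sum[sa] / n
    return P, R

def value_iteration(P, R, states, actions, gamma=0.9, sweeps=200):
    """Indirect approach, step 2: off-line value iteration on the estimated model."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        # State-action pairs never observed are skipped; a state with no
        # observed actions keeps value 0.0 (an assumption of this sketch).
        V = {
            s: max(
                (R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                 for a in actions if (s, a) in P),
                default=0.0,
            )
            for s in states
        }
    return V
```

Both routes consume the same stream of `(state, action, reward, next_state)` tuples; the difference is whether each tuple updates the value estimate directly or first updates a model that is then solved.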

algorithm, state-action pair, transition, (12 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hayashi, Akira, Suematsu, Nobuo

Viewing Classifier Systems as Model Free Learning in POMDPs

Classifier systems are now viewed as disappointing because of problems such as the rule strength vs rule set performance problem and the credit assignment problem. In order to solve these problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. for a locally optimal stochastic policy when a set of rule conditions is given. GLS uses GA to search for the best set of rule conditions. 1 INTRODUCTION Classifier systems (CSs) (Holland 1986) have been among the most used in reinforcement learning.

learning, pomdp, viewing classifier system, (13 more...)

Country:

North America > United States > Michigan (0.07)
Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.05)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.97)

Brown, Timothy X., Tong, Hui, Singh, Satinder P.

Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning

This paper examines the application of reinforcement learning to a telecommunications networking problem. The problem requires that revenue be maximized while simultaneously meeting a quality of service constraint that forbids entry into certain states. We present a general solution to this multi-criteria problem that is able to earn significantly higher revenues than alternatives.

constraint, qos, revenue, (11 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.34)

Industry: Telecommunications (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Al-Ansari, Mohammad A., Williams, Ronald J.

Robust, Efficient, Globally-Optimized Reinforcement Learning with the Parti-Game Algorithm

Parti-game (Moore 1994a; Moore 1994b; Moore and Atkeson 1995) is a reinforcement learning (RL) algorithm that has a lot of promise in overcoming the curse of dimensionality that can plague RL algorithms when applied to high-dimensional problems. In this paper we introduce modifications to the algorithm that further improve its performance and robustness. In addition, while parti-game solutions can be improved locally by standard local path-improvement techniques, we introduce an add-on algorithm in the same spirit as parti-game that instead tries to improve solutions in a non-local manner. 1 INTRODUCTION Parti-game operates on goal problems by dynamically partitioning the space into hyperrectangular cells of varying sizes, represented using a k-d tree data structure. It assumes the existence of a pre-specified local controller that can be commanded to proceed from the current state to a given state. The algorithm uses a game-theoretic approach to assign costs to cells based on past experiences using a minimax algorithm.

algorithm, boundary, partition, (14 more...)

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.05)
Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)