AITopics

Parti-game (Moore 1994a; Moore 1994b; Moore and Atkeson 1995) is a reinforcement learning (RL) algorithm that has a lot of promise in overcoming the curse of dimensionality that can plague RL algorithms when applied to high-dimensional problems. In this paper we introduce modifications to the algorithm that further improve its performance and robustness. In addition, while parti-game solutions can be improved locally by standard local path-improvement techniques, we introduce an add-on algorithm in the same spirit as parti-game that instead tries to improve solutions in a non-local manner. 1 INTRODUCTION Parti-game operates on goal problems by dynamically partitioning the space into hyperrectangular cells of varying sizes, represented using a k-d tree data structure. It assumes the existence of a pre-specified local controller that can be commanded to proceed from the current state to a given state. The algorithm uses a game-theoretic approach to assign costs to cells based on past experiences using a minimax algorithm.
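
The minimax cost assignment mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `outcomes` table, the unit cost per cell transition, and the function name `worst_case_costs` are assumptions made for illustration, and the k-d tree partitioning and cell splitting are left out entirely.

```python
def worst_case_costs(outcomes, goal, max_sweeps=100):
    """Assign each cell a pessimistic (minimax) cost-to-goal.

    outcomes[cell][aim] is the set of cells that were actually reached in
    past experience when the local controller aimed at `aim` from `cell`.
    Assumption: every transition between cells costs 1.
    """
    inf = float("inf")
    cost = {cell: inf for cell in outcomes}
    cost[goal] = 0
    for _ in range(max_sweeps):
        changed = False
        for cell, aims in outcomes.items():
            if cell == goal:
                continue
            # Pick the aim whose worst observed outcome is cheapest.
            best = min(
                (1 + max(cost[o] for o in reached)
                 for reached in aims.values() if reached),
                default=inf,
            )
            if best != cost[cell]:
                cost[cell] = best
                changed = True
        if not changed:
            break
    return cost


# Hypothetical experience table: aiming at C from A has sometimes left us in A.
example_outcomes = {
    "A": {"B": {"B"}, "C": {"C", "A"}},
    "B": {"G": {"G"}},
    "C": {"G": {"G"}},
    "S": {"A": {"S"}},  # every attempt from S has ended back in S
    "G": {},
}
example_costs = worst_case_costs(example_outcomes, goal="G")
```

A cell whose every aim has a worst-case outcome that never reaches the goal keeps an infinite cost, which is how a "losing" cell shows up in this sketch.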

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States (0.14)
Asia > Middle East > Saudi Arabia (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ikeda, Shiro, Amari, Shun-ichi, Nakahara, Hiroyuki

Convergence of the Wake-Sleep Algorithm

The WS (Wake-Sleep) algorithm is a simple learning rule for models with hidden variables. It is shown that this algorithm can be applied to a factor analysis model, which is a linear version of the Helmholtz machine. But even for a factor analysis model, general convergence has not been proved theoretically. In this article, we describe the geometrical understanding of the WS algorithm in contrast with the EM (Expectation Maximization) algorithm and the em algorithm. As a result, we prove the convergence of the WS algorithm for the factor analysis model. We also show the condition for convergence in general models.

algorithm, artificial intelligence, machine learning, (17 more...)

Country: Asia > Japan (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Williams, John K., Singh, Satinder P.

Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes

Partially Observable Markov Decision Processes (POMDPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-learning may no longer be effective, and memory-based methods which remove partial observability via state-estimation are notoriously expensive. An alternative approach is to seek a stochastic memoryless policy which for each observation of the environment prescribes a probability distribution over available actions that maximizes the average reward per timestep. A reinforcement learning algorithm which learns a locally optimal stochastic memoryless policy has been proposed by Jaakkola, Singh and Jordan, but not empirically verified. We present a variation of this algorithm, discuss its implementation, and demonstrate its viability using four test problems.
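
To make the notion of a stochastic memoryless policy concrete, here is a minimal sketch of the data structure only. It does not implement the learning algorithm of Jaakkola, Singh and Jordan; the observation names, action names, and probabilities are hypothetical.

```python
import random


def sample_action(policy, observation, rng=random):
    """Draw an action from the distribution attached to the current observation.

    The policy is memoryless: it looks only at the current observation,
    never at earlier observations or actions.
    """
    actions, probabilities = zip(*policy[observation].items())
    return rng.choices(actions, weights=probabilities, k=1)[0]


def average_reward(rewards):
    """Average reward per timestep over a finite trajectory."""
    return sum(rewards) / len(rewards)


# Hypothetical policy: one probability distribution over actions per observation.
example_policy = {
    "wall_ahead": {"left": 0.5, "right": 0.5},
    "corridor": {"forward": 0.9, "left": 0.05, "right": 0.05},
}
chosen = sample_action(example_policy, "wall_ahead", rng=random.Random(0))
```

Randomizing over actions matters here because two different underlying states can produce the same observation, so a deterministic choice per observation may be forced to act badly in one of them.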

algorithm, artificial intelligence, machine learning, (14 more...)

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > Colorado > Boulder County > Boulder (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Sato, Masa-aki, Ishii, Shin

Reinforcement Learning Based on On-Line EM Algorithm

The actor and the critic are approximated by Normalized Gaussian Networks (NGnet), which are networks of local linear regression units. The NGnet is trained by the online EM algorithm proposed in our previous paper. We apply our RL method to the task of swinging-up and stabilizing a single pendulum and the task of balancing a double pendulum near the upright position. The experimental results show that our RL method can be applied to optimal control problems having continuous state/action spaces and that the method achieves good control with a small number of trial-and-errors. 1 INTRODUCTION Reinforcement learning (RL) methods (Barto et al., 1990) have been successfully applied to various Markov decision problems having finite state/action spaces, such as the backgammon game (Tesauro, 1992) and a complex task in a dynamic environment (Lin, 1992). On the other hand, applications to continuous state/action problems (Werbos, 1990; Doya, 1996; Sofge & White, 1992) are much more difficult than the finite state/action cases. Good function approximation methods and fast learning algorithms are crucial for successful applications.
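
A network of local linear regression units with normalized Gaussian weighting can be sketched as follows. This is a forward pass only, for a scalar input, under the usual normalized-Gaussian formulation; the unit parameters are hypothetical and the online EM training described in the abstract is not shown.

```python
import math


def ngnet_output(x, units):
    """Forward pass of a scalar-input normalized Gaussian network.

    Each unit is a tuple (center, width, slope, bias). A unit contributes
    its local linear prediction slope * x + bias, weighted by its Gaussian
    activation divided by the sum of all activations.
    Note: if x lies very far from every center, all activations can
    underflow to 0.0; this sketch does not guard against that.
    """
    activations = [
        math.exp(-0.5 * ((x - center) / width) ** 2)
        for center, width, _, _ in units
    ]
    total = sum(activations)
    return sum(
        (g / total) * (slope * x + bias)
        for g, (_, _, slope, bias) in zip(activations, units)
    )


# Two hypothetical units with constant local predictions 2 and 4.
example_units = [(-1.0, 1.0, 0.0, 2.0), (1.0, 1.0, 0.0, 4.0)]
midpoint_value = ngnet_output(0.0, example_units)
```

Because the weights are normalized, the output is always a convex combination of the local linear predictions, which gives a smooth interpolation between neighbouring units.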

algorithm, artificial intelligence, machine learning, (16 more...)

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Oyama, Eimei, Tachi, Susumu

Coordinate Transformation Learning of Hand Position Feedback Controller by Using Change of Position Error Norm

The Jacobian of the hand position vector is expressed as $J(\theta) = \partial f(\theta) / \partial \theta$. Let $x_d$ be the desired hand position and $e = x_d - x = x_d - f(\theta)$ be the hand position error vector.
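
As a concrete instance of the Jacobian and the hand position error vector, the sketch below uses a planar two-link arm. The arm model, the unit link lengths, and the function names are assumptions chosen for illustration; the abstract does not specify the arm.

```python
import math


def forward_kinematics(theta, l1=1.0, l2=1.0):
    """Hand position x = f(theta) for a hypothetical planar two-link arm."""
    t1, t2 = theta
    return (
        l1 * math.cos(t1) + l2 * math.cos(t1 + t2),
        l1 * math.sin(t1) + l2 * math.sin(t1 + t2),
    )


def jacobian(theta, l1=1.0, l2=1.0):
    """Analytic Jacobian J(theta) = d f(theta) / d theta as a 2x2 list."""
    t1, t2 = theta
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ]


def position_error(x_desired, theta):
    """Hand position error e = x_d - f(theta)."""
    x = forward_kinematics(theta)
    return (x_desired[0] - x[0], x_desired[1] - x[1])


example_theta = (0.3, 0.8)
example_error = position_error((1.0, 1.0), example_theta)
error_norm = math.hypot(*example_error)
```

The norm of `example_error` is the quantity whose change the title refers to; how the paper uses that change for learning is not reproduced here.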

artificial intelligence, equation, machine learning, (9 more...)

Country: Asia > Japan > Honshū > Kantō (0.14)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Hayashi, Akira, Suematsu, Nobuo

Viewing Classifier Systems as Model Free Learning in POMDPs

Classifier systems are now viewed as disappointing because of problems such as the rule strength vs rule set performance problem and the credit assignment problem. In order to solve these problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. to find a locally optimal stochastic policy when a set of rule conditions is given. GLS uses GA to search for the best set of rule conditions. 1 INTRODUCTION Classifier systems (CSs) (Holland 1986) have been among the most used in reinforcement learning.

artificial intelligence, expert system, machine learning, (16 more...)

Country: Asia > Japan (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.97)

Weinshall, Daphna, Jacobs, David W., Gdalyahu, Yoram

Classification in Non-Metric Spaces

A key question in vision is how to represent our knowledge of previously encountered objects to classify new ones. The answer depends on how we determine the similarity of two objects. Similarity tells us how relevant each previously seen object is in determining the category to which a new object belongs.

algorithm, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States (0.28)
Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.67)

Lee, Daniel D., Sompolinsky, Haim

Learning a Continuous Hidden Variable Model for Binary Data

A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower dimensional manifold.

artificial intelligence, eigenvalue, machine learning, (13 more...)

Country:

Asia > Middle East > Israel (0.15)
North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)

Gdalyahu, Yoram, Weinshall, Daphna, Werman, Michael

A Randomized Algorithm for Pairwise Clustering

We present a stochastic clustering algorithm based on pairwise similarity of datapoints. Our method extends existing deterministic methods, including agglomerative algorithms, min-cut graph algorithms, and connected components. Thus it provides a common framework for all these methods. Our graph-based method differs from existing stochastic methods which are based on analogy to physical systems. The stochastic nature of our method makes it more robust against noise, including accidental edges and small spurious clusters. We demonstrate the superiority of our algorithm using an example with 3 spiraling bands and a lot of noise. 1 Introduction Clustering algorithms can be divided into two categories: those that require a vectorial representation of the data, and those which use only pairwise representation. In the former case, every data item must be represented as a vector in a real normed space, while in the second case only pairwise relations of similarity or dissimilarity are used.

algorithm, artificial intelligence, machine learning, (16 more...)

Country: Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A Theory of Mean Field Approximation

Tanaka, Toshiyuki

I present a theory of mean field approximation based on information geometry. This theory includes in a consistent way the naive mean field approximation, as well as the TAP approach and the linear response theorem in statistical physics, giving clear information-theoretic interpretations to them. 1 INTRODUCTION Many problems of neural networks, such as learning and pattern recognition, can be cast into the framework of a statistical estimation problem. How difficult it is to solve a particular problem depends on the statistical model one employs in solving the problem. For Boltzmann machines [1], for example, it is computationally very hard to evaluate expectations of state variables from the model parameters. Mean field approximation [2], which originated in statistical physics, has been frequently used in practical situations in order to circumvent this difficulty.
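
For orientation, the naive mean field approximation for a Boltzmann machine can be written as a fixed-point iteration. The sketch assumes units taking values in {-1, +1} and a distribution proportional to exp(sum_i h_i s_i + sum_{i<j} w_ij s_i s_j); this convention, the parameter values, and the function name are assumptions, and the information-geometric theory and TAP corrections discussed in the abstract are not shown.

```python
import math


def naive_mean_field(weights, biases, sweeps=200):
    """Approximate the expectations m_i = <s_i> of a Boltzmann machine.

    Iterates the naive mean field equations
        m_i = tanh(h_i + sum_{j != i} w_ij * m_j)
    with sequential updates. `weights` is a symmetric matrix given as a
    list of lists, `biases` is the list of h_i. Convergence is not
    guaranteed for strong couplings.
    """
    n = len(biases)
    m = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            field = biases[i] + sum(
                weights[i][j] * m[j] for j in range(n) if j != i
            )
            m[i] = math.tanh(field)
    return m


# Hypothetical two-unit machine with a weak positive coupling.
example_weights = [[0.0, 0.2], [0.2, 0.0]]
example_biases = [0.1, -0.3]
example_means = naive_mean_field(example_weights, example_biases)
```

Computing the exact expectations requires summing over all 2^n states, whereas each sweep of this iteration costs on the order of n^2 operations, which is the saving that motivates the approximation.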

approximation, machine learning, pattern recognition, (16 more...)

Country: Asia > Japan (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.54)