AITopics

Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data. 1 Introduction Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics, genetics and other experimental sciences. The data which are supposed to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, they are characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidian space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991) which guarantees robust statistics (Tikochinsky et al., 1984).

algorithm, dissimilarity, multidimensional scaling, (15 more...)

Country:

North America > United States > New York (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Bengio, Yoshua, Frasconi, Paolo

An Input Output HMM Architecture

We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports recurrent networks processing style and allows to exploit the supervised learning paradigm while using maximum likelihood estimation. 1 INTRODUCTION Learning problems involving sequentially structured data cannot be effectively dealt with static models such as feedforward networks. Recurrent networks allow to model complex dynamical systems and can store and retrieve contextual information in a flexible way. Up until the present time, research efforts of supervised learning for recurrent networks have almost exclusively focused on error minimization by gradient descent methods. Although effective for learning short term memories, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals (Bengio et al., 1994; Mozer, 1992).

algorithm, architecture, sequence, (16 more...)

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

FINANCIAL APPLICATIONS OF LEARNING FROM HINTS

Abu-Mostafa, Yaser S.

In financial market applications, it is typical to have limited amount of relevant training data, with high noise levels in the data. The information content of such data is modest, and while the learning process can try to make the most of what it has, it cannot create new information on its own. This poses a fundamental limitation on the 412 Yaser S. Abu-Mostafa

application, symmetry hint, target function, (15 more...)

Country: North America > United States > California > Los Angeles County > Pasadena (0.04)

Industry: Banking & Finance > Trading (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Saul, Lawrence K., Jordan, Michael I.

Boltzmann Chains and Hidden Markov Models

Statistical models of discrete time series have a wide range of applications, most notably to problems in speech recognition (Juang & Rabiner, 1991) and molecular biology (Baldi, Chauvin, Hunkapiller, & McClure, 1992). A common problem in these fields is to find a probabilistic model, and a set of model parameters, that 436 Lawrence K. Saul, Michael I. Jordan

boltzmann, boltzmann chain, hmm, (13 more...)

Country:

Asia > Middle East > Jordan (0.26)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Crites, Robert H., Barto, Andrew G.

An Actor/Critic Algorithm that is Equivalent to Q-Learning

We prove the convergence of an actor/critic algorithm that is equivalent to Q-Iearning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using criteria that depend on the relative probability of the action that was executed.

actor critic algorithm, algorithm, probability, (11 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bradtke, Steven J., Duff, Michael O.

Reinforcement Learning Methods for Continuous-Time Markov Decision Problems

Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(,x), Q-Iearning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.

algorithm, decision problem, reinforcement learning method, (14 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Instance-Based State Identification for Reinforcement Learning

McCallum, R. Andrew

This paper presents instance-based state identification, an approach to reinforcement learning and hidden state that builds disambiguating amounts of short-term memory online, and also learns with an order of magnitude fewer training steps than several previous approaches. Inspired by a key similarity between learning with hidden state and learning in continuous geometrical spaces, this approach uses instance-based (or "memory-based") learning, a method that has worked well in continuous spaces. 1 BACKGROUND AND RELATED WORK When a robot's next course of action depends on information that is hidden from the sensors because of problems such as occlusion, restricted range, bounded field of view and limited attention, the robot suffers from hidden state. More formally, we say a reinforcement learning agent suffers from the hidden state problem if the agent's state representation is non-Markovian with respect to actions and utility. The hidden state problem arises as a case of perceptual aliasing: the mapping between states of the world and sensations of the agent is not one-to-one [Whitehead, 1992]. If the agent's perceptual system produces the same outputs for two world states in which different actions are required, and if the agent's state representation consists only of its percepts, then the agent will fail to choose correct actions.

agent, instance-based state identification, neighbor, (11 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York > Monroe County > Rochester (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Singh, Satinder P., Jaakkola, Tommi, Jordan, Michael I.

Reinforcement Learning with Soft State Aggregation

It is widely accepted that the use of more compact representations than lookup tables is crucial to scaling reinforcement learning (RL) algorithms to real-world problems. Unfortunately almost all of the theory of reinforcement learning assumes lookup table representations. In this paper we address the pressing issue of combining function approximation and RL, and present 1) a function approximator based on a simple extension to state aggregation (a commonly used form of compact representation), namely soft state aggregation, 2) a theory of convergence for RL with arbitrary, but fixed, soft state aggregation, 3) a novel intuitive understanding of the effect of state aggregation on online RL, and 4) a new heuristic adaptive state aggregation algorithm that finds improved compact representations by exploiting the non-discrete nature of soft state aggregation. Preliminary empirical results are also presented.

algorithm, learning, state aggregation, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Asia > Middle East > Jordan (0.06)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Harmon, Mance E., III, Leemon C. Baird, Klopf, A. Harry

Advantage Updating Applied to a Differential Game

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual gradient form of advantage updating. The game is a Markov Decision Process (MDP) with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. The reinforcement learning algorithm for optimal control is modified for differential games in order to find the minimax point, rather than the maximum. Simulation results are compared to the optimal solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual gradient and non-residual gradient forms of advantage updating and Q-learning are compared. The results show that advantage updating converges faster than Q-learning in all simulations.

algorithm, equation, reinforcement, (14 more...)

Country:

North America > United States > Virginia > Alexandria County > Alexandria (0.05)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Industry: Government > Military > Air Force (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Jaakkola, Tommi, Singh, Satinder P., Jordan, Michael I.

Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems

Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither generally the algorithms nor the analyses continue to be usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo policy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum. The algorithm operates in the space of stochastic policies, a space which can yield a policy that performs considerably better than any deterministic policy. Although the space of stochastic policies is continuous-even for a discrete action space-our algorithm is computationally tractable.

algorithm, average reward, learner, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.07)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)