AITopics

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Greensmith, Evan, Bartlett, Peter L., Baxter, Jonathan

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline method, in which a function of the current state is added to the discounted value estimate. We relate the performance of these methods, which use sample paths, to the variance of estimates based on iid data. We derive the baseline function that minimizes this variance, and we show that the variance for any baseline is the sum of the optimal variance and a weighted squared distance to the optimal baseline. We show that the widely used average discounted value baseline (where the reward is replaced by the difference between the reward and its expectation) is suboptimal.
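The decomposition described in the abstract can be checked numerically in the simplest i.i.d. setting. The sketch below is an illustration only, not the paper's sample-path analysis: it assumes a one-step problem with two actions, a sigmoid policy with a single parameter `theta`, and the score-function estimator `score * (reward - baseline)`. All function names are hypothetical.

```python
import math

def score_and_probs(theta):
    """Return (probability, score) pairs for action 1 and action 0.

    Assumption: P(action 1) = sigmoid(theta); the score is d/dtheta log-probability.
    """
    p = 1.0 / (1.0 + math.exp(-theta))
    return [(p, 1.0 - p), (1.0 - p, -p)]

def grad_variance(theta, rewards, baseline):
    """Exact variance of the estimator score * (reward - baseline).

    rewards[0] belongs to action 1 and rewards[1] to action 0.
    """
    terms = score_and_probs(theta)
    mean = sum(prob * s * (r - baseline) for (prob, s), r in zip(terms, rewards))
    second = sum(prob * (s * (r - baseline)) ** 2 for (prob, s), r in zip(terms, rewards))
    return second - mean ** 2

def optimal_baseline(theta, rewards):
    """Variance-minimizing constant baseline: E[score^2 * reward] / E[score^2]."""
    terms = score_and_probs(theta)
    num = sum(prob * s * s * r for (prob, s), r in zip(terms, rewards))
    den = sum(prob * s * s for prob, s in terms)
    return num / den

def expected_reward(theta, rewards):
    """Plain expected reward, used here as the 'average value' style baseline."""
    return sum(prob * r for (prob, _), r in zip(score_and_probs(theta), rewards))

# Example: compare the two baselines for one arbitrary setting.
theta, rewards = 1.0, [1.0, 0.0]
b_star = optimal_baseline(theta, rewards)
b_avg = expected_reward(theta, rewards)
print(grad_variance(theta, rewards, b_star), grad_variance(theta, rewards, b_avg))
```

In this toy case the expected-reward baseline (a one-step stand-in for the average discounted value baseline) differs from the variance-minimizing one whenever the two actions are not equally likely and their rewards differ, and the extra variance equals E[score^2] times the squared distance between the two baselines.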

baseline, value function, variance

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

Reinforcement Learning with Long Short-Term Memory

Bakker, Bram

This paper presents reinforcement learning with a Long Short-Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task.

1 Introduction

Reinforcement learning (RL) is a way of learning how to behave based on delayed reward signals [12]. Among the more important challenges for RL are tasks where part of the state of the environment is hidden from the agent. Such tasks are called non-Markovian tasks or Partially Observable Markov Decision Processes. Many real-world tasks have this problem of hidden state. For instance, in a navigation task different positions in the environment may look the same, but one and the same action may lead to different next states or rewards. Thus, hidden state makes RL more realistic.

agent, dependency, information

Country:

Europe > Netherlands > South Holland > Leiden (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Face Recognition Using Kernel Methods

Yang, Ming-Hsuan

Principal Component Analysis and Fisher Linear Discriminant methods have demonstrated their success in face detection, recognition, and tracking. The representation in these subspace methods is based on second-order statistics of the image set, and does not address higher-order statistical dependencies such as the relationships among three or more pixels. Recently, Higher Order Statistics and Independent Component Analysis (ICA) have been used as informative low-dimensional representations for visual recognition. In this paper, we investigate the use of Kernel Principal Component Analysis and Kernel Fisher Linear Discriminant for learning low-dimensional representations for face recognition, which we call Kernel Eigenface and Kernel Fisherface methods. While Eigenface and Fisherface methods aim to find projection directions based on the second-order correlation of samples, Kernel Eigenface and Kernel Fisherface methods provide generalizations which take higher-order correlations into account.
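To make "higher-order correlations" concrete, the sketch below checks a standard identity: a degree-2 polynomial kernel equals an ordinary inner product taken over all pairwise products of input coordinates. The abstract does not say which kernel the paper uses, so the polynomial kernel and both function names are assumptions chosen for illustration.

```python
def poly_kernel(x, y, degree=2):
    """Homogeneous polynomial kernel (x . y) ** degree."""
    return sum(a * b for a, b in zip(x, y)) ** degree

def degree2_features(x):
    """Explicit feature map for degree 2: every product x_i * x_j."""
    return [a * b for a in x for b in x]

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Two toy "pixel" vectors (hypothetical values).
x = [1, 2, 3]
y = [4, 5, 6]

# The kernel value matches the inner product in the expanded feature space,
# without ever building the expanded vectors when the kernel is used directly.
print(poly_kernel(x, y), dot(degree2_features(x), degree2_features(y)))
```

Running PCA or a Fisher discriminant on kernel values therefore operates implicitly on products of pixel pairs, and higher kernel degrees bring in products of more pixels.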

face recognition, fisherface method, recognition

Country:

North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.45)

Zimmermann, Hans-Georg, Neuneier, Ralph, Grothmann, Ralph

Active Portfolio-Management based on Error Correction Neural Networks

This paper deals with a neural network architecture which establishes a portfolio management system similar to the Black/Litterman approach. This allocation scheme distributes funds across various securities or financial markets while simultaneously complying with specific allocation constraints which meet the requirements of an investor. The portfolio optimization algorithm is modeled by a feedforward neural network. The underlying expected return forecasts are based on error correction neural networks (ECNN), which utilize the last model error as an auxiliary input to evaluate their own misspecification. The portfolio optimization is implemented such that (i) the allocations comply with the investor's constraints and (ii) the risk of the portfolio can be controlled.

constraint, excess return, portfolio

Country:

North America > United States (0.04)
Europe > Spain (0.04)
Europe > Italy (0.04)

Industry: Banking & Finance > Trading (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Warmuth, Manfred K., Rätsch, Gunnar, Mathieson, Michael, Liao, Jun, Lemmen, Christian

Active Learning in the Drug Discovery Process

We investigate the following data mining problem from Computational Chemistry: From a large data set of compounds, find those that bind to a target molecule in as few iterations of biological testing as possible. In each iteration a comparatively small batch of compounds is screened for binding to the target. We apply active learning techniques for selecting the successive batches. One selection strategy picks unlabeled examples closest to the maximum margin hyperplane. Another produces many weight vectors by running perceptrons over multiple permutations of the data.
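The first selection strategy mentioned above can be sketched as follows. This is a minimal illustration that assumes a linear separator `(w, b)` has already been trained on the labeled compounds; the function name, the feature vectors, and the batch size are hypothetical.

```python
def select_batch(w, b, unlabeled, k):
    """Return indices of the k unlabeled examples closest to the hyperplane w.x + b = 0.

    The true distance is |w.x + b| / ||w||; the norm is the same for every
    example, so ranking by |w.x + b| gives the same order.
    """
    def margin(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)

    ranked = sorted(range(len(unlabeled)), key=lambda i: margin(unlabeled[i]))
    return ranked[:k]

# Toy pool of four unlabeled "compounds" described by two features each.
pool = [[3.0, 1.0], [0.1, 5.0], [-0.5, 2.0], [-4.0, 0.0]]
batch = select_batch(w=[1.0, 0.0], b=0.0, unlabeled=pool, k=2)
print(batch)
```

The selected examples are the ones the current separator is least sure about, so each screening batch is spent where a label is expected to be most informative.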

fraction, hyperplane, version space

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New York > New York County > New York City (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.31)

A Bayesian Network for Real-Time Musical Accompaniment

Raphael, Christopher

We describe a computer system that provides a real-time musical accompaniment for a live soloist in a piece of non-improvised music for soloist and accompaniment. A Bayesian network is developed that represents the joint distribution on the times at which the solo and accompaniment notes are played, relating the two parts through a layer of hidden variables. The network is first constructed using the rhythmic information contained in the musical score. The network is then trained to capture the musical interpretations of the soloist and accompanist in an off-line rehearsal phase. During live accompaniment the learned distribution of the network is combined with a real-time analysis of the soloist's acoustic signal, performed with a hidden Markov model, to generate a musically principled accompaniment that respects all available sources of knowledge. A live demonstration will be provided.

accompaniment, interpretation, soloist

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Ontrup, Jorg, Ritter, Helge

Hyperbolic Self-Organizing Maps for Semantic Navigation

We introduce a new type of Self-Organizing Map (SOM) to navigate in the Semantic Space of large text collections. We propose a "hyperbolic SOM" (HSOM) based on a regular tessellation of the hyperbolic plane, which is a non-Euclidean space characterized by constant negative Gaussian curvature. The exponentially increasing size of a neighborhood around a point in hyperbolic space provides more freedom to map the complex information space arising from language into spatial relations. We describe experiments showing that the HSOM can successfully be applied to text categorization tasks and yields results comparable to other state-of-the-art methods.
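The exponential growth of neighborhoods can be seen by comparing circle circumferences. The sketch below assumes the hyperbolic plane has curvature -1, in which case a circle of radius r has circumference 2π·sinh(r), against 2π·r in the Euclidean plane. The function names are hypothetical, and the code says nothing about how the HSOM lattice itself is built.

```python
import math

def euclidean_circumference(r):
    """Circumference of a circle of radius r in the flat plane."""
    return 2.0 * math.pi * r

def hyperbolic_circumference(r):
    """Circumference of a circle of radius r in the hyperbolic plane (curvature -1)."""
    return 2.0 * math.pi * math.sinh(r)

# Room available for neighbors at increasing distance from a node.
for r in (1, 2, 5, 10):
    print(r, euclidean_circumference(r), hyperbolic_circumference(r))
```

For large r, each additional unit of radius multiplies the hyperbolic circumference by roughly e, whereas the Euclidean circumference only grows linearly.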

category, hsom, node

Country:

North America > United States > New York (0.05)
North America > Canada > Ontario > Toronto (0.05)
Europe > Germany > Saxony > Leipzig (0.04)
Africa > Middle East > Egypt (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.37)

Lawrence, Neil D., Rowstron, Antony I. T., Bishop, Christopher M., Taylor, Michael J.

Optimising Synchronisation Times for Mobile Devices

Many applications rely on a mobile device maintaining a replica of a data structure that is stored on a server, for example news databases, calendars, and email.

objective function, staleness, synchronisation

Country: Europe > United Kingdom (0.04)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Cemgil, Ali Taylan, Kappen, Bert

Tempo tracking and rhythm quantization by sequential Monte Carlo

We present a probabilistic generative model for timing deviations in expressive music. The structure of the proposed model is equivalent to a switching state space model. We formulate two well-known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out.

algorithm, particle, quantization