AITopics

Country: Europe > United Kingdom > England > Greater London > London (0.24)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Guestrin, Carlos, Koller, Daphne, Parr, Ronald

Multiagent Planning with Factored MDPs

We present a principled and efficient planning algorithm for cooperative multiagent dynamicsystems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagentsystem as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN).The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whosesize is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action space. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative tomore complicated algorithms even in the single agent case.

agent, artificial intelligence, machine learning, (18 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Grudic, Gregory Z., Ungar, Lyle H.

Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

We address two open theoretical questions in Policy Gradient Reinforcement Learning.The first concerns the efficacy of using function approximation torepresent the state action value function, .

machine learning, performance gradient, reinforcement learning, (15 more...)

Country: North America > United States > Colorado (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)

Even-dar, Eyal, Mansour, Yishay

Convergence of Optimistic and Incremental Q-Learning

The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees the converges to an E optimal policy. The second is a new and novel algorithm incremental Q-learning,which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Our incremental Q-learning algorithm canbe viewed as derandomization of the E-greedy Q-learning. 1 Introduction One of the challenges of Reinforcement Learning is learning in an unknown environment.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: Asia > Middle East > Israel (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Dietterich, Thomas G., Wang, Xin

Batch Value Function Approximation via Support Vectors

One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attemptto minimize the number of support vectors while fitting the data. Experiments in a difficult, synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods,the kernel methods described here can easily'adjust the complexity of the function approximator to fit the complexity of the value function.

Chang, Yu-Han, Kaelbling, Leslie Pack

Playing is believing: The role of beliefs in multi-agent learning

What do we expect a successful learner to do?

artificial intelligence, machine learning, opponent, (15 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reinforcement Learning with Long Short-Term Memory

Bakker, Bram

This paper presents reinforcement learning with a Long Short Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(,x) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevantevents. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task. 1 Introduction Reinforcement learning (RL) is a way of learning how to behave based on delayed reward signals [12]. Among the more important challenges for RL are tasks where part of the state of the environment is hidden from the agent. Such tasks are called non-Markovian tasks or Partially Observable Markov Decision Processes. Many real world tasks have this problem of hidden state. For instance, in a navigation task different positions in the environment may look the same, but one and the same action may lead to different next states or rewards. Thus, hidden state makes RL more realistic.

artificial intelligence, information, machine learning, (18 more...)

Country: Europe > Netherlands (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Zimmermann, Hans-Georg, Neuneier, Ralph, Grothmann, Ralph

Active Portfolio-Management based on Error Correction Neural Networks

This paper deals with a neural network architecture which establishes a portfolio management system similar to the Black / Litterman approach. This allocation scheme distributes funds across various securities or financial marketswhile simultaneously complying with specific allocation constraints which meet the requirements of an investor. The portfolio optimization algorithm is modeled by a feedforward neural network. The underlying expected return forecasts are based on error correction neural networks (ECNN), which utilize the last model error as an auxiliary input to evaluate their own misspecification. The portfolio optimization is implemented such that (i.) the allocations comply with investor's constraints and that (ii.) the risk of the portfolio canbe controlled.

artificial intelligence, excess return, machine learning, (15 more...)

Country: Europe > Germany (0.14)

Industry: Banking & Finance > Trading (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Face Recognition Using Kernel Methods

Yang, Ming-Hsuan

Principal Component Analysis and Fisher Linear Discriminant methods have demonstrated their success in face detection, recognition, andtracking. The representation in these subspace methods is based on second order statistics of the image set, and does not address higher order statistical dependencies such as the relationships amongthree or more pixels. Recently Higher Order Statistics and Independent Component Analysis (ICA) have been used as informative lowdimensional representations for visual recognition. In this paper, we investigate the use of Kernel Principal Component Analysisand Kernel Fisher Linear Discriminant for learning low dimensional representations for face recognition, which we call Kernel Eigenface and Kernel Fisherface methods. While Eigenface and Fisherface methods aim to find projection directions based on the second order correlation of samples, Kernel Eigenface and Kernel Fisherfacemethods provide generalizations which take higher order correlations into account.

artificial intelligence, machine learning, recognition, (15 more...)

Country: North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Warmuth, Manfred K., Rätsch, Gunnar, Mathieson, Michael, Liao, Jun, Lemmen, Christian

Active Learning in the Drug Discovery Process

We investigate the following data mining problem from Computational Chemistry: From a large data set of compounds, find those that bind to a target molecule in as few iterations of biological testing as possible. In each iteration a comparatively small batch of compounds is screened for binding to the target. We apply active learning techniques for selecting the successive batches. One selection strategy picks unlabeled examples closest to the maximum margin hyperplane. Another produces many weight vectors by running perceptrons over multiple permutations of the data.

artificial intelligence, fraction, machine learning, (18 more...)