Goto

Collaborating Authors

 Technology


Learning with Multiple Labels

Neural Information Processing Systems

In this paper, we study a special kind of learning problem in which each training instance is given a set of (or distribution over) candidate class labels and only one of the candidate labels is the correct one. Such a problem can occur, e.g., in an information retrieval setting where a set of words is associated with an image, or if classes labels are organized hierarchically. We propose a novel discriminative approach for handling the ambiguity of class labels in the training examples. The experiments with the proposed approach over five different UCI datasets show that our approach is able to find the correct label among the set of candidate labels and actually achieve performance close to the case when each training instance is given a single correct label. In contrast, naIve methods degrade rapidly as more ambiguity is introduced into the labels. 1 Introduction Supervised and unsupervised learning problems have been extensively studied in the machine learning literature. In supervised classification each training instance is associated with a single class label, while in unsupervised classification (i.e.


Hyperkernels

Neural Information Processing Systems

We consider the problem of choosing a kernel suitable for estimation using a Gaussian Process estimator or a Support Vector Machine. A novel solution is presented which involves defining a Reproducing Kernel HilbertSpace on the space of kernels itself. By utilizing an analog of the classical representer theorem, the problem of choosing a kernel from a parameterized family of kernels (e.g. of varying width) is reduced to a statistical estimation problem akin to the problem of minimizing a regularized risk functional. Various classical settings for model or kernel selection are special cases of our framework.



Learning in Zero-Sum Team Markov Games Using Factored Value Functions

Neural Information Processing Systems

We present a new method for learning good strategies in zero-sum Markov games in which each side is composed of multiple agents collaborating againstan opposing team of agents. Our method requires full observability and communication during learning, but the learned policies canbe executed in a distributed manner. The value function is represented asa factored linear architecture and its structure determines the necessary computational resources and communication bandwidth. This approach permits a tradeoff between simple representations with little or no communication between agents and complex, computationally intensive representationswith extensive coordination between agents. Thus, we provide a principled means of using approximation to combat the exponential blowup in the joint action space of the participants. The approach isdemonstrated with an example that shows the efficiency gains over naive enumeration.


Learning to Take Concurrent Actions

Neural Information Processing Systems

Learning to Take Concurrent ActionsKhashayar Rohanimanesh Department of Computer Science University of Massachusetts Amherst, MA 01003 khash@cs.umass.edu Abstract We investigate a general semi-Markov Decision Process (SMDP) framework for modeling concurrent decision making, where agents learn optimal plans over concurrent temporally extended actions. We introduce three types of parallel termination schemes - all, any and continue - and theoretically and experimentally compare them. 1 Introduction We investigate a general framework for modeling concurrent actions. The notion of concurrent action is formalized in a general way, to capture both situations where a single agent can execute multiple parallel processes, as well as the multi-agent case where many agents act in parallel. Concurrency clearly allows agents to achieve goals more quickly: in making breakfast, we interleave making toast and coffee with other activities such as getting milk; in driving, we search for road signs while controlling the wheel, accelerator and brakes.


Efficient Learning Equilibrium

Neural Information Processing Systems

We introduce efficient learning equilibrium (ELE), a normative approach tolearning in non cooperative settings. In ELE, the learning algorithms themselves are required to be in equilibrium. In addition, the learning algorithms arrive at a desired value after polynomial time, and deviations from a prescribed ELE become irrational afterpolynomial time. We prove the existence of an ELE in the perfect monitoring setting, where the desired value is the expected payoff in a Nash equilibrium. We also show that an ELE does not always exist in the imperfect monitoring case.


A Convergent Form of Approximate Policy Iteration

Neural Information Processing Systems

We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a "policy improvement operator" to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces -soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution fromany initial policy. To our knowledge, this is the first convergence resultfor any form of approximate policy iteration under similar computational-resource assumptions.


Approximate Linear Programming for Average-Cost Dynamic Programming

Neural Information Processing Systems

This paper extends our earlier analysis on approximate linear programming asan approach to approximating the cost-to-go function in a discounted-cost dynamic program [6]. In this paper, we consider the average-cost criterion and a version of approximate linear programming that generates approximations to the optimal average cost and differential cost function. We demonstrate that a naive version of approximate linear programming prioritizes approximation of the optimal average cost and that this may not be well-aligned with the objective of deriving a policy with low average cost. For that, the algorithm should aim at producing a good approximation of the differential cost function. We propose a twophase variantof approximate linear programming that allows for external control of the relative accuracy of the approximation of the differential cost function over different portions of the state space via state-relevance weights. Performance bounds suggest that the new algorithm is compatible withthe objective of optimizing performance and provide guidance on appropriate choices for state-relevance weights.


Convergent Combinations of Reinforcement Learning with Linear Function Approximation

Neural Information Processing Systems

Convergence for iterative reinforcement learning algorithms like TD(O) depends on the sampling strategy for the transitions. However, inpractical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. This allows to analyse if a certain reinforcement learning algorithm and a certain functionapproximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data. 1 Introduction The strongest convergence guarantees for reinforcement learning (RL) algorithms are available for the tabular case, where temporal difference algorithms for both policy evaluation and the general control problem converge with probability one independently of the concrete sampling strategy as long as all states are sampled infinitely often and the learning rate is decreased appropriately [2].


Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games

Neural Information Processing Systems

Multiagent learning is a key problem in AI. In the presence of multiple Nashequilibria, even agents with non-conflicting interests may not be able to learn an optimal coordination policy. The problem is exaccerbated ifthe agents do not know the game and independently receive noisy payoffs. So, multiagent reinforfcement learning involves two interrelated problems:identifying the game and learning to play.