Bayesian Policy Search for Multi-Agent Role Discovery

AAAI Conferences

Bayesian inference is an appealing approach for leveraging prior knowledge in reinforcement learning (RL). In this paper we describe an algorithm for discovering different classes of roles for agents via Bayesian inference. In particular, we develop a Bayesian policy search approach for Multi-Agent RL (MARL), which is model-free and allows for priors on policy parameters. We present a novel optimization algorithm based on hybrid MCMC, which leverages both the prior and gradient information estimated from trajectories. Our experiments in a complex real-time strategy game demonstrate the effective discovery of roles from supervised trajectories, the use of discovered roles for successful transfer to similar tasks, and the discovery of roles through reinforcement learning.


Bayesian Matrix Factorization with Side Information and Dirichlet Process Mixtures

AAAI Conferences

Matrix factorization is a fundamental technique in machine learning that is applicable to collaborative filtering, information retrieval and many other areas. In collaborative filtering and many other tasks, the objective is to fill in missing elements of a sparse data matrix. One of the biggest challenges in this case is filling in a column or row of the matrix with very few observations. In this paper we introduce a Bayesian matrix factorization model that performs regression against side information known about the data in addition to the observations. The side information helps by adding observed entries to the factored matrices. We also introduce a nonparametric mixture model for the prior of the rows and columns of the factored matrices that gives a different regularization for each latent class. Besides providing a richer prior, the posterior distribution of mixture assignments reveals the latent classes. Using Gibbs sampling for inference, we apply our model to the Netflix Prize problem of predicting movie ratings given an incomplete user-movie ratings matrix. Incorporating rating information with gathered metadata information, our Bayesian approach outperforms other matrix factorization techniques even when using fewer dimensions.


Cost-Sensitive Semi-Supervised Support Vector Machine

AAAI Conferences

In this paper, we study cost-sensitive semi-supervised learning where many of the training examples are unlabeled and different misclassification errors are associated with unequal costs. This scenario occurs in many real-world applications. For example, in disease diagnosis, the cost of erroneously diagnosing a patient as healthy is much higher than that of diagnosing a healthy person as a patient. Also, the acquisition of labeled data requires medical diagnosis, which is expensive, while the collection of unlabeled data such as basic health information is much cheaper. We propose the CS4VM (Cost-Sensitive Semi-Supervised Support Vector Machine) to address this problem. We show that the CS4VM, when given the label means of the unlabeled data, closely approximates the supervised cost-sensitive SVM that has access to the ground-truth labels of all the unlabeled data. This observation leads to an efficient algorithm which first estimates the label means and then trains the CS4VM with the plug-in label means by an efficient SVM solver. Experiments on a broad range of data sets show that the proposed method is capable of reducing the total cost and is computationally efficient.


The Genetic Algorithm as a General Diffusion Model for Social Networks

AAAI Conferences

Diffusion processes taking place in social networks are used to model a number of phenomena, such as the spread of human or computer viruses, and the adoption of products in viral marketing campaigns. It is generally difficult to obtain accurate information about how such spreads actually occur, so a variety of stochastic diffusion models are used to simulate spreading processes in networks instead. We show that a canonical genetic algorithm with a spatially distributed population, when paired with specific forms of Holland's synthetic hyperplane-defined objective functions, can simulate a large and rich class of diffusion models for social networks. These include standard diffusion models, such as the Independent Cascade and Competing Processes models. In addition, our Genetic Algorithm Diffusion Model (GADM) can also model complex phenomena such as information diffusion. We demonstrate an application of the GADM to modeling information flow in a large, dynamic social network derived from e-mail headers.
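As a rough illustration of one of the standard models named above, the following is a minimal sketch of an Independent Cascade simulation. It is not the GADM from the paper: the function name `independent_cascade`, the adjacency-dict graph format, and the single shared activation probability `p` are assumptions made for this example only.

```python
import random


def independent_cascade(graph, seeds, p, rng):
    """Simulate one Independent Cascade run and return the activated nodes.

    graph: dict mapping each node to a list of its out-neighbors (assumed format).
    seeds: initially active nodes.
    p:     activation probability, shared by all edges (a simplifying assumption).
    rng:   a random.Random instance, passed in so runs are reproducible.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            # A newly activated node gets exactly one chance per inactive neighbor.
            for neighbor in graph.get(node, []):
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


# Hypothetical four-node network used only for illustration.
example_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
spread = independent_cascade(example_graph, ["a"], p=0.5, rng=random.Random(0))
```

With `p=1.0` the cascade reaches every node reachable from the seeds, and with `p=0.0` it never leaves the seed set, which gives two easy sanity checks.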


Reinforcement Learning Via Practice and Critique Advice

AAAI Conferences

We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practicing, where learning is based on actual world experience, and end-user critique sessions where advice is gathered. During each critique session the end-user is allowed to analyze a trajectory of the current policy and then label an arbitrary subset of the available actions as good or bad. Our main contribution is an approach for integrating all of the information gathered during practice and critiques in order to effectively optimize a parametric policy. The approach optimizes a loss function that linearly combines losses measured against the world experience and the critique data. We evaluate our approach using a prototype system for teaching tactical battle behavior in a real-time strategy game engine. Results are given for a significant evaluation involving ten end-users showing the promise of this approach and also highlighting challenges involved in inserting end-users into the RL loop.
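The linear combination of the two losses can be sketched as follows. The abstract does not give the exact weighting scheme, so the convex form with a single `weight` parameter, and the name `combined_loss`, are assumptions made for illustration.

```python
def combined_loss(experience_loss, critique_loss, weight):
    """Linearly combine a practice (world experience) loss and a critique loss.

    weight: trade-off in [0, 1]; this convex weighting is an assumption,
            not the form stated by the authors.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * experience_loss + (1.0 - weight) * critique_loss
```

Setting `weight` to 1.0 recovers learning from experience alone, and 0.0 recovers learning from critiques alone.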


Facial Age Estimation by Learning from Label Distributions

AAAI Conferences

One of the main difficulties in facial age estimation is the lack of sufficient training data for many ages. Fortunately, faces at close ages look similar, since aging is a slow and smooth process. Inspired by this observation, in this paper, instead of considering each face image as an example with one label (age), we regard each face image as an example associated with a label distribution. The label distribution covers a number of class labels, representing the degree to which each label describes the example. In this way, in addition to the real age, one face image can also contribute to the learning of its adjacent ages. We propose an algorithm named IIS-LLD for learning from the label distributions, which is an iterative optimization process based on the maximum entropy model. Experimental results show the advantages of IIS-LLD over the traditional learning methods based on single-labeled data.
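To make the idea of a label distribution concrete, here is a minimal sketch that spreads a single real age over neighboring ages. The Gaussian shape, the `sigma` value, and the function name are assumptions for illustration; the abstract does not specify which distribution is used, and this is not the IIS-LLD algorithm itself.

```python
import math


def gaussian_label_distribution(true_age, ages, sigma=2.0):
    """Return description degrees over `ages` that sum to 1.

    The real age receives the highest degree and adjacent ages receive
    smaller ones. The Gaussian form is an assumed choice for this sketch.
    """
    weights = [math.exp(-((a - true_age) ** 2) / (2.0 * sigma ** 2)) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical age range 0..69 and a face whose real age is 30.
ages = list(range(70))
distribution = gaussian_label_distribution(30, ages)
```

Under this sketch a face labeled 30 also carries some weight for ages 29 and 31, so it contributes to learning those adjacent ages.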


Myopic Policies for Budgeted Optimization with Constrained Experiments

AAAI Conferences

Motivated by a real-world problem, we study a novel budgeted optimization problem where the goal is to optimize an unknown function f(x) given a budget. In our setting, it is not practical to request samples of f(x) at precise input values due to the formidable cost of precise experimental setup. Rather, we may request a constrained experiment, which is a subset r of the input space for which the experimenter returns x in r and f(x). Importantly, as the constraints become looser, the experimental cost decreases, but the uncertainty about the location x of the next observation increases. Our goal is to manage this trade-off by selecting a sequence of constrained experiments to best optimize f within the budget. We introduce cost-sensitive policies for selecting constrained experiments using both model-free and model-based approaches, inspired by policies for unconstrained settings. Experiments on synthetic functions and functions derived from real-world experimental data indicate that our policies outperform random selection and that the model-based policies are superior to model-free ones, and they give insights into which policies are preferable overall.


Latent Variable Model for Learning in Pairwise Markov Networks

AAAI Conferences

Pairwise Markov Networks (PMN) are an important class of Markov networks which, due to their simplicity, are widely used in many applications such as image analysis, bioinformatics, and sensor networks. However, learning Markov networks from data is a challenging task: there are many possible structures one must consider, and each of these structures comes with its own parameters, making it easy to overfit the model with limited data. To deal with the problem, recent learning methods build upon L1 regularization to express the bias towards sparse network structures. In this paper, we propose a new and more flexible framework that lets us bias the structure and that can, for example, encode a preference for networks with certain local substructures which as a whole exhibit some special global structure. We experiment with and show the benefit of our framework on two types of problems: learning of modular networks and learning of traffic network models.


Collaborative Filtering Meets Mobile Recommendation: A User-Centered Approach

AAAI Conferences

With the increasing popularity of location tracking services such as GPS, more and more mobile data are being accumulated. Based on such data, a potentially useful service is to make timely and targeted recommendations for users on places they might be interested in visiting and activities that they are likely to conduct. For example, a user arriving in Beijing might wonder where to visit and what she can do around the Forbidden City. A key challenge for such recommendation problems is that the data we have on each individual user might be very limited, while to make useful and accurate recommendations, we need extensive annotated location and activity information from user trace data. In this paper, we present a new approach, known as user-centered collaborative location and activity filtering (UCLAF), to pull many users’ data together and apply collaborative filtering to find like-minded users and like-patterned activities at different locations. We model the user-location-activity relations with a tensor representation, and propose a regularized tensor and matrix decomposition solution which can better address the sparse data problem in mobile information retrieval. We empirically evaluate UCLAF using a real-world GPS dataset collected from 164 users over 2.5 years, and show that our system can outperform several state-of-the-art solutions to the problem.


Coalition Structure Generation based on Distributed Constraint Optimization

AAAI Conferences

Forming effective coalitions is a major research challenge in AI and multi-agent systems (MAS). Coalition Structure Generation (CSG) involves partitioning a set of agents into coalitions so that social surplus (the sum of the rewards of all coalitions) is maximized. A partition is called a Coalition Structure (CS). In traditional works, the value of a coalition is given by a black box function called a characteristic function. In this paper, we propose a novel formalization of CSG, i.e., we assume the value of a characteristic function is given by an optimal solution of a distributed constraint optimization problem (DCOP) among the agents of a coalition. A DCOP is a popular approach for modeling cooperative agents, since it is quite general and can formalize various application problems in MAS. At first glance, one might assume that the computational costs required in this approach would be too high, since we need to solve an NP-hard problem just to obtain the value of a single coalition. To optimally solve a CSG, we might need to solve 2^n DCOP problem instances, where n is the number of agents. However, quite surprisingly, we show that an approximation algorithm, whose computational cost is about the same as solving just one DCOP, can find a CS with quality guarantees. More specifically, we develop an algorithm with parameter k that can find a CS whose social surplus is at least a fraction max(k/(w*+1), 2k/n) of that of the optimal CS, where w* is the tree width of a constraint graph. When k=1, the complexity of this algorithm is about the same as solving just one DCOP. These results illustrate that the locality of interactions among agents, which is explicitly modeled in the DCOP formalization, is quite useful in developing an efficient CSG algorithm with quality guarantees.
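The quality guarantee stated above is simple arithmetic, and a short sketch may help in reading it. The function name and the example values of k, w*, and n below are hypothetical; only the formula max(k/(w*+1), 2k/n) comes from the abstract.

```python
from fractions import Fraction


def quality_guarantee(k, tree_width, n_agents):
    """Lower bound on (found social surplus) / (optimal social surplus).

    Evaluates max(k / (w* + 1), 2k / n) exactly, where w* is the tree width
    of the constraint graph and n is the number of agents.
    """
    return max(Fraction(k, tree_width + 1), Fraction(2 * k, n_agents))


# Hypothetical instance: k = 1, tree width 2, 10 agents.
bound = quality_guarantee(1, 2, 10)  # max(1/3, 1/5) = 1/3
```

For a constraint graph with small tree width the first term dominates, which matches the abstract's point that locality of interactions is what makes the guarantee useful; for a dense graph with tree width close to n, the 2k/n term takes over.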