Learning Causal Models of Relational Domains

AAAI Conferences

Methods for discovering causal knowledge from observational data have been a persistent topic of AI research for several decades. Essentially all of this work focuses on knowledge representations for propositional domains. In this paper, we present several key algorithmic and theoretical innovations that extend causal discovery to relational domains. We provide strong evidence that effective learning of causal models is enhanced by relational representations. We present an algorithm, relational PC, that learns causal dependencies in a state-of-the-art relational representation, and we identify the key representational and algorithmic innovations that make the algorithm possible. Finally, we prove the algorithm's theoretical correctness and demonstrate its effectiveness on synthetic and real data sets.


Multilinear Maximum Distance Embedding Via L1-Norm Optimization

AAAI Conferences

Dimensionality reduction plays an important role in many machine learning and pattern recognition tasks. In this paper, we present a novel dimensionality reduction algorithm called multilinear maximum distance embedding (M2DE), which includes three key components. To preserve the local geometry and discriminant information in the embedded space, M2DE utilizes a new objective function, which aims to maximize the distances between some particular pairs of data points, such as the distances between nearby points and the distances between data points from different classes. To make the mapping of new data points straightforward, and more importantly, to keep the natural tensor structure of high-order data, M2DE integrates multilinear techniques to learn the transformation matrices sequentially. To provide reasonable and stable embedding results, M2DE employs the L1-norm, which is more robust to outliers, to measure the dissimilarity between data points. Experiments on various datasets demonstrate that M2DE achieves good embedding results of high-order data for classification tasks.


Constrained Metric Learning Via Distance Gap Maximization

AAAI Conferences

Vectored data occur frequently in a variety of fields and are easy to handle because they can be mathematically abstracted as points residing in a Euclidean space. An appropriate distance metric in the data space is in high demand for a great number of applications. In this paper, we pose robust and tractable metric learning under pairwise constraints that are expressed as similarity judgements between data pairs. The major features of our approach include: 1) it maximizes the gap between the average squared distance among dissimilar pairs and the average squared distance among similar pairs; 2) it is capable of propagating similar constraints to all data pairs; and 3) it is easy to implement in contrast to the existing approaches using expensive optimization such as semidefinite programming. Our constrained metric learning approach has widespread applicability without being limited to particular backgrounds. Quantitative experiments on classification and retrieval tasks demonstrate the effectiveness of the proposed approach.
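The gap objective in feature 1) can be illustrated with a minimal sketch. It assumes a diagonal metric (one non-negative weight per feature), which is a simplification chosen here for illustration; the abstract does not state how the metric is parameterized, and the names `squared_distance` and `distance_gap` are hypothetical.

```python
def squared_distance(x, y, weights):
    """Squared distance under a diagonal metric (assumed form, one weight per feature)."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))


def distance_gap(points, similar_pairs, dissimilar_pairs, weights):
    """Mean squared distance over dissimilar pairs minus that over similar pairs."""
    def mean_sq(pairs):
        return sum(
            squared_distance(points[i], points[j], weights) for i, j in pairs
        ) / len(pairs)

    return mean_sq(dissimilar_pairs) - mean_sq(similar_pairs)


# Toy data: two groups separated along the first feature only.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
similar = [(0, 1), (2, 3)]      # index pairs judged similar
dissimilar = [(0, 2), (1, 3)]   # index pairs judged dissimilar

gap_euclid = distance_gap(points, similar, dissimilar, (1.0, 1.0))
gap_weighted = distance_gap(points, similar, dissimilar, (1.0, 0.0))
```

In this toy case, dropping the second feature (weight 0) widens the gap, which is the kind of metric a gap-maximizing learner would prefer.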


Gaussian Mixture Model with Local Consistency

AAAI Conferences

The Gaussian Mixture Model (GMM) is one of the most popular data clustering methods and can be viewed as a linear combination of different Gaussian components. In a GMM, each cluster obeys a Gaussian distribution, and the task of clustering is to group observations into different components by estimating each cluster's own parameters. The Expectation-Maximization algorithm is typically used for such estimation problems. However, many previous studies have shown that naturally occurring data may reside on or close to an underlying submanifold. In this paper, we consider the case where the probability distribution is supported on a submanifold of the ambient space. We take into account the smoothness of the conditional probability distribution along the geodesics of the data manifold. That is, if two observations are close in intrinsic geometry, their distributions over different Gaussian components are similar. Simply speaking, we introduce a novel method based on manifold structure for data clustering, called Locally Consistent Gaussian Mixture Model (LCGMM). Specifically, we construct a nearest neighbor graph and adopt Kullback-Leibler Divergence as the distance measurement to regularize the objective function of GMM. Experiments on several data sets demonstrate the effectiveness of such regularization.
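The idea of penalizing neighbors whose component posteriors disagree can be sketched as follows. This is only an illustration under assumptions: the symmetrized form of the KL divergence, the dictionary-based neighbor graph, and the function names are choices made here, not details taken from the abstract.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def local_consistency_penalty(posteriors, neighbors):
    """Sum of symmetrized KL divergences over edges of a neighbor graph.

    posteriors[i] is the distribution of observation i over Gaussian components;
    neighbors maps an observation index to the indices of its nearest neighbors.
    """
    total = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            total += 0.5 * (
                kl_divergence(posteriors[i], posteriors[j])
                + kl_divergence(posteriors[j], posteriors[i])
            )
    return total


posteriors = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
smooth = local_consistency_penalty(posteriors, {0: [1]})  # neighbors agree
rough = local_consistency_penalty(posteriors, {0: [2]})   # neighbors disagree
```

A regularized objective would subtract a multiple of this penalty from the GMM log-likelihood, so that assignments varying sharply between neighbors are discouraged.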


Non-Negative Matrix Factorization with Constraints

AAAI Conferences

Non-negative matrix factorization (NMF), a useful decomposition method for multivariate data, has been widely used in pattern recognition, information retrieval and computer vision. NMF is an effective algorithm for finding the latent structure of the data and leads to a parts-based representation. However, NMF is essentially an unsupervised method and cannot make use of label information. In this paper, we propose a novel semi-supervised matrix decomposition method, called Constrained Non-negative Matrix Factorization, which takes the label information as additional constraints. Specifically, we require that the data points sharing the same label have the same coordinate in the new representation space. This way, the learned representations can have more discriminating power. We demonstrate the effectiveness of this novel algorithm through a set of evaluations on real world applications.
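One way to enforce "same label, same coordinate" is to route every labeled point through a shared row of an auxiliary matrix. The sketch below assumes that encoding; the helper names `constraint_matrix` and `matmul` and the toy labels are hypothetical, and the abstract does not confirm that the method is implemented this way.

```python
def constraint_matrix(labels):
    """Build a 0/1 matrix A with one column per class plus one per unlabeled point.

    labels[i] is a class label, or None if point i is unlabeled.
    """
    classes = sorted({y for y in labels if y is not None})
    col_of_class = {y: k for k, y in enumerate(classes)}
    n_unlabeled = sum(1 for y in labels if y is None)
    width = len(classes) + n_unlabeled
    rows, next_free = [], len(classes)
    for y in labels:
        row = [0] * width
        if y is None:
            row[next_free] = 1  # an unlabeled point keeps its own free row of Z
            next_free += 1
        else:
            row[col_of_class[y]] = 1  # labeled points share their class row of Z
        rows.append(row)
    return rows


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


labels = ["cat", "dog", "cat", None]
A = constraint_matrix(labels)
Z = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]  # rows: class "cat", class "dog", unlabeled point
V = matmul(A, Z)  # representation of each data point
```

Because the two "cat" points select the same row of `Z`, their representations in `V` coincide by construction, whatever values a factorization step later assigns to `Z`.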


Cost-Sensitive Semi-Supervised Support Vector Machine

AAAI Conferences

In this paper, we study cost-sensitive semi-supervised learning where many of the training examples are unlabeled and different misclassification errors are associated with unequal costs. This scenario occurs in many real-world applications. For example, in some disease diagnosis, the cost of erroneously diagnosing a patient as healthy is much higher than that of diagnosing a healthy person as a patient. Also, the acquisition of labeled data requires medical diagnosis which is expensive, while the collection of unlabeled data such as basic health information is much cheaper. We propose the CS4VM (Cost-Sensitive Semi-Supervised Support Vector Machine) to address this problem. We show that the CS4VM, when given the label means of the unlabeled data, closely approximates the supervised cost-sensitive SVM that has access to the ground-truth labels of all the unlabeled data. This observation leads to an efficient algorithm which first estimates the label means and then trains the CS4VM with the plug-in label means by an efficient SVM solver. Experiments on a broad range of data sets show that the proposed method is capable of reducing the total cost and is computationally efficient.
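The effect of unequal misclassification costs can be seen in a cost-weighted hinge loss. This is a sketch of the standard cost-sensitive hinge loss, not the CS4VM objective itself (which also involves the label means of the unlabeled data); the function name and the cost values are assumptions made for illustration.

```python
def cost_sensitive_hinge(scores, labels, cost_pos, cost_neg):
    """Hinge loss where errors on positive (+1) and negative (-1) examples cost differently."""
    total = 0.0
    for s, y in zip(scores, labels):
        cost = cost_pos if y == 1 else cost_neg
        total += cost * max(0.0, 1.0 - y * s)
    return total


# One sick patient (+1) scored as healthy, one healthy person (-1) scored as sick.
scores = [-0.5, 0.5]
labels = [1, -1]

equal_cost = cost_sensitive_hinge(scores, labels, cost_pos=1.0, cost_neg=1.0)
unequal_cost = cost_sensitive_hinge(scores, labels, cost_pos=5.0, cost_neg=1.0)
```

Both examples violate the margin by the same amount, but with `cost_pos=5.0` the missed diagnosis dominates the loss, pushing a learner toward avoiding that error first.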


The Genetic Algorithm as a General Diffusion Model for Social Networks

AAAI Conferences

Diffusion processes taking place in social networks are used to model a number of phenomena, such as the spread of human or computer viruses, and the adoption of products in viral marketing campaigns. It is generally difficult to obtain accurate information about how such spreads actually occur, so a variety of stochastic diffusion models are used to simulate spreading processes in networks instead. We show that a canonical genetic algorithm with a spatially distributed population, when paired with specific forms of Holland's synthetic hyperplane-defined objective functions, can simulate a large and rich class of diffusion models for social networks. These include standard diffusion models, such as the Independent Cascade and Competing Processes models. In addition, our Genetic Algorithm Diffusion Model (GADM) can also model complex phenomena such as information diffusion. We demonstrate an application of the GADM to modeling information flow in a large, dynamic social network derived from e-mail headers.
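For reference, the Independent Cascade model mentioned above can be simulated in a few lines. This sketch shows the standard model with a single uniform activation probability, not the GADM; the graph, node names, and function name are hypothetical.

```python
import random


def independent_cascade(graph, seeds, prob, rng):
    """Simulate one run of the Independent Cascade model.

    graph maps each node to the nodes it can influence. Each newly active node
    gets exactly one chance to activate each inactive neighbor, succeeding
    with probability prob.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr in graph.get(node, []):
                if nbr not in active and rng.random() < prob:
                    active.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return active


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
rng = random.Random(0)

everyone_reachable = independent_cascade(graph, ["a"], prob=1.0, rng=rng)
nobody_else = independent_cascade(graph, ["a"], prob=0.0, rng=rng)
```

With `prob=1.0` the spread reaches every node reachable from the seed, and with `prob=0.0` it never leaves the seed; intermediate values give stochastic outcomes that depend on the random generator.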


Structure Learning for Markov Logic Networks with Many Descriptive Attributes

AAAI Conferences

Many machine learning applications that involve relational databases incorporate first-order logic and probability. Markov Logic Networks (MLNs) are a prominent statistical relational model that consists of weighted first-order clauses. Many of the current state-of-the-art algorithms for learning MLNs have focused on relatively small datasets with few descriptive attributes, where predicates are mostly binary and the main task is usually prediction of links between entities. This paper addresses what is in a sense a complementary problem: learning the structure of an MLN that models the distribution of discrete descriptive attributes on medium to large datasets, given the links between entities in a relational database. Descriptive attributes are usually nonbinary and can be very informative, but they increase the search space of possible candidate clauses. We present an efficient new algorithm for learning a directed relational model (parametrized Bayes net), which produces an MLN structure via a standard moralization procedure for converting directed models to undirected models. Learning MLN structure in this way is 200-1000 times faster and scores substantially higher in predictive accuracy than benchmark algorithms on three relational databases.
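Moralization at the graph level is short enough to sketch: connect every pair of parents that share a child, then drop edge directions. The example below covers only that graph operation, not how MLN clauses are derived from the result; the variable names are hypothetical.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edges of the moral graph of a directed model.

    parents maps each node to the list of its parents in the directed graph.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))  # keep each original edge, undirected
        for p, q in combinations(pars, 2):
            edges.add(frozenset((p, q)))      # "marry" parents with a common child
    return edges


# Toy directed model: intelligence -> grade <- difficulty
parents = {
    "grade": ["intelligence", "difficulty"],
    "intelligence": [],
    "difficulty": [],
}
moral_edges = moralize(parents)
```

The added edge between `intelligence` and `difficulty` reflects that the two parents become dependent once their common child is taken into account.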


Reinforcement Learning Via Practice and Critique Advice

AAAI Conferences

We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practicing, where learning is based on actual world experience, and end-user critique sessions where advice is gathered. During each critique session the end-user is allowed to analyze a trajectory of the current policy and then label an arbitrary subset of the available actions as good or bad. Our main contribution is an approach for integrating all of the information gathered during practice and critiques in order to effectively optimize a parametric policy. The approach optimizes a loss function that linearly combines losses measured against the world experience and the critique data. We evaluate our approach using a prototype system for teaching tactical battle behavior in a real-time strategy game engine. Results are given for a significant evaluation involving ten end-users showing the promise of this approach and also highlighting challenges involved in inserting end-users into the RL loop.
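A linear combination of the two loss terms might look like the sketch below. Everything specific here is an assumption for illustration: the convex weighting, the form of the critique loss (negative log of the probability mass the policy places on actions labeled good), and the state and action names. The abstract states only that the two losses are combined linearly.

```python
import math


def critique_loss(policy_probs, critiques):
    """Average negative log-probability mass on actions the end-user labeled good.

    policy_probs[state][action] is the policy's probability of that action;
    critiques maps a state to the set of actions labeled good (assumed format).
    """
    total = 0.0
    for state, good_actions in critiques.items():
        mass = sum(policy_probs[state][a] for a in good_actions)
        total += -math.log(mass)
    return total / len(critiques)


def combined_loss(experience_loss, critique_term, weight):
    """Linear combination of experience and critique losses, weight in [0, 1]."""
    return weight * experience_loss + (1.0 - weight) * critique_term


policy_probs = {"s0": {"attack": 0.5, "retreat": 0.5}}
critiques = {"s0": {"attack"}}

c_loss = critique_loss(policy_probs, critiques)
total = combined_loss(experience_loss=2.0, critique_term=c_loss, weight=0.25)
```

Raising the policy's probability of `attack` in `s0` would lower the critique term, while the weight controls how much that advice counts relative to practice experience.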


Two-Stage Sparse Representation for Robust Recognition on Large-Scale Database

AAAI Conferences

This paper proposes a novel robust sparse representation method, called the two-stage sparse representation (TSR), for robust recognition on a large-scale database. Based on the divide and conquer strategy, TSR divides the procedure of robust recognition into an outlier detection stage and a recognition stage. In the first stage, a weighted linear regression is used to learn a metric in which noise and outliers in image pixels are detected. In the second stage, based on the learnt metric, the large-scale dataset is first filtered into a small set according to the nearest neighbor criterion. Then a sparse representation is computed by the non-negative least squares technique. The sparse solution is unique and can be optimized efficiently. Extensive numerical experiments on several public databases demonstrate that the proposed TSR approach generally obtains better classification accuracy than the state-of-the-art Sparse Representation Classification (SRC). At the same time, TSR reduces the computational cost by a factor of more than fifty in comparison with SRC, which makes TSR better suited to deployment on large-scale datasets.