Goto

Collaborating Authors

 Statistical Learning


Graph Embedding with Constraints

AAAI Conferences

Recently graph based dimensionality reduction has received a lot of interests in many fields of information processing. Central to it is a graph structure which models the geometrical and discriminant structure of the data manifold. When label information is available, it is usually incorporated into the graph structure by modifying the weights between data points. In this paper, we propose a novel dimensionality reduction algorithm, called Constrained Graph Embedding, which considers the label information as additional constraints. Specifically, we constrain the space of the solutions that we explore only to contain embedding results that are consistent with the labels. Experimental results on two real life data sets illustrate the effectiveness of our proposed method.


Learning Optimal Subsets with Implicit User Preferences

AAAI Conferences

We study the problem of learning an optimal subset from a larger ground set of items, where the optimality criterion is defined by an unknown preference function. We model the problem as a discriminative structural learning problem and solve it using a Structural Support Vector Machine (SSVM) that optimizes a set accuracy performance measure representing set similarities. Our approach departs from previous approaches since we do not explicitly learn a pre-defined preference function. Experimental results on both a synthetic problem domain and a real-world face image subset selection problem show that our method significantly outperforms previous learning approaches for such problems.


Local Learning Regularized Nonnegative Matrix Factorization

AAAI Conferences

Nonnegative Matrix Factorization (NMF) has been widely used in machine learning and data mining. It aims to find two nonnegative matrices whose product can well approximate the nonnegative data matrix, which naturally lead to parts-based representation. In this paper, we present a local learning regularized nonnegative matrix factorization (LLNMF) for clustering. It imposes an additional constraint on NMF that the cluster label of each point can be predicted by the points in its neighborhood. This constraint encodes both the discriminative information and the geometric structure, and is good at clustering data on manifold. An iterative multiplicative updating algorithm is proposed to optimize the objective, and its convergence is guaranteed theoretically. Experiments on many benchmark data sets demonstrate that the proposed method outperforms NMF as well as many state of the art clustering methods.


Knowledge Driven Dimension Reduction For Clustering

AAAI Conferences

However, most dimension reduction approaches are driven by objective functions that may not or only We will provide more detail on our solution to this problem partially suit the end users requirements. In this later but it is important to note the problem of focus in this work, we show how to incorporate general-purpose paper is different to spectral clustering (dimension reduction) domain expertise encoded as a graph into dimension in two keys ways. Firstly, we are projecting the entire space reduction in way that lends itself to an elegant D occupies not just the points in G or D. Secondly, we do generalized eigenvalue problem. We call not formulate the problem as some form of min-cut and then our approach Graph-Driven Constrained Dimension solve a relaxed version of the problem. Reduction via Linear Projection (GCDR-LP) Our work aims to find a reduced dimension space based on and show that it has several desirable properties.


Locality Preserving Nonnegative Matrix Factorization

AAAI Conferences

Matrix factorization techniques have been frequently applied in information processing tasks. Among them, Non-negative Matrix Factorization (NMF) have received considerable attentions due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in human brain. On the other hand, from geometric perspective the data is usually sampled from a low dimensional manifold embedded in high dimensional ambient space. One hopes then to find a compact representation which uncovers the hidden topics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called {\em Locality Preserving Non-negative Matrix Factorization} (LPNMF), for this purpose. For two data points, we use KL-divergence to evaluate their similarity on the hidden topics. The optimal maps are obtained such that the feature values on hidden topics are restricted to be non-negative and vary smoothly along the geodesics of the data manifold. Our empirical study shows the encouraging results of the proposed algorithm in comparisons to the state-of-the-art algorithms on two large high-dimensional databases.


Exponential Family Hybrid Semi-Supervised Learning

AAAI Conferences

We present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice.ย 


Adaptive Cluster Ensemble Selection

AAAI Conferences

Cluster ensembles generate a large number of different clustering solutions and combine them into a more robust and accurate consensus clustering. On forming the ensembles, the literature has suggested that higher diversity among ensemble members produces higher performance gain. In contrast, some studies also indicated that medium diversity leads to the best performing ensembles. Such contradicting observations suggest that different data, with varying characteristics, may require different treatments. We empirically investigate this issue by examining the behavior of cluster ensembles on benchmark data sets. This leads to a novel framework that selects ensemble members for each data set based on its own characteristics. Our framework first generates a diverse set of solutions and combines them into a consensus partition P*. Based on the diversity between the ensemble members and P*, a subset of ensemble members is selected and combined to obtain the final output. We evaluate the proposed method on benchmark data sets and the results show that the proposed method can significantly improve the clustering performance, often by a substantial margin. In some cases, we were able to produce final solutions that significantly outperform even the best ensemble members.


Set Branching in Constraint Optimization

AAAI Conferences

Branch and bound is an effective technique for solving constraint optimization problems (COPโ€™s). However, its search space expands very rapidly as the domain sizes of the problem variables grow. In this paper, we present an algorithm that clusters the values of a variableโ€™s domain into sets. Branch and bound can then branch on these sets of values rather than on individual values, thereby reducing the branching factor of its search space. The aim of our clustering algorithm is to construct a collection of sets such that branching on these sets will still allow effective bounding. In conjunction with the reduced branching factor, the size of the explored search space is thus significantly reduced. We test our method and show empirically that it can yield significant performance gains over existing stateof- the-art techniques.


Online Stochastic Optimization in the Large: Application to Kidney Exchange

AAAI Conferences

Kidneys are the most prevalent organ transplants, but demand dwarfs supply. Kidney exchanges enable willing but incompatible donor-patient pairs to swap donors. These swaps can include cycles longer than two pairs as well, and chains triggered by altruistic donors. Current kidney exchanges address clearing(deciding who gets kidneys from whom) as an offline problem: they optimize the current batch. In reality, clearing is an online problem where patient-donor pairs and altruistic donors appear and expire over time. In this paper, we study trajectory-based online stochastic optimization algorithms (which use a recent scalable optimal offline solver as a subroutine) for this. We identify tradeoffs in these algorithms between different parameters. We also uncover the need to set the batch size that the algorithms consider an atomic unit. We develop an experimental methodology for setting these parameters, and conduct experiments on real and generated data. We adapt the REGRETS algorithm of Bent and van Hentenryck for the setting. We then develop a better algorithm. We also show that the AMSAA algorithm of Mercier and van Hentenryck does not scale to the nationwide level. Our best online algorithm saves significantly more lives than the current practice of solving each batch separately.


A Kernel Method for Market Clearing

AAAI Conferences

The problem of market clearing in an economy is that of finding prices such that supply meets demand. In this work, we propose a kernel method to compute nonlinear clearing prices for instances where linear prices do not suffice. We first present a procedure that, given a sample of values and costs for a set of bundles, implicitly computes nonlinear clearing prices by solving an appropriately formulated quadratic program. We then use this as a subroutine in an elicitation procedure that queries demand and supply incrementally over rounds, only as much as needed to reach clearing prices. An empirical evaluation demonstrates that, with a proper choice of kernel function, the method is able to find sparse nonlinear clearing prices with much less than full revelation of values and costs. When the kernel function is not suitable to clear the market, the method can be tuned to achieve approximate clearing.