Gaussian Mixture Model with Local Consistency
Liu, Jialu (Zhejiang University) | Cai, Deng (Zhejiang University) | He, Xiaofei (Zhejiang University)
The Gaussian Mixture Model (GMM) is one of the most popular data clustering methods; it can be viewed as a linear combination of different Gaussian components. In GMM, each cluster obeys a Gaussian distribution, and the task of clustering is to group observations into different components by estimating each cluster's own parameters. The Expectation-Maximization (EM) algorithm is typically used for this estimation problem. However, many previous studies have shown that naturally occurring data may reside on or close to an underlying submanifold. In this paper, we consider the case where the probability distribution is supported on a submanifold of the ambient space. We take into account the smoothness of the conditional probability distribution along the geodesics of the data manifold. That is, if two observations are close in intrinsic geometry, their distributions over the different Gaussian components should be similar. In short, we introduce a novel manifold-based method for data clustering, called the Locally Consistent Gaussian Mixture Model (LCGMM). Specifically, we construct a nearest neighbor graph and adopt the Kullback-Leibler divergence as the distance measure to regularize the objective function of GMM. Experiments on several data sets demonstrate the effectiveness of such regularization.
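As a rough illustration of the regularizer described above, the sketch below builds a symmetric k-nearest-neighbor graph and sums symmetric KL divergences between the component posteriors of neighboring points. It assumes the posteriors P_i come from an ordinary GMM E-step and that the penalty is subtracted from the log-likelihood with some trade-off weight; the function names are hypothetical, and the exact weighting used in LCGMM may differ.

```python
import math

def knn_graph(points, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix (Euclidean distance)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other points by distance to point i and link the k closest.
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            W[i][j] = W[j][i] = 1.0  # symmetrize the graph
    return W

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def local_consistency_penalty(W, posteriors):
    """R = 1/2 * sum_ij W_ij * (D(P_i||P_j) + D(P_j||P_i)), where posteriors[i]
    is point i's distribution over the Gaussian components (assumed form)."""
    n = len(W)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if W[i][j]:
                total += W[i][j] * (kl(posteriors[i], posteriors[j]) + kl(posteriors[j], posteriors[i]))
    return 0.5 * total
```

In an EM-style loop one would then monitor `loglik - lam * local_consistency_penalty(W, posteriors)`, where `lam` is a hypothetical weight; neighbors with identical posteriors contribute nothing, and neighbors that disagree are penalized.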
Non-Negative Matrix Factorization with Constraints
Liu, Haifeng (Zhejiang University) | Wu, Zhaohui (Zhejiang University)
Non-negative matrix factorization (NMF), a useful decomposition method for multivariate data, has been widely used in pattern recognition, information retrieval, and computer vision. NMF is an effective algorithm for finding the latent structure of data and leads to a parts-based representation. However, NMF is essentially an unsupervised method and cannot make use of label information. In this paper, we propose a novel semi-supervised matrix decomposition method, called Constrained Non-negative Matrix Factorization, which takes the label information as additional constraints. Specifically, we require that data points sharing the same label have the same coordinates in the new representation space. In this way, the learned representations can have more discriminating power. We demonstrate the effectiveness of this novel algorithm through a set of evaluations on real-world applications.
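One way to enforce "same label, same coordinates" is to write the new representation as V = A Z, where A is a fixed indicator matrix: each labeled point gets a one-hot row selecting its class, and each unlabeled point gets its own free column. The sketch below builds such an A and checks the property; the layout (labeled points first) and the names are assumptions for illustration, and the multiplicative update rules for the factorization itself are not shown.

```python
def constraint_matrix(labels, n_total):
    """Indicator matrix A (n_total x n_cols). The first len(labels) points are labeled;
    they share one column per class. Each remaining point gets its own column."""
    classes = sorted(set(labels))
    col = {c: i for i, c in enumerate(classes)}
    n_labeled = len(labels)
    n_cols = len(classes) + (n_total - n_labeled)
    A = [[0] * n_cols for _ in range(n_total)]
    for i, y in enumerate(labels):
        A[i][col[y]] = 1  # same label -> same column -> same coordinates
    for i in range(n_labeled, n_total):
        A[i][len(classes) + (i - n_labeled)] = 1  # unlabeled points stay free
    return A

def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, c)) for c in zip(*B)] for row in A]

# Hypothetical auxiliary matrix Z (n_cols x k); V = A Z is the constrained representation.
A = constraint_matrix([0, 1, 0], n_total=5)
Z = [[1, 2], [3, 4], [5, 6], [7, 8]]
V = matmul(A, Z)
```

Here rows 0 and 2 of `V` coincide because both points carry label 0, while the unlabeled rows 3 and 4 take their own rows of `Z`.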
The Genetic Algorithm as a General Diffusion Model for Social Networks
Lahiri, Mayank (University of Illinois at Chicago) | Cebrian, Manuel (Massachusetts Institute of Technology)
Diffusion processes taking place in social networks are used to model a number of phenomena, such as the spread of human or computer viruses, and the adoption of products in viral marketing campaigns. It is generally difficult to obtain accurate information about how such spreads actually occur, so a variety of stochastic diffusion models are used to simulate spreading processes in networks instead. We show that a canonical genetic algorithm with a spatially distributed population, when paired with specific forms of Holland's synthetic hyperplane-defined objective functions, can simulate a large and rich class of diffusion models for social networks. These include standard diffusion models, such as the Independent Cascade and Competing Processes models. In addition, our Genetic Algorithm Diffusion Model (GADM) can also model complex phenomena such as information diffusion. We demonstrate an application of the GADM to modeling information flow in a large, dynamic social network derived from e-mail headers.
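The GADM itself is not reproduced here. For readers unfamiliar with the baseline it is said to emulate, the sketch below simulates the standard Independent Cascade model: each newly activated node gets exactly one chance to activate each inactive out-neighbor with probability p. The adjacency-dict representation and the function names are assumptions for illustration.

```python
import random

def independent_cascade(graph, seeds, p, rng):
    """One run of the Independent Cascade model.
    graph: dict node -> list of out-neighbors. Returns the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            # u tries each inactive neighbor once, in the round right after it activated.
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active

def expected_spread(graph, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs)) / runs
```

For example, `expected_spread({0: [1, 2], 1: [3], 2: [3]}, [0], 0.3)` estimates how far a single seed spreads on a small directed graph.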
Two-Stage Sparse Representation for Robust Recognition on Large-Scale Database
He, Ran (Dalian University of Technology) | Hu, BaoGang (Chinese Academy of Sciences) | Zheng, Wei-Shi (Queen Mary University of London) | Guo, YanQing (Dalian University of Technology)
This paper proposes a novel robust sparse representation method, called two-stage sparse representation (TSR), for robust recognition on a large-scale database. Based on a divide-and-conquer strategy, TSR divides the procedure of robust recognition into an outlier detection stage and a recognition stage. In the first stage, a weighted linear regression is used to learn a metric in which noise and outliers in image pixels are detected. In the second stage, based on the learnt metric, the large-scale dataset is first filtered into a small set according to the nearest neighbor criterion. Then a sparse representation is computed by the non-negative least squares technique. The sparse solution is unique and can be optimized efficiently. Extensive numerical experiments on several public databases demonstrate that the proposed TSR approach generally obtains better classification accuracy than the state-of-the-art Sparse Representation Classification (SRC). At the same time, TSR reduces the computational cost by a factor of more than fifty compared with SRC, which makes TSR better suited to large-scale datasets.
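The second stage can be sketched as follows, assuming the stage-one metric is already available as a per-pixel weight vector `w` (how it is learned by weighted regression is not shown). The candidate set is filtered by nearest neighbors under that metric, the query is coded with non-negative least squares (solved here by simple cyclic coordinate descent, one of several valid NNLS solvers), and the class is chosen by smallest class-wise residual, an SRC-style decision rule assumed for illustration.

```python
import math

def weighted_dist(x, y, w):
    """Euclidean distance after weighting each pixel by w (w assumed from stage one)."""
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))

def nnls(cols, b, sweeps=500):
    """min ||b - sum_j x_j * cols[j]||^2 s.t. x >= 0, by cyclic coordinate descent:
    each step is the exact 1-D minimizer for one coefficient, clipped at zero."""
    x = [0.0] * len(cols)
    r = list(b)  # residual b - A x
    for _ in range(sweeps):
        for j, a in enumerate(cols):
            aa = sum(v * v for v in a)
            if aa == 0.0:
                continue
            new = max(0.0, x[j] + sum(v * ri for v, ri in zip(a, r)) / aa)
            delta = new - x[j]
            if delta != 0.0:
                r = [ri - delta * v for ri, v in zip(r, a)]
                x[j] = new
    return x

def tsr_classify(train, labels, query, w, n_neighbors):
    # Filter the large training set down to the nearest samples under the metric.
    idx = sorted(range(len(train)), key=lambda i: weighted_dist(train[i], query, w))[:n_neighbors]
    # Fold the metric into the data by scaling every coordinate by sqrt(w).
    s = [math.sqrt(wi) for wi in w]
    cols = [[si * v for si, v in zip(s, train[i])] for i in idx]
    b = [si * v for si, v in zip(s, query)]
    # Non-negative sparse code of the query over the small candidate set.
    x = nnls(cols, b)
    # Assign the class whose coefficients reconstruct the query best.
    best, best_res = None, float("inf")
    for c in sorted({labels[i] for i in idx}):
        recon = [0.0] * len(b)
        for xj, i, col in zip(x, idx, cols):
            if labels[i] == c:
                recon = [rv + xj * v for rv, v in zip(recon, col)]
        res = math.dist(recon, b)
        if res < best_res:
            best, best_res = c, res
    return best
```

Because NNLS only runs on the `n_neighbors` filtered samples, its cost no longer grows with the full database size, which is the intuition behind the reported speed-up.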
Exact Algorithms and Experiments for Hierarchical Tree Clustering
Hartung, Sepp (University of Jena) | Guo, Jiong (Universität des Saarlandes) | Komusiewicz, Christian (University of Jena) | Niedermeier, Rolf (University of Jena) | Uhlmann, Johannes (University of Jena)
We perform new theoretical as well as first-time experimental studies for the NP-hard problem of finding a closest ultrametric for given pairwise dissimilarity data. This is a central problem in hierarchical clustering, for which so far only polynomial-time approximation algorithms were known. In contrast, we develop efficient preprocessing algorithms (known as kernelization in parameterized algorithmics) with provable performance guarantees, and a simple search tree algorithm. These are used to find optimal solutions. Our experiments with synthetic and biological data show the effectiveness of our algorithms and demonstrate that an approximation algorithm due to Ailon and Charikar [FOCS 2005] often gives (almost) optimal solutions.
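A dissimilarity matrix is an ultrametric exactly when, for every triple of objects, the two largest of its three pairwise values are equal. The sketch below detects a violating triple (the kind of local conflict a search tree algorithm can branch on) and includes an exponential brute-force solver for tiny instances. The objective (sum of absolute entry changes) and the restriction of entries to a given finite set of levels are assumptions made for illustration; this is not the kernelization or search tree from the paper.

```python
from itertools import combinations, product

def find_conflict(D):
    """Return a triple (i, j, k) violating the ultrametric three-point condition, or None."""
    for i, j, k in combinations(range(len(D)), 3):
        _, mid, top = sorted((D[i][j], D[i][k], D[j][k]))
        if mid != top:  # the two largest distances must coincide
            return (i, j, k)
    return None

def closest_ultrametric_bruteforce(D, levels):
    """Try every symmetric matrix with off-diagonal entries from `levels` and keep the
    ultrametric with the smallest sum of absolute changes. Only for tiny sanity checks."""
    n = len(D)
    pairs = list(combinations(range(n), 2))
    best, best_cost = None, float("inf")
    for values in product(levels, repeat=len(pairs)):
        U = [[0] * n for _ in range(n)]
        for (i, j), v in zip(pairs, values):
            U[i][j] = U[j][i] = v
        if find_conflict(U) is None:
            cost = sum(abs(D[i][j] - U[i][j]) for i, j in pairs)
            if cost < best_cost:
                best, best_cost = U, cost
    return best, best_cost
```

Such a brute force is mainly useful for checking exact algorithms, or approximation results like those of Ailon and Charikar, on very small inputs.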
Properties of Bayesian Dirichlet Scores to Learn Bayesian Network Structures
Campos, Cassio Polpo de (Dalle Molle Institute for Artificial Intelligence) | Ji, Qiang (Rensselaer Polytechnic Institute)
A Bayesian network is a probabilistic graphical model that relies on a structured dependency among random variables to represent a joint probability distribution in a compact and efficient manner. It is composed of a directed acyclic graph (DAG) where nodes are associated with random variables and conditional probability distributions are defined for variables given their parents in the graph. Learning the graph (or structure) of these networks from data is one of the most …
As we will see later, the mathematical derivations are more elaborate than those recently introduced for the BIC and AIC criteria (de Campos, Zeng, and Ji 2009), and the reductions in the search space and cache size are less effective when priors are strong, but still relevant. This is expected, as the BIC score is known to penalize complex graphs more than BD scores do. We show that the search space can be reduced without losing the global optimality guarantee and that the memory requirements are small in many practical cases.
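For concreteness, the sketch below computes the logarithm of one member of the Bayesian Dirichlet family, the BDeu local score of a variable given a parent set, using the usual closed form with Dirichlet hyperparameters α_ij = ess/q and α_ijk = ess/(q·r). The choice of BDeu and of the equivalent sample size `ess` are assumptions; other BD priors change only the hyperparameters.

```python
import math
from collections import Counter
from itertools import product

def bdeu_local_score(data, child, parents, arities, ess=1.0):
    """log BDeu score of `child` given `parents`.
    data: list of tuples, one integer value in 0..arity-1 per variable."""
    r = arities[child]
    q = math.prod(arities[p] for p in parents)  # number of parent configurations
    a_ij, a_ijk = ess / q, ess / (q * r)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    # product() of no ranges yields one empty configuration, which covers parents == [].
    for config in product(*(range(arities[p]) for p in parents)):
        n_ij = sum(counts[(config, k)] for k in range(r))
        score += math.lgamma(a_ij) - math.lgamma(a_ij + n_ij)
        for k in range(r):
            score += math.lgamma(a_ijk + counts[(config, k)]) - math.lgamma(a_ijk)
    return score
```

Because the score decomposes into such local terms, a structure search can cache them per (variable, parent set) pair; the size of that cache is one of the memory costs the text refers to.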
What if the Irresponsible Teachers Are Dominating?
Chen, Shuo (Tsinghua University) | Zhang, Jianwen (Tsinghua University) | Chen, Guangyun (Tsinghua University) | Zhang, Changshui (Tsinghua University)
As Internet-based crowdsourcing services become more and more popular, learning from multiple teachers or sources has received more attention from researchers in machine learning. In this setting, the learning system deals with samples and labels provided by multiple teachers who, in common cases, are non-experts. Their labeling styles and behaviors are usually diverse, and some of them are even detrimental to the learning system. Thus, simply putting them together and using algorithms designed for the single-teacher scenario would be not only improper but also damaging. The problem calls for more specific methods. Our work focuses on a case where the teachers consist of good ones and irresponsible ones. By irresponsible, we mean a teacher who does not take the labeling task seriously and labels samples at random without inspecting them. This behavior is quite common when the task is not attractive enough and the teacher just wants to finish it as soon as possible. Sometimes the irresponsible teachers make up a considerable portion of all the teachers. If their effects are not removed, the learning system will undoubtedly be ruined. In this paper, we propose a method for picking out the good teachers, with promising experimental results. It works even when the irresponsible teachers are dominating in numbers.
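The abstract does not describe the paper's actual method, so the sketch below uses a different, deliberately simple heuristic to show why majority domination need not be fatal: a teacher who labels at random agrees with anyone only at chance level (1/K for K balanced classes), while two good teachers agree well above chance. Keeping teachers whose best pairwise agreement exceeds a threshold therefore isolates the good ones, provided there are at least two of them. The threshold and all names are assumptions.

```python
import random

def agreement(a, b):
    """Fraction of samples on which two teachers give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pick_good_teachers(labels, threshold):
    """labels[t] holds teacher t's labels for a shared set of samples.
    Keep a teacher if its best agreement with any other teacher reaches `threshold`."""
    keep = []
    for t, lt in enumerate(labels):
        best = max(agreement(lt, ls) for s, ls in enumerate(labels) if s != t)
        if best >= threshold:
            keep.append(t)
    return keep

# Hypothetical simulation: 2 reliable teachers and 5 random ones on 200 binary samples.
rng = random.Random(0)
truth = [rng.randrange(2) for _ in range(200)]
teachers = [list(truth), list(truth)] + [[rng.randrange(2) for _ in truth] for _ in range(5)]
good = pick_good_teachers(teachers, threshold=0.8)
```

Plain majority voting would fail in this simulation, since five of the seven teachers are random; pairwise agreement does not depend on who is in the majority.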
G-Optimal Design with Laplacian Regularization
Chen, Chun (Zhejiang University) | Chen, Zhengguang (Zhejiang University) | Bu, Jiajun (Zhejiang University) | Wang, Can (Zhejiang University) | Zhang, Lijun (Zhejiang University) | Zhang, Cheng (China Disabled Persons')
In many real-world applications, labeled data are expensive to obtain, while there may be a large amount of unlabeled data. To reduce the labeling cost, active learning attempts to discover the most informative data points for labeling. Recently, Optimal Experimental Design (OED) techniques have attracted an increasing amount of attention. OED is concerned with designing experiments that minimize the variances of a parameterized model. Typical design criteria include D-, A-, and E-optimality. However, all these criteria are based on an ordinary linear regression model, which aims to minimize the empirical error, whereas the geometrical structure of the data space is not well respected. In this paper, we propose a novel optimal experimental design approach for active learning, called Laplacian G-Optimal Design (LapGOD), which considers both discriminating and geometrical structures. By using Laplacian Regularized Least Squares, which incorporates manifold regularization into linear regression, our proposed algorithm selects the data points that minimize the maximum variance of the predicted values on the data manifold. We also extend our algorithm to the nonlinear case by using the kernel trick. Experimental results on various image databases show that our proposed LapGOD active learning algorithm can significantly enhance classification accuracy when the selected data points are used as training data.
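A minimal greedy sketch of the G-optimal idea, under stated assumptions: the Laplacian-regularized least squares Hessian is taken as H = Σ_{selected} z zᵀ + λ1 Xᵀ L X + λ2 I (points stored as rows of X, so Xᵀ L X here corresponds to X L Xᵀ with points as columns), the covariance of the estimate is approximated as proportional to H⁻¹, and each step adds the candidate that minimizes the largest predicted variance xᵀ H⁻¹ x over all points. The greedy strategy, the parameter values, and the names are assumptions, not necessarily how LapGOD optimizes its criterion.

```python
def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(M)
    A = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0.0:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]

def quad(x, M):
    """x^T M x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

def lap_g_optimal(X, W, m, lam1=0.1, lam2=0.01):
    """Greedily pick m row indices of X. W: symmetric similarity matrix of the data graph."""
    n, d = len(X), len(X[0])
    deg = [sum(row) for row in W]
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    # Fixed part of H: lam1 * X^T L X + lam2 * I (d x d).
    H0 = [[lam1 * sum(X[i][a] * L[i][j] * X[j][b] for i in range(n) for j in range(n))
           + (lam2 if a == b else 0.0) for b in range(d)] for a in range(d)]
    selected = []
    for _ in range(m):
        best, best_val = None, float("inf")
        for c in range(n):
            if c in selected:
                continue
            H = [[H0[a][b] + sum(X[s][a] * X[s][b] for s in selected + [c])
                  for b in range(d)] for a in range(d)]
            Hinv = inverse(H)
            worst = max(quad(x, Hinv) for x in X)  # G-optimality: worst-case variance
            if worst < best_val:
                best, best_val = c, worst
        selected.append(best)
    return selected
```

Recomputing the inverse for every candidate keeps the sketch short; a practical version would use rank-one (Sherman-Morrison) updates and, for the nonlinear case, a kernelized form.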
Adaptive Transfer Learning
Cao, Bin (The Hong Kong University of Science and Technology) | Pan, Sinno Jialin (The Hong Kong University of Science and Technology) | Zhang, Yu (The Hong Kong University of Science and Technology) | Yeung, Dit-Yan (The Hong Kong University of Science and Technology) | Yang, Qiang (The Hong Kong University of Science and Technology)
Transfer learning aims at reusing the knowledge in some source tasks to improve the learning of a target task. Many transfer learning methods assume that the source tasks and the target task are related, even though many tasks are not related in reality. When two tasks are unrelated, however, the knowledge extracted from a source task may not help, and may even hurt, the performance on a target task. Thus, avoiding negative transfer and thereby ensuring a "safe transfer" of knowledge is crucial in transfer learning. In this paper, we propose an Adaptive Transfer learning algorithm based on Gaussian Processes (AT-GP), which can be used to adapt the transfer learning scheme by automatically estimating the similarity between a source and a target task. The main contribution of our work is a new semi-parametric transfer kernel for transfer learning from a Bayesian perspective, together with learning the model with respect to the target task, rather than all tasks as in multi-task learning. We formulate the transfer learning problem as a unified Gaussian Process (GP) model. The adaptive transfer ability of our approach is verified on both synthetic and real-world datasets.
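The core idea of a transfer kernel can be sketched with a single cross-task factor λ: the kernel is k(x, x′) for two points of the same task and λ·k(x, x′) across tasks. This is the product of an RBF kernel with the 2×2 task matrix [[1, λ], [λ, 1]], so it stays positive semi-definite for |λ| ≤ 1. With λ = 0 the source data are ignored; with λ = 1 source and target are pooled. In AT-GP the similarity is estimated from data; here λ is simply an input, and the RBF base kernel, noise level, and names are assumptions.

```python
import math

def rbf(x, y, length=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * length ** 2))

def transfer_kernel(x, tx, y, ty, lam):
    """Same task: k(x, y). Different tasks: lam * k(x, y), with |lam| <= 1."""
    base = rbf(x, y)
    return base if tx == ty else lam * base

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def gp_predict(train, y, query, lam, noise=0.1):
    """GP posterior mean at a target-task query point.
    train: list of (x, task) pairs with task "s" (source) or "t" (target)."""
    K = [[transfer_kernel(xa, ta, xb, tb, lam) + (noise ** 2 if i == j else 0.0)
          for j, (xb, tb) in enumerate(train)] for i, (xa, ta) in enumerate(train)]
    alpha = solve(K, y)
    return sum(a * transfer_kernel(query, "t", xb, tb, lam) for a, (xb, tb) in zip(alpha, train))
```

An adaptive method would choose λ near 0 for an unrelated source task, which is what protects against negative transfer, and closer to 1 for a closely related one.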
Decidable Fragments of First-Order Language Under Stable Model Semantics and Circumscription
Zhang, Heng (Tsinghua University) | Ying, Mingsheng (Tsinghua University)
The stable model semantics was recently generalized by Ferraris, Lee and Lifschitz to the full first-order language with a syntax translation approach that is very similar to McCarthy's circumscription. In this paper, we investigate the decidability and undecidability of various fragments of first-order language under both semantics of stable models and circumscription. Some maximally decidable classes and undecidable classes are identified. The results obtained in the paper show that the boundaries between decidability and undecidability for these two semantics are very different in spite of the similarity of definition. Moreover, for all fragments considered in the paper, decidability under the semantics of circumscription coincides with that in classical first-order logic. This seems rather counterintuitive due to the second-order definition of circumscription and the high undecidability of first-order circumscription.
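The results above concern first-order fragments. As a purely propositional illustration of how the two semantics can diverge on the same syntax, the sketch below brute-forces stable models (via the Gelfond-Lifschitz reduct) and circumscription with every atom minimized (the subset-minimal classical models) for the program p ← not q. Read classically, the rule is p ∨ q, which has two minimal models, {p} and {q}, while the program has the single stable model {p}. The rule encoding and function names are assumptions for illustration.

```python
from itertools import combinations

# A rule (head, pos, neg) stands for: head <- pos_1, ..., not neg_1, ...

def all_interpretations(atoms):
    atoms = sorted(atoms)
    return [set(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]

def least_model(definite_rules):
    """Least model of a negation-free program, by fixpoint iteration."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if set(pos) <= model and head not in model:
                model.add(head)
                changed = True
    return model

def stable_models(rules, atoms):
    out = []
    for m in all_interpretations(atoms):
        # Reduct: drop rules whose negative body meets m, then drop the 'not' literals.
        reduct = [(h, pos) for h, pos, neg in rules if not set(neg) & m]
        if least_model(reduct) == m:
            out.append(m)
    return out

def minimal_models(rules, atoms):
    """Circumscription minimizing every atom: subset-minimal classical models,
    reading each rule as the implication (pos and not neg) -> head."""
    models = [m for m in all_interpretations(atoms)
              if all(h in m or not set(pos) <= m or set(neg) & m for h, pos, neg in rules)]
    return [m for m in models if not any(n < m for n in models)]

program = [("p", [], ["q"])]  # p <- not q
```

Both procedures enumerate all interpretations, so they only illustrate the definitions on finite propositional inputs; they say nothing about the first-order decidability boundaries studied in the paper.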