Goto

Collaborating Authors

 Support Vector Machines


Solving Indefinite Kernel Support Vector Machine with Difference of Convex Functions Programming

AAAI Conferences

Indefinite kernel support vector machine (IKSVM) has recently attracted increasing attentions in machine learning. Different from traditional SVMs, IKSVM essentially is a non-convex optimization problem. Some algorithms directly change the spectrum of the indefinite kernel matrix at the cost of losing some valuable information involved in the kernels so as to transform the non-convex problem into a convex one. Other algorithms aim to solve the dual form of IKSVM, but suffer from the dual gap between the primal and dual problems in the case of indefinite kernels. In this paper, we directly focus on the non-convex primal form of IKSVM and propose a novel algorithm termed as IKSVM-DC. According to the characteristics of the spectrum for the indefinite kernel matrix, IKSVM-DC decomposes the objective function into the subtraction of two convex functions and thus reformulates the primal problem as a difference of convex functions (DC) programming which can be optimized by the DC algorithm (DCA). In order to accelerate convergence rate, IKSVM-DC further combines the classical DCA with a line search step along the descent direction at each iteration. A theoretical analysis is then presented to validate that IKSVM-DC can converge to a local minimum. Systematical experiments on real-world datasets demonstrate the superiority of IKSVM-DC compared to state-of-the-art IKSVM related algorithms.


A General Framework for Sparsity Regularized Feature Selection via Iteratively Reweighted Least Square Minimization

AAAI Conferences

A variety of feature selection methods based on sparsity regularization have been developed with different loss functions and sparse regularization functions. Capitalizing on the existing sparsity regularized feature selection methods, we propose a general sparsity feature selection (GSR-FS) algorithm that optimizes a โ„“ 2, r (0 <ย  r โ‰ค 2) based loss function with a โ„“ 2, p -norm (0 < p โ‰ค 2) sparse regularization. The โ„“ 2, r - norm (0 < 𝑟 โ‰ค 2) based loss function brings flexibility to balance data-fitting and robustness to outliers by tuning its parameter, and the โ„“ 2, p -norm (0 < p โ‰ค 1) based regularization function is able to boost the sparsity for feature selection. To solve the optimization problem with multiple non-smooth and non-convex functions when , we develop an efficient solver under the general umbrella of Iterative Reweighted Least Square (IRLS) algorithms. Our algorithm has been proved to converge with a theoretical convergence order of min(2 โ€“ r, 2 โ€“ p ) at least . The experimental results have demonstrated that our method could achieve competitive feature selection performance on publicly available datasets compared with state-of-the-art feature selection methods, with reduced computational cost.


Multiclass Capped โ„“p-Norm SVM for Robust Classifications

AAAI Conferences

Support vector machine (SVM) model is one of most successful machine learning methods and has been successfully applied to solve numerous real-world application. Because the SVM methods use the hinge loss or squared hinge loss functions for classifications, they usually outperform other classification approaches, e.g. the least square loss function based methods. However, like most supervised learning algorithms, they learn classifiers based on the labeled data in training set without specific strategy to deal with the noise data. In many real-world applications, we often have data outliers in train set, which could misguide the classifiers learning, such that the classification performance is suboptimal. To address this problem, we proposed a novel capped Lp-norm SVM classification model by utilizing the capped `p-norm based hinge loss in the objective which can deal with both light and heavy outliers. We utilize the new formulation to naturally build the multiclass capped Lp-norm SVM. More importantly, we derive a novel optimization algorithms to efficiently minimize the capped Lp-norm based objectives, and also rigorously prove the convergence of proposed algorithms. We present experimental results showing that employing the new capped Lp-norm SVM method can consistently improve the classification performance, especially in the cases when the data noise level increases.


Lifted Inference for Convex Quadratic Programs

AAAI Conferences

Symmetry is the essential element of lifted inferencethat has recently demonstrated the possibility to perform very efficient inference in highly-connected, but symmetric probabilistic models. This raises the question, whether this holds for optimization problems in general.Here we show that for a large classof optimization methods this is actually the case.Specifically, we introduce the concept of fractionalsymmetries of convex quadratic programs (QPs),which lie at the heart of many AI and machine learning approaches,and exploit it to lift, i.e., to compress QPs.These lifted QPs can then be tackled with the usual optimization toolbox (off-the-shelf solvers, cutting plane algorithms,stochastic gradients etc.). If the original QP exhibitssymmetry, then the lifted one will generallybe more compact, and hence more efficient to solve.


Modeling Skewed Class Distributions by Reshaping the Concept Space

AAAI Conferences

We introduce an approach to learning from imbalanced class distributions that does not change the underlying data distribution. The ICC algorithm decomposes majority classes into smaller sub-classes that create a more balanced class distribution. In this paper, we explain how ICC can not only addressthe class imbalance problem but may also increase the expressive power of the hypothesis space. We validate ICC and analyze alternative decomposition methods on well-known machine learning datasets as well as new problems in pervasive computing. Our results indicate that ICC performs as well or better than existing approaches to handling class imbalance.


From Shared Subspaces to Shared Landmarks: A Robust Multi-Source Classification Approach

AAAI Conferences

Training machine leaning algorithms on augmented data fromdifferent related sources is a challenging task. This problemarises in several applications, such as the Internet of Things(IoT), where data may be collected from devices with differentsettings. The learned model on such datasets can generalizepoorly due to distribution bias. In this paper we considerthe problem of classifying unseen datasets, given several labeledtraining samples drawn from similar distributions. Weexploit the intrinsic structure of samples in a latent subspaceand identify landmarks, a subset of training instances fromdifferent sources that should be similar. Incorporating subspacelearning and landmark selection enhances generalizationby alleviating the impact of noise and outliers, as well asimproving efficiency by reducing the size of the data. However,since addressing the two issues simultaneously resultsin an intractable problem, we relax the objective functionby leveraging the theory of nonlinear projection and solve atractable convex optimisation. Through comprehensive analysis,we show that our proposed approach outperforms stateof-the-art results on several benchmark datasets, while keepingthe computational complexity low.


Latent Discriminant Analysis with Representative Feature Discovery

AAAI Conferences

Linear Discriminant Analysis (LDA) is a well-known method for dimension reduction and classification with focus on discriminative feature selection. However, how to discover discriminative as well as representative features in LDA model has not been explored. In this paper, we propose a latent Fisher discriminant model with representative feature discovery in an semi-supervised manner. Specifically, our model leverages advantages of both discriminative and generative models by generalizing LDA with data-driven prior over the latent variables. Thus, our method combines multi-class, latent variables and dimension reduction in an unified Bayesian framework. We test our method on MUSK and Corel datasets and yield competitive results compared to baselines. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors ranking-based experimental evaluation, which clearly demonstrates our proposed method consistently extracts more semantically meaningful keyframes than challenging baselines.


Cross-Domain Kernel Induction for Transfer Learning

AAAI Conferences

The key question in transfer learning (TL) research is how to make model induction transferable across different domains. Common methods so far require source and target domains to have a shared/homogeneous feature space, or the projection of features from heterogeneous domains onto a shared space. This paper proposes a novel framework, which does not require a shared feature space but instead uses a parallel corpus to calibrate domain-specific kernels into a unified kernel, to leverage graph-based label propagation in cross-domain settings, and to optimize semi-supervised learning based on labeled and unlabeled data in both source and target domains. Our experiments on benchmark datasets show advantageous performance of the proposed method over that of other state-of-the-art TL methods.


The Bernstein Mechanism: Function Release under Differential Privacy

AAAI Conferences

We address the problem of general function release under differential privacy, by developing a functional mechanism that applies under the weak assumptions of oracle access to target function evaluation and sensitivity. These conditions permit treatment of functions described explicitly or implicitly as algorithmic black boxes. We achieve this result by leveraging the iterated Bernstein operator for polynomial approximation of the target function, and polynomial coefficient perturbation. Under weak regularity conditions, we establish fast rates on utility measured by high-probability uniform approximation. We provide a lower bound on the utility achievable for any functional mechanism that is epsilon-differentially private. The generality of our mechanism is demonstrated by the analysis of a number of example learners, including naive Bayes, non-parametric estimators and regularized empirical risk minimization. Competitive rates are demonstrated for kernel density estimation; and epsilon-differential privacy is achieved for a broader class of support vector machines than known previously.


Unsupervised Domain Adaptation with a Relaxed Covariate Shift Assumption

AAAI Conferences

The distributions can be different (Storkey and Sugiyama 2006; training and test domains are commonly referred to in the Ben-David and Urner 2012; 2014). Covariate shift is a valid domain adaptation literature as the source and target domains, assumption in some problems, but it can as well be quite respectively. Domain diversity can emerge as a result of the unrealistic for many other domain adaptation tasks where the scarcity of available labeled data from the target domain. It conditional label distributions are not (or, more precisely, not can as well be innate in the problem itself due to, for example, guaranteed to be) identical. The simplification resulting from an ongoing change occurring to the source domain like assuming identical labeling distributions facilitates the quest in cases where the original source domain keeps changing for a tractable learning algorithm, albeit possibly at the cost over time. Domain adaptation aims at finding solutions for of reducing the expressiveness power of the representation, this kind of problem, where the training (source) data are and consequently the accuracy of the resulting hypothesis.