Ying, Yiming
Co-Regularized PLSA for Multi-Modal Learning
Wang, Xin (State University of New York at Albany) | Chang, MingChing (State University of New York at Albany) | Ying, Yiming (State University of New York at Albany) | Lyu, Siwei (State University of New York at Albany)
Many learning problems in real-world applications involve rich datasets comprising multiple information modalities. In this work, we study co-regularized PLSA (coPLSA) as an efficient solution to probabilistic topic analysis of multi-modal data. In coPLSA, similarities between topic compositions of a data entity across different data modalities are measured with divergences between discrete probabilities, which are incorporated as a co-regularizer to augment individual PLSA models over each data modality. We derive efficient iterative learning algorithms for coPLSA with symmetric KL, L2 and L1 divergences as co-regularizers; in each case, the essential optimization problem affords simple numerical solutions that entail only matrix arithmetic operations and the numerical solution of 1D nonlinear equations. We evaluate the coPLSA algorithms on text/image cross-modal retrieval tasks, where they show performance competitive with state-of-the-art methods.
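A minimal sketch of the co-regularization idea described above, assuming a text and an image modality: per-modality PLSA log-likelihoods are combined with a symmetric-KL penalty that pulls the two topic compositions of the same entity toward each other. The function names, toy numbers, and simple additive objective are illustrative only, not the paper's model or update rules.

```python
import numpy as np

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions."""
    p = np.clip(p, eps, None); p = p / p.sum()
    q = np.clip(q, eps, None); q = q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def coregularized_objective(loglik_text, loglik_image, topics_text, topics_image, lam):
    """Per-modality PLSA log-likelihoods minus a co-regularizer that penalizes
    disagreement between topic compositions of the same entity.
    topics_* are (n_entities, n_topics) arrays whose rows sum to one."""
    penalty = sum(sym_kl(p, q) for p, q in zip(topics_text, topics_image))
    return loglik_text + loglik_image - lam * penalty

# toy usage: two entities, three topics per modality
T_text  = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
T_image = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
print(coregularized_objective(-120.0, -95.0, T_text, T_image, lam=0.5))
```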
Unregularized Online Learning Algorithms with General Loss Functions
Ying, Yiming, Zhou, Ding-Xuan
In this paper, we consider unregularized online learning algorithms in a reproducing kernel Hilbert space (RKHS). Firstly, we derive explicit convergence rates of the unregularized online learning algorithms for classification associated with a general gamma-activating loss (see Definition 1 in the paper). Our results extend and refine the results in Ying and Pontil (2008) for the least-square loss and the recent result in Bach and Moulines (2011) for loss functions with a Lipschitz-continuous gradient. Moreover, we establish a very general condition on the step sizes which guarantees the convergence of the last iterate of such algorithms. Secondly, we establish, for the first time, the convergence of the unregularized pairwise learning algorithm with a general loss function and derive explicit rates under the assumption of polynomially decaying step sizes. Concrete examples are used to illustrate our main results. The main techniques are tools from convex analysis, refined inequalities of Gaussian averages, and an induction approach.
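For concreteness, here is a small numpy sketch of an unregularized online algorithm in an RKHS: the iterate is a kernel expansion updated with polynomially decaying step sizes, using the logistic loss as one stand-in for a gamma-activating loss. The Gaussian kernel, the step-size constants, and the loss choice are assumptions for illustration, not the paper's exact setting.

```python
import numpy as np

def gauss_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def online_kernel_classifier(X, y, theta=0.6, eta0=0.5, sigma=1.0):
    """Unregularized online learning in an RKHS: the iterate is a kernel
    expansion over the examples seen so far, and each step appends one term
    weighted by the logistic-loss derivative and a polynomially decaying
    step size eta_t = eta0 * t**(-theta)."""
    support, coefs = [], []
    for t, (x, label) in enumerate(zip(X, y), start=1):
        f_x = sum(c * gauss_kernel(s, x, sigma) for s, c in zip(support, coefs))
        eta = eta0 * t ** (-theta)
        # derivative of the logistic loss log(1 + exp(-y f)) with respect to f
        grad = -label / (1.0 + np.exp(label * f_x))
        support.append(x)
        coefs.append(-eta * grad)
    return support, coefs

# toy usage: noisy linearly separable labels
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)); y = np.sign(X[:, 0] + 0.1 * rng.normal(size=50))
support, coefs = online_kernel_classifier(X, y)
```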
Online Pairwise Learning Algorithms with Kernels
Ying, Yiming, Zhou, Ding-Xuan
Pairwise learning usually refers to a learning task which involves a loss function depending on pairs of examples, among which the most notable ones include ranking, metric learning and AUC maximization. In this paper, we study an online algorithm for pairwise learning with a least-square loss function in an unconstrained setting of a reproducing kernel Hilbert space (RKHS), which we refer to as the Online Pairwise lEaRning Algorithm (OPERA). In contrast to existing works [Kar; Wang], which require that the iterates are restricted to a bounded domain or that the loss function is strongly convex, OPERA is associated with a non-strongly-convex objective function and learns the target function in an unconstrained RKHS. Specifically, we establish a general theorem which guarantees almost sure convergence of the last iterate of OPERA without any assumptions on the underlying distribution. Explicit convergence rates are derived under the condition of polynomially decaying step sizes. We also establish an interesting property for a family of widely used kernels in the setting of pairwise learning and illustrate the above convergence results using such kernels. Our methodology mainly depends on the characterization of RKHSs using their associated integral operators and probability inequalities for random variables with values in a Hilbert space.
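A minimal sketch of unconstrained online pairwise learning with a least-square loss on pairs, in the spirit of the setting above: the hypothesis is kept as a kernel expansion, and each step updates it on the pair formed by the new example and one previously seen example. The particular pair loss, kernel, pairing rule and step sizes below are illustrative assumptions, not OPERA's exact specification.

```python
import numpy as np

def gauss_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def online_pairwise_least_squares(X, y, theta=0.75, eta0=0.5, sigma=1.0, seed=0):
    """Online pairwise learning with a least-square loss on pairs,
    l(f; z, z') = (f(x) - f(x') - (y - y'))**2, where f is a kernel expansion
    updated on the pair (new example, one earlier example) with polynomially
    decaying step sizes eta_t = eta0 * t**(-theta)."""
    rng = np.random.default_rng(seed)
    support, coefs = [], []

    def f(x):
        return sum(c * gauss_kernel(s, x, sigma) for s, c in zip(support, coefs))

    for t in range(1, len(X)):
        j = rng.integers(0, t)                      # pick one earlier example
        x_new, x_old = X[t], X[j]
        residual = f(x_new) - f(x_old) - (y[t] - y[j])
        eta = eta0 * t ** (-theta)
        # the gradient of the squared pair loss adds two opposite kernel sections
        support.extend([x_new, x_old])
        coefs.extend([-2 * eta * residual, 2 * eta * residual])
    return support, coefs

# toy usage: real-valued scores, ranking-style pairs
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3)); y = X @ np.array([1.0, -0.5, 0.0])
support, coefs = online_pairwise_least_squares(X, y)
```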
Generalization Bounds for Metric and Similarity Learning
Cao, Qiong, Guo, Zheng-Chu, Ying, Yiming
Recently, metric learning and similarity learning have attracted considerable interest. Many models and optimisation algorithms have been proposed. However, there is relatively little work on the generalization analysis of such methods. In this paper, we derive novel generalization bounds for metric and similarity learning. In particular, we first show that the generalization analysis reduces to the estimation of the Rademacher average over "sums-of-i.i.d." sample-blocks related to the specific matrix norm. Then, we derive generalization bounds for metric/similarity learning with different matrix-norm regularisers by estimating their specific Rademacher complexities. Our analysis indicates that sparse metric/similarity learning with $L^1$-norm regularisation could lead to significantly better bounds than those with Frobenius-norm regularisation. Our novel generalization analysis develops and refines the techniques of U-statistics and Rademacher complexity analysis.
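The paper is theoretical, but the kind of regularized empirical objective whose Rademacher complexity such bounds control can be written down concretely. The sketch below uses a hinge loss over all pairs and a switchable L1 or Frobenius penalty on the metric matrix; the loss, margin constant, and pairing scheme are illustrative assumptions rather than the paper's formulation.

```python
import numpy as np

def metric_learning_objective(M, X, y, lam, norm="l1"):
    """Regularized pairwise empirical objective for metric learning:
    average hinge loss over all pairs (same-label pairs should be close
    under the Mahalanobis distance d_M, different-label pairs far),
    plus an L1 or Frobenius penalty on M."""
    n = len(X)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            d2 = diff @ M @ diff                       # squared Mahalanobis distance
            tau = 1.0 if y[i] == y[j] else -1.0        # pair label
            total += max(0.0, 1.0 + tau * (d2 - 2.0))  # hinge around an arbitrary margin
            count += 1
    reg = np.abs(M).sum() if norm == "l1" else np.sqrt((M ** 2).sum())
    return total / count + lam * reg

# toy usage: compare the two regularizers on the identity metric
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)); y = rng.integers(0, 2, size=20)
print(metric_learning_objective(np.eye(3), X, y, lam=0.01, norm="l1"))
print(metric_learning_objective(np.eye(3), X, y, lam=0.01, norm="fro"))
```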
Learning with Support Vector Machines
Campbell, Colin, Ying, Yiming
Support Vector Machines have become a well-established tool within machine learning. This introductory overview of Support Vector Machines (SVMs) starts with a simple SVM for binary classification, then covers multi-class classification and learning in the presence of noise. It shows how this framework can be extended to prediction with real-valued outputs, novelty detection and the handling of complex output structures such as parse trees. ISBN 9781608456161, 95 pages.
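Since the overview begins with a simple SVM for binary classification, here is a minimal runnable example of that starting point. The use of scikit-learn and a synthetic dataset are assumptions for illustration; the book itself is library-agnostic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# toy binary classification with a soft-margin RBF-kernel SVM
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0)   # C controls tolerance to noisy examples
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```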
Sparse Metric Learning via Smooth Optimization
Ying, Yiming, Huang, Kaizhu, Campbell, Colin
In this paper we study the problem of learning a low-dimensional (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach for sparse metric learning, although the learning model is based on a non-differentiable loss function. This smooth optimization approach has an optimal convergence rate of $O(1/\ell^2)$ for smooth problems, where $\ell$ is the iteration number. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
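The $O(1/\ell^2)$ rate quoted above is the hallmark of Nesterov-type accelerated first-order methods for smooth problems. Below is a generic sketch of such an accelerated scheme applied to a smooth convex objective; it illustrates the rate-optimal machinery only and is not the paper's saddle-point formulation of sparse metric learning.

```python
import numpy as np

def nesterov_accelerated_gradient(grad, x0, lipschitz, n_iters=100):
    """Generic Nesterov accelerated gradient method, the kind of smooth
    first-order scheme that attains the O(1/l**2) rate. `grad` is the
    gradient of a smooth convex objective with Lipschitz constant `lipschitz`."""
    x, z = x0.copy(), x0.copy()
    t = 1.0
    for _ in range(n_iters):
        x_next = z - grad(z) / lipschitz                       # gradient step at the lookahead point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_next + ((t - 1.0) / t_next) * (x_next - x)       # momentum extrapolation
        x, t = x_next, t_next
    return x

# toy usage: minimize 0.5 * ||A w - b||^2
A = np.array([[2.0, 0.0], [0.0, 1.0]]); b = np.array([1.0, -1.0])
L = np.linalg.norm(A.T @ A, 2)
w = nesterov_accelerated_gradient(lambda w: A.T @ (A @ w - b), np.zeros(2), L)
print(w)
```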
Analysis of SVM with Indefinite Kernels
Ying, Yiming, Campbell, Colin, Girolami, Mark
The recent introduction of indefinite SVM by Luss and d'Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov's smooth optimization approach [16,17] for indefinite SVM, which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
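The proxy kernel mentioned above is a positive semi-definite surrogate kept close to the indefinite kernel matrix. A minimal sketch of that basic building block, projecting a symmetric indefinite matrix onto the PSD cone by eigenvalue clipping, is shown below; it is not the paper's smoothed min-max algorithm.

```python
import numpy as np

def nearest_psd(K):
    """Project a symmetric (possibly indefinite) kernel matrix onto the PSD
    cone in Frobenius norm by clipping its negative eigenvalues to zero."""
    K = (K + K.T) / 2.0
    vals, vecs = np.linalg.eigh(K)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

# toy indefinite similarity matrix and its PSD proxy
K = np.array([[ 1.0, 0.9, -0.8],
              [ 0.9, 1.0,  0.4],
              [-0.8, 0.4,  1.0]])
K_psd = nearest_psd(K)
print(np.linalg.eigvalsh(K), np.linalg.eigvalsh(K_psd))
```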
A Spectral Regularization Framework for Multi-Task Structure Learning
Argyriou, Andreas, Pontil, Massimiliano, Ying, Yiming, Micchelli, Charles A.
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer.
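A small sketch of alternating minimization under one particular spectral regularizer, the trace norm, written in a variational form with a shared positive semi-definite matrix D. The squared loss, the closed-form D-step, and the hyperparameters are illustrative assumptions; the framework above covers a much broader class of spectral functions.

```python
import numpy as np

def multitask_trace_norm(Xs, ys, gamma=0.1, n_iters=50, eps=1e-6):
    """Alternating minimization for multi-task learning with a trace-norm
    (spectral) regularizer: minimize over W and a shared PSD matrix D
    sum_t ||X_t w_t - y_t||^2 + gamma * tr(W^T D^{-1} W), with tr(D) <= 1."""
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))
    D = np.eye(d) / d
    for _ in range(n_iters):
        # W-step: each task is a generalized ridge regression with shared matrix D
        Dinv = np.linalg.inv(D + eps * np.eye(d))
        for t in range(T):
            A = Xs[t].T @ Xs[t] + gamma * Dinv
            W[:, t] = np.linalg.solve(A, Xs[t].T @ ys[t])
        # D-step: closed form D = (W W^T)^{1/2} / tr((W W^T)^{1/2})
        S = W @ W.T + eps * np.eye(d)
        vals, vecs = np.linalg.eigh(S)
        sqrtS = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
        D = sqrtS / np.trace(sqrtS)
    return W

# toy usage: three related regression tasks sharing a low-rank structure
rng = np.random.default_rng(0)
Xs = [rng.normal(size=(30, 4)) for _ in range(3)]
ys = [X @ np.array([1.0, -1.0, 0.0, 0.0]) + 0.1 * rng.normal(size=30) for X in Xs]
W = multitask_trace_norm(Xs, ys)
```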