Learning with Noisy Labels

Neural Information Processing Systems

In this paper, we theoretically study the problem of binary classification in the presence of random classification noise: instead of seeing the true labels, the learner sees labels that have each been flipped independently with some small probability. Moreover, the label noise is class-conditional: the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of i.i.d. data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that this method leads to an efficient algorithm for empirical minimization.
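To make the first approach concrete, here is a minimal Python sketch of the unbiased-estimator construction for the binary case, assuming the flip rates rho_plus = P(flip | y = +1) and rho_minus = P(flip | y = -1) are known and satisfy rho_plus + rho_minus < 1; the function names are ours, and the logistic loss stands in for an arbitrary surrogate:

import numpy as np

def logistic_loss(score, y):
    # standard logistic surrogate: l(t, y) = log(1 + exp(-y * t))
    return np.log1p(np.exp(-y * score))

def unbiased_loss(score, y_noisy, rho_plus, rho_minus, loss=logistic_loss):
    # Unbiased estimator of the clean loss from a noisy label:
    #   l~(t, y) = [(1 - rho_{-y}) l(t, y) - rho_y l(t, -y)] / (1 - rho_+ - rho_-)
    # so its expectation over label flips equals the loss on the true label.
    rho_y = np.where(y_noisy == 1, rho_plus, rho_minus)
    rho_neg_y = np.where(y_noisy == 1, rho_minus, rho_plus)
    return ((1.0 - rho_neg_y) * loss(score, y_noisy)
            - rho_y * loss(score, -y_noisy)) / (1.0 - rho_plus - rho_minus)

Averaging unbiased_loss over the noisy sample gives an unbiased estimate of the clean empirical risk, so the usual empirical risk minimization machinery can be run on the noisy data unchanged.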

Iterative Reweighted Minimization Methods for $l_p$ Regularized Unconstrained Nonlinear Programming

Machine Learning

In this paper we study general $l_p$ regularized unconstrained minimization problems. In particular, we derive lower bounds for the nonzero entries of first- and second-order stationary points, and hence also of local minimizers, of the $l_p$ minimization problems. We extend some existing iterative reweighted $l_1$ (IRL1) and $l_2$ (IRL2) minimization methods to solve these problems and propose new variants of them in which each subproblem has a closed-form solution. We also provide a unified convergence analysis for these methods. In addition, we propose a novel Lipschitz continuous $\epsilon$-approximation to $\|x\|^p_p$. Using this result, we develop new IRL1 methods for the $l_p$ minimization problems and show that any accumulation point of the sequence generated by these methods is a first-order stationary point, provided that the approximation parameter $\epsilon$ is below a computable threshold value. This is a remarkable result, since all existing iterative reweighted minimization methods require that $\epsilon$ be dynamically updated and approach zero. Our computational results demonstrate that the new IRL1 method is generally more stable than the existing IRL1 methods [21,18] in terms of objective function value and CPU time.
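As a schematic illustration of an IRL1 iteration with a fixed (rather than decaying) smoothing parameter, consider the concrete instance min_x 0.5*||Ax - b||_2^2 + lam*||x||_p^p with 0 < p < 1. The Python sketch below is ours, not the paper's exact method: it reweights an $l_1$ term using w_i = p*(|x_i| + eps)^(p-1) and takes one proximal-gradient step per reweighting, whose prox is closed-form soft-thresholding; eps is a hypothetical small constant standing in for the paper's computable threshold.

import numpy as np

def soft_threshold(z, t):
    # closed-form solution of min_x 0.5*(x - z)^2 + t*|x|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def irl1(A, b, lam, p=0.5, eps=1e-3, n_iter=200):
    # Schematic IRL1 for min 0.5*||Ax - b||^2 + lam*||x||_p^p, 0 < p < 1
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part's gradient
    for _ in range(n_iter):
        # weights from the eps-smoothed approximation of |x_i|^p
        w = p * (np.abs(x) + eps) ** (p - 1.0)
        # one proximal-gradient step on the weighted-l1 subproblem
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam * w / L)
    return x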

Penalty Decomposition Methods for Rank Minimization

Neural Information Processing Systems

In this paper we consider general rank minimization problems with the rank appearing in either the objective function or the constraint. We first show that a class of matrix optimization problems can be solved as lower-dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed-form solutions. Using this result, we then propose penalty decomposition (PD) methods for general rank minimization problems. The convergence results of the PD methods are shown in the longer version of the paper [19]. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.
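As a sketch of the PD idea on the matrix completion application, the hypothetical Python fragment below splits the variable as X = Y and alternates the two block minimizations of 0.5*||P_Omega(X - M)||_F^2 + (rho/2)*||X - Y||_F^2 subject to rank(Y) <= r; both blocks have closed-form solutions (entrywise for X, a truncated SVD for Y, by the Eckart-Young theorem). For simplicity the penalty parameter rho is held fixed here, whereas a full PD scheme would increase it along the iterations.

import numpy as np

def truncated_svd(Y, r):
    # closed-form solution of min_{rank(Z) <= r} ||Z - Y||_F (Eckart-Young)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def pd_matrix_completion(M, mask, r, rho=1.0, n_iter=200):
    # Schematic penalty decomposition for
    #   min 0.5*||mask*(X - M)||_F^2  s.t.  rank(X) <= r
    # M holds the observed entries (its values elsewhere are ignored);
    # mask is a boolean array marking the observed positions.
    Y = truncated_svd(np.where(mask, M, 0.0), r)
    for _ in range(n_iter):
        # X-block: separable quadratic, solved entrywise in closed form
        X = np.where(mask, (M + rho * Y) / (1.0 + rho), Y)
        # Y-block: best rank-r approximation of X
        Y = truncated_svd(X, r)
    return Y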

Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

Machine Learning

We propose and analyze a method for semi-supervised learning from partially labeled network-structured data. Our approach is based on a graph signal recovery interpretation under a clustering hypothesis: the labels of data points belonging to the same well-connected subset (cluster) have similar values. This naturally leads to learning the labels by total variation (TV) minimization, which we solve by applying a recently proposed primal-dual method for non-smooth convex optimization. The resulting algorithm allows for a highly scalable implementation using message passing over the underlying empirical graph, which renders it suitable for big-data applications. By applying tools of compressed sensing, we derive a sufficient condition on the underlying network structure such that TV minimization recovers the clusters in the empirical graph of the data. In particular, we show that the proposed primal-dual method amounts to maximizing network flows over the empirical graph of the dataset. Moreover, the learning accuracy of the proposed algorithm is linked to the set of network flows between data points having known labels. The effectiveness and scalability of our approach are verified by numerical experiments.
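The following Python sketch shows the shape of such a primal-dual iteration (Chambolle-Pock type) for the model problem min_x sum_e w_e*|x_i - x_j| subject to x agreeing with the known labels; it is an illustration under our own naming, not the paper's exact algorithm. The dual update acts on edges and the primal update aggregates edge messages at nodes, which is the message-passing structure referred to above; the step sizes tau and sigma are illustrative and must in practice satisfy the standard primal-dual step-size condition.

import numpy as np

def tv_ssl(edges, weights, n_nodes, labeled, y, tau=0.1, sigma=0.1, n_iter=1000):
    # edges: (m, 2) int array, one dual variable u_e per edge
    # labeled: indices of the nodes with known labels y
    i, j = edges[:, 0], edges[:, 1]
    x = np.zeros(n_nodes)
    x[labeled] = y
    x_bar, u = x.copy(), np.zeros(len(edges))
    for _ in range(n_iter):
        # dual ascent on edges, projected onto |u_e| <= w_e
        u = np.clip(u + sigma * (x_bar[i] - x_bar[j]), -weights, weights)
        # primal descent: nodes aggregate the messages of incident edges
        div = np.zeros(n_nodes)
        np.add.at(div, i, u)
        np.add.at(div, j, -u)
        x_new = x - tau * div
        x_new[labeled] = y  # re-enforce the known labels
        x_bar = 2 * x_new - x  # extrapolation step
        x = x_new
    return x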

Reflection methods for user-friendly submodular optimization

Neural Information Processing Systems

Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing, and computer vision. Consequently, there is a need for efficient optimization procedures for submodular functions, in particular for minimization problems. While general submodular minimization is challenging, we propose a new approach that exploits the decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our approach is a formulation of the discrete submodular minimization problem as a continuous best-approximation problem. It is solved through a sequence of reflections, and its solution can be automatically thresholded to obtain an optimal discrete solution. Our method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we show the benefits of our new algorithms for two image segmentation tasks.
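The reflection scheme underlying this best-approximation formulation is an instance of Douglas-Rachford-style averaged alternating reflections. Below is a skeletal Python version, assuming only that projection oracles onto the two convex sets are available (in the paper those sets are polytopes built from the components of the decomposed submodular function, and the continuous solution is thresholded to recover an optimal discrete set); the toy projections at the end are purely illustrative.

import numpy as np

def reflect(project, z):
    # reflection through a convex set: R = 2*P - id for projection P
    return 2.0 * project(z) - z

def averaged_reflections(project_A, project_B, z, n_iter=500):
    # Averaged alternating reflections for the best-approximation
    # problem between two closed convex sets A and B.
    for _ in range(n_iter):
        z = 0.5 * (z + reflect(project_B, reflect(project_A, z)))
    # the shadow sequence project_A(z) approaches a nearest point of A to B
    return project_A(z)

# toy usage with two simple sets: a box and the hyperplane {x : sum(x) = 1}
proj_box = lambda z: np.clip(z, 0.0, 1.0)
proj_plane = lambda z: z - (z.sum() - 1.0) / z.size
x_star = averaged_reflections(proj_box, proj_plane, np.zeros(5))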