
 Paul, Debolina


Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means

arXiv.org Machine Learning

Clustering stands as one of the most prominent challenges in unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the most extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These include a heavy reliance on initial cluster centroids, susceptibility to converging to local minima of the objective function, and sensitivity to outliers and noise in the data. When confronted with data containing noisy or outlier-laden observations, the Median-of-Means (MoM) estimator emerges as a stabilizing force for any centroid-based clustering framework. On a different note, a prevalent constraint among existing clustering methodologies is the prerequisite knowledge of the number of clusters prior to analysis. Model-based methodologies, such as Bayesian nonparametric models, offer the advantage of infinite mixture models, thereby circumventing this requirement. Motivated by these facts, in this article we present an efficient and automatic clustering technique, integrating the principles of model-based and centroid-based methodologies, that mitigates the effect of noise on the quality of clustering while ensuring that the number of clusters need not be specified in advance. Statistical guarantees on the upper bound of the clustering error, together with rigorous assessment on simulated and real datasets, suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
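The Median-of-Means device at the heart of this line of work is easy to state: split the sample into blocks, average within each block, and take the median of the block means. Below is a minimal, purely illustrative Python sketch (the function name, block count, and toy data are our own choices, not the paper's implementation):

```python
import numpy as np

def median_of_means(x, n_blocks, rng=None):
    """Median-of-Means estimate of E[x] for a sample (rows of x).

    The sample is split at random into `n_blocks` disjoint blocks, the
    block means are computed, and their coordinate-wise median is
    returned.  As long as a majority of blocks contain no outliers,
    the contaminated block means are discarded by the median.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    idx = rng.permutation(len(x))
    block_means = [x[b].mean(axis=0) for b in np.array_split(idx, n_blocks)]
    return np.median(np.stack(block_means), axis=0)

# Toy demonstration: 200 clean points around (2, 2) plus 10 gross outliers.
rng = np.random.default_rng(0)
clean = rng.normal(loc=2.0, scale=1.0, size=(200, 2))
outliers = rng.normal(loc=50.0, scale=1.0, size=(10, 2))
data = np.vstack([clean, outliers])
print("plain mean:  ", data.mean(axis=0))                    # dragged towards the outliers
print("MoM estimate:", median_of_means(data, n_blocks=25))   # stays near (2, 2)
```

With 25 blocks and only 10 outliers, a majority of the blocks remain uncontaminated, so the median of the block means is essentially unaffected, whereas the plain mean is visibly dragged towards the outliers.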


Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification

arXiv.org Machine Learning

Linear prediction is the cornerstone of a significant group of statistical learning algorithms, including linear regression, Support Vector Machines (SVM), regularized regressions (such as ridge, elastic net, lasso, and its variants), logistic regression, Poisson regression, probit models, single-layer perceptrons, and tensor regression, just to name a few. Thus, developing a deeper understanding of the pertinent linear prediction models and generalizing the methods to provide unified theoretical bounds is of critical importance to the machine learning community. For the past few decades, researchers have unveiled different aspects of these linear models. Bartlett and Shawe-Taylor (1999) obtained high-confidence generalization error bounds for SVMs and other learning algorithms such as boosting and the Bayesian posterior classifier. Vapnik-Chervonenkis (VC) theory (Vapnik, 2013) and Rademacher complexity (Bartlett and Mendelson, 2001, 2002) have been instrumental in the machine learning literature in providing generalization bounds (Shalev-Shwartz and Ben-David, 2014). Theoretical properties of the multiple-instance extensions of SVM were analyzed by Doran and Ray (2014).


Uniform Concentration Bounds toward a Unified Framework for Robust Clustering

arXiv.org Machine Learning

Recent advances in center-based clustering continue to improve upon the drawbacks of Lloyd's celebrated $k$-means algorithm over $60$ years after its introduction. Various methods seek to address poor local minima, sensitivity to outliers, and data that are not well-suited to Euclidean measures of fit, but many are supported largely empirically. Moreover, combining such approaches in a piecemeal manner can result in ad hoc methods, and the limited theoretical results supporting each individual contribution may no longer hold. Toward addressing these issues in a principled way, this paper proposes a cohesive robust framework for center-based clustering under a general class of dissimilarity measures. In particular, we present a rigorous theoretical treatment within a Median-of-Means (MoM) estimation framework, showing that it subsumes several popular $k$-means variants. In addition to unifying existing methods, we derive uniform concentration bounds that complete their analyses, and bridge these results to the MoM framework via Dudley's chaining arguments. Importantly, we require no assumptions on the distribution of the outlying observations, nor on the relative number of observations $n$ to features $p$. We establish strong consistency and an error rate of $O(n^{-1/2})$ under mild conditions, surpassing the best-known results in the literature. The methods are empirically validated thoroughly on real and synthetic datasets.
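As a concrete illustration of how MoM enters a center-based objective, the sketch below (our own code, not the paper's) evaluates the Median-of-Means counterpart of the $k$-means risk for a fixed set of centroids; optimizing the centroids against this robust risk, which is what the paper actually studies, is not shown:

```python
import numpy as np

def mom_kmeans_risk(X, centroids, n_blocks, rng=None):
    """MoM estimate of the k-means risk for a fixed set of centroids.

    Each point contributes the squared distance to its nearest centroid;
    points are split into `n_blocks` random blocks and the median of the
    block-wise average losses is returned, so a small number of outlying
    points cannot inflate the estimated risk arbitrarily.
    """
    rng = np.random.default_rng(rng)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    losses = d2.min(axis=1)                       # loss of each point
    idx = rng.permutation(len(X))
    block_losses = [losses[b].mean() for b in np.array_split(idx, n_blocks)]
    return np.median(block_losses)

# Two well-separated Gaussian clusters plus 5 gross outliers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (150, 2)),
               rng.normal(8, 1, (150, 2)),
               rng.normal(100, 1, (5, 2))])
centroids = np.array([[0.0, 0.0], [8.0, 8.0]])
losses = ((X[:, None] - centroids[None]) ** 2).sum(-1).min(1)
print("empirical risk:", losses.mean())           # blown up by the outliers
print("MoM risk:      ", mom_kmeans_risk(X, centroids, n_blocks=15))
```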


Robust Principal Component Analysis: A Median of Means Approach

arXiv.org Machine Learning

Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well known to fall prey to the presence of outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Recent supervised learning methods, following the Median of Means (MoM) philosophy, have shown great success in dealing with outlying observations without much compromise to their large sample theoretical properties. In this paper, we propose a PCA procedure based on the MoM principle. Called the Median of Means Principal Component Analysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution with the aid of Vapnik-Chervonenkis theory and Rademacher complexity, while making absolutely no assumptions on the outlying observations. The efficacy of the proposal is also thoroughly showcased through simulations and real data applications.
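In the same spirit, the following sketch (again ours, with illustrative names and toy data) evaluates a Median-of-Means estimate of the PCA reconstruction error for a fixed candidate subspace; MoMPCA itself minimizes such a robust criterion over subspaces, which this snippet does not attempt:

```python
import numpy as np

def mom_reconstruction_error(X, components, n_blocks, rng=None):
    """MoM estimate of the PCA reconstruction error for a given basis.

    `components` is a (d, q) matrix with orthonormal columns spanning a
    candidate q-dimensional subspace.  Per-point squared reconstruction
    errors are averaged within random blocks and the median block
    average is returned.  (For simplicity the centering below uses the
    plain mean, which is itself non-robust.)
    """
    rng = np.random.default_rng(rng)
    Xc = X - X.mean(axis=0)
    resid = Xc - Xc @ components @ components.T   # residual after projection
    errs = (resid ** 2).sum(axis=1)
    idx = rng.permutation(len(X))
    block_errs = [errs[b].mean() for b in np.array_split(idx, n_blocks)]
    return np.median(block_errs)

# Most variance lies along the first coordinate; 5 rows are corrupted.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])
X[:5] += 100.0
e1 = np.eye(5)[:, :1]                             # candidate 1-D basis
Xc = X - X.mean(axis=0)
plain = ((Xc - Xc @ e1 @ e1.T) ** 2).sum(axis=1).mean()
print("plain reconstruction error:", plain)
print("MoM reconstruction error:  ", mom_reconstruction_error(X, e1, n_blocks=15))
```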


Automated Clustering of High-dimensional Data with a Feature Weighted Mean Shift Algorithm

arXiv.org Machine Learning

Mean shift is a simple iterative procedure that gradually shifts data points towards the mode, the point of highest density of data points in the region. Mean shift algorithms have been effectively used for data denoising, mode seeking, and finding the number of clusters in a dataset in an automated fashion. However, the merits of mean shift quickly fade away as the data dimensions increase and only a handful of features contain useful information about the cluster structure of the data. We propose a simple yet elegant feature-weighted variant of mean shift to efficiently learn the feature importance, thus extending the merits of mean shift to high-dimensional data. The resulting algorithm not only outperforms the conventional mean shift clustering procedure but also preserves its computational simplicity. In addition, the proposed method comes with rigorous theoretical convergence guarantees and a convergence rate of at least cubic order. The efficacy of our proposal is thoroughly assessed through experimental comparison against baseline and state-of-the-art clustering methods on synthetic as well as real-world datasets.
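The flavor of the approach can be conveyed with a small sketch of mean shift under a feature-weighted Gaussian kernel. Note that in the paper the weights are learned from the data; in the illustration below (our own code) they are simply fixed by hand:

```python
import numpy as np

def weighted_mean_shift(X, w, bandwidth=1.0, n_iter=50, tol=1e-6):
    """Mean shift with a fixed per-feature weight vector w.

    Each point is repeatedly moved to the kernel-weighted average of the
    sample, where the Gaussian kernel uses the weighted squared distance
    sum_l w_l (x_l - y_l)^2.  Heavily weighted features dominate the
    shifts; low-weight (noise) features are effectively ignored.
    """
    Y = X.copy()
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2 * w).sum(axis=2)
        K = np.exp(-d2 / (2.0 * bandwidth ** 2))
        Y_new = (K @ X) / K.sum(axis=1, keepdims=True)
        if np.max(np.abs(Y_new - Y)) < tol:
            return Y_new
        Y = Y_new
    return Y

# Two clusters that differ only in the first two of ten features.
rng = np.random.default_rng(3)
informative = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
                         rng.normal(3.0, 0.3, (100, 2))])
X = np.hstack([informative, rng.normal(0.0, 1.0, (200, 8))])
w = np.array([1.0, 1.0] + [0.05] * 8)             # down-weight the noise features
modes = weighted_mean_shift(X, w, bandwidth=1.0)
# The two groups collapse towards two well-separated modes along feature 0.
print("range of converged points along feature 0:", round(float(np.ptp(modes[:, 0])), 2))
```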


Kernel k-Means, By All Means: Algorithms and Strong Consistency

arXiv.org Machine Learning

Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped in local minima arising from the non-convexity of the underlying objective function. In this paper, we generalize recent results, which leverage a general family of means to combat sub-optimal local solutions, to the kernel and multi-kernel settings. Called Kernel Power $k$-Means, our algorithm makes use of majorization-minimization (MM) to better solve this non-convex problem. We show the method implicitly performs annealing in kernel feature space while retaining efficient, closed-form updates, and we rigorously characterize its convergence properties both from computational and statistical points of view. In particular, we characterize the large-sample behavior of the proposed method by establishing strong consistency guarantees. Its merits are thoroughly validated on a suite of simulated datasets and real data benchmarks that feature non-linear and multi-view separation.
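For reference, the base algorithm being improved here is plain kernel $k$-means, sketched below with the usual kernel-trick distance computation (our own illustrative code; the paper's power-means annealing and multi-kernel machinery are not reproduced). With a random initialization, this vanilla version can easily settle in a poor local solution, which is exactly the failure mode the annealing scheme targets:

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=100, rng=None):
    """Plain kernel k-means on a precomputed kernel matrix K (n x n).

    Squared feature-space distances follow from the kernel trick,
        ||phi(x_i) - m_c||^2 = K_ii - 2 * mean_{j in c} K_ij
                                     + mean_{j,l in c} K_jl,
    and every point is reassigned to its nearest implicit centroid.
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    labels = rng.integers(k, size=n)
    for _ in range(n_iter):
        dist = np.empty((n, k))
        for c in range(k):
            members = labels == c
            if not members.any():                 # revive an empty cluster
                members = rng.integers(n, size=1) == np.arange(n)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Two concentric rings: not linearly separable, handled via an RBF kernel.
rng = np.random.default_rng(4)
theta = rng.uniform(0.0, 2.0 * np.pi, 300)
radius = np.r_[np.full(150, 1.0), np.full(150, 4.0)] + rng.normal(0.0, 0.1, 300)
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print("cluster sizes:", np.bincount(kernel_kmeans(K, k=2, rng=0)))
```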


Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering

arXiv.org Machine Learning

Clustering of data into groups of relatively similar observations is one of the canonical tasks in unsupervised learning. With an increasing focus in recent years on very richly parameterized models, there has been a corresponding emphasis in the literature on complex clustering algorithms. A popular theme has been clustering at the latent variable level, while allowing estimation of both the clustering structure and a complex nonlinear mapping from the latent to the observed data level. Such methods are appealing in being able to realistically generate data that are indistinguishable from the observed data, while clustering observations in a lower-dimensional space. A particularly popular strategy is to develop clustering algorithms based on variational autoencoders (VAEs). For example, instead of drawing the latent variables in a VAE from standard Gaussian distributions, one can use a mixture of Gaussians for model-based clustering (Dilokthanakul et al., 2016; Lim et al., 2020; Yang et al., 2019). The problem with this family of methods is that, with a rich enough deep neural network, VAEs can accurately approximate any data-generating distribution regardless of the continuous density placed on the latent variables. If one uses a richer family of densities, such as a mixture model, then one can potentially approximate the data distribution using a simpler neural network structure. However, the inferred clusters are not reliable due to problems of non-identifiability.
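As a minimal illustration of the broad recipe discussed above, clustering observations in a lower-dimensional representation of the data, the sketch below uses plain linear PCA followed by a Gaussian mixture (not the paper's PEA method, nor a VAE); the dataset and the number of components are arbitrary choices made for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# Map the data to a low-dimensional latent space, then cluster there.
X, y = load_digits(return_X_y=True)
Z = PCA(n_components=10, random_state=0).fit_transform(X)
labels = GaussianMixture(n_components=10, random_state=0).fit_predict(Z)
print("adjusted Rand index vs. true digits:", round(adjusted_rand_score(y, labels), 3))
```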