Collaborating Authors

 Vaswani, Namrata


Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

arXiv.org Machine Learning

We study how representation learning can improve the learning efficiency of contextual bandit problems. We consider the setting where we play T contextual linear bandits of dimension d simultaneously, and these T bandit tasks share a common linear representation of dimension r, with r much smaller than d. We present a new algorithm, based on an alternating projected gradient descent (GD) and minimization estimator, to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove a regret bound for it. We present experiments that compare the performance of our algorithm against benchmark algorithms.


Democratizing Signal Processing and Machine Learning: Math Learning Equity for Elementary and Middle School Students

arXiv.org Artificial Intelligence

Signal Processing (SP) and Machine Learning (ML) rely on good math and coding knowledge, in particular, linear algebra, probability, and complex numbers. A good grasp of these relies on scalar algebra learned in middle school. The ability to understand and use scalar algebra well, in turn, relies on a good foundation in basic arithmetic. Because of various systemic barriers, many students are not able to build a strong foundation in arithmetic in elementary school. This leads them to struggle with algebra and everything after that. Since math learning is cumulative, the gap between those without a strong early foundation and everyone else keeps increasing over the school years and becomes difficult to fill in college. In this article we discuss how SP faculty and graduate students can play an important role in starting, and participating in, university-run (or other) out-of-school math support programs to supplement students' learning. Two example programs run by the authors (CyMath at ISU and Ab7G at Purdue) are briefly described. The second goal of this article is to use our perspective as SP, and engineering, educators who have seen the long-term impact of elementary school math teaching policies, to provide some simple almost zero cost suggestions that elementary schools could adopt to improve math learning: (i) more math practice in school, (ii) send small amounts of homework (individual work is critical in math), and (iii) parent awareness (math resources, need for early math foundation, clear in-school test information and sharing of feedback from the tests). In summary, good early math support (in school and through out-of-school programs) can help make SP and ML more accessible.


Efficient Federated Low Rank Matrix Completion

arXiv.org Artificial Intelligence

In this work, we develop and analyze a Gradient Descent (GD) based solution, called Alternating GD and Minimization (AltGDmin), for efficiently solving the low rank matrix completion (LRMC) problem in a federated setting. LRMC involves recovering an $n \times q$ rank-$r$ matrix $X^\star$ from a subset of its entries when $r \ll \min(n,q)$. Our theoretical guarantees (iteration and sample complexity bounds) imply that AltGDmin is the most communication-efficient solution in a federated setting, is one of the fastest, and has the second best sample complexity among all iterative solutions to LRMC. In addition, we prove two important corollaries. (a) We provide a guarantee for AltGDmin for solving the noisy LRMC problem. (b) We show how our lemmas can be used to provide an improved sample complexity guarantee for AltMin, which is the fastest centralized solution.
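The alternating structure described above can be sketched in a few lines. This is a minimal centralized illustration, assuming the factorization X = UB with U an n x r matrix with orthonormal columns and B an r x q matrix; the function name `altgdmin_lrmc`, the spectral initialization, the step-size rule, and the iteration count are illustrative assumptions rather than the settings analyzed in the paper, and no federation is modeled.

```python
import numpy as np


def altgdmin_lrmc(Y, mask, r, n_iters=200, step_scale=0.5):
    """Sketch of alternating GD (over U) and minimization (over B) for LRMC.

    Y    : n x q array holding the observed entries (zeros elsewhere)
    mask : n x q boolean array, True where an entry is observed
    r    : target rank
    All tuning choices here are assumptions made for illustration.
    """
    n, q = Y.shape
    p = mask.mean()  # empirical sampling fraction

    def solve_B(U):
        # Minimization step: one small least-squares problem per column,
        # using only the rows observed in that column.
        B = np.zeros((r, q))
        for k in range(q):
            rows = mask[:, k]
            B[:, k] = np.linalg.lstsq(U[rows], Y[rows, k], rcond=None)[0]
        return B

    # Spectral initialization: top-r left singular vectors of the
    # rescaled zero-filled matrix.
    U, _, _ = np.linalg.svd(Y * mask / p, full_matrices=False)
    U = U[:, :r]

    for _ in range(n_iters):
        B = solve_B(U)
        # Gradient of 0.5 * ||mask * (U B - Y)||_F^2 with respect to U.
        grad = (mask * (U @ B - Y)) @ B.T
        eta = step_scale / (p * np.linalg.norm(B, 2) ** 2)
        # Projected GD step: re-orthonormalize the columns via QR.
        U, _ = np.linalg.qr(U - eta * grad)

    return U @ solve_B(U)


# Small synthetic check: rank-2 matrix with about half of its entries observed.
rng = np.random.default_rng(0)
n, q, r = 60, 80, 2
U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))
X_star = U_star @ rng.standard_normal((r, q))
mask = rng.random((n, q)) < 0.5
X_hat = altgdmin_lrmc(X_star * mask, mask, r)
rel_err = np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)
print(f"relative recovery error: {rel_err:.2e}")
```

In a federated variant one would expect the column-wise least-squares problems to be solved locally at each node, with only the gradient with respect to U being communicated; that split is not shown here.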


Byzantine-Resilient Federated PCA and Low Rank Matrix Recovery

arXiv.org Machine Learning

In this work, we consider the problem of estimating the principal subspace (span of the top r singular vectors) of a symmetric matrix in a federated setting, when each node has access to estimates of this matrix. We study how to make this problem Byzantine resilient. We introduce a novel provably Byzantine-resilient, communication-efficient, and private algorithm, called Subspace Median, to solve it. We also study the most natural solution for this problem, a geometric median based modification of the federated power method, and explain why it is not useful. We consider two special cases of the resilient subspace estimation meta-problem: federated principal components analysis (PCA) and the spectral initialization step of horizontally federated low rank column-wise sensing (LRCCS). For both these problems we show how Subspace Median provides a resilient solution that is also communication-efficient. Median of Means extensions are developed for both problems. Extensive simulation experiments are used to corroborate our theoretical guarantees. Our second contribution is a complete AltGDmin based algorithm for Byzantine-resilient horizontally federated LRCCS and guarantees for it. We do this by developing a geometric median of means estimator for aggregating the partial gradients computed at each node, and using Subspace Median for initialization.
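To make the idea of a median over subspaces concrete, the sketch below aggregates per-node subspace estimates by taking the geometric median of their projection matrices and returning the node estimate closest to it. This is only an illustration of geometric-median aggregation under stated assumptions: the function names `geometric_median` and `median_subspace`, the Weiszfeld iteration count, and the toy attack model (random subspaces from the corrupted nodes) are all hypothetical, and the abstract does not specify the actual steps of the paper's algorithm.

```python
import numpy as np


def geometric_median(points, n_iters=100, eps=1e-12):
    """Weiszfeld iterations for the geometric median of the rows of `points`."""
    z = points.mean(axis=0)
    for _ in range(n_iters):
        dist = np.linalg.norm(points - z, axis=1)
        w = 1.0 / np.maximum(dist, eps)  # guard against division by zero
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z


def median_subspace(bases):
    """Pick the node basis whose projector is closest to the geometric median.

    bases : list of n x r arrays with orthonormal columns, one per node.
    Returns the selected basis and its index in `bases`.
    """
    # Projection matrices U U^T do not depend on the choice of basis,
    # so they can be compared across nodes directly.
    projs = np.stack([(U @ U.T).ravel() for U in bases])
    gm = geometric_median(projs)
    best = int(np.argmin(np.linalg.norm(projs - gm, axis=1)))
    return bases[best], best


# Toy setup: 7 honest nodes near the true subspace, 3 corrupted nodes.
rng = np.random.default_rng(0)
n, r = 20, 2
U_true, _ = np.linalg.qr(rng.standard_normal((n, r)))
honest = [np.linalg.qr(U_true + 0.01 * rng.standard_normal((n, r)))[0]
          for _ in range(7)]
corrupted = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(3)]
bases = corrupted + honest  # indices 0..2 are the corrupted nodes

U_hat, idx = median_subspace(bases)
# Subspace distance: spectral norm of (I - U_hat U_hat^T) U_true.
sub_dist = np.linalg.norm((np.eye(n) - U_hat @ U_hat.T) @ U_true, 2)
print(f"selected node {idx}, subspace distance {sub_dist:.3f}")
```

Working with projection matrices rather than the bases themselves avoids the rotation ambiguity between different orthonormal bases of the same subspace.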


Detection and Mitigation of Byzantine Attacks in Distributed Training

arXiv.org Artificial Intelligence

A plethora of modern machine learning tasks require the utilization of large-scale distributed clusters as a critical component of the training pipeline. However, abnormal Byzantine behavior of the worker nodes can derail the training and compromise the quality of the inference. Such behavior can be attributed to unintentional system malfunctions or orchestrated attacks; as a result, some nodes may return arbitrary results to the parameter server (PS) that coordinates the training. Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients. In this work, we consider attack models ranging from strong ones ($q$ omniscient adversaries with full knowledge of the defense protocol that can change from iteration to iteration) to weak ones ($q$ randomly chosen adversaries with limited collusion abilities which only change every few iterations at a time). Our algorithms rely on redundant task assignments coupled with detection of adversarial behavior. We also show the convergence of our method to the optimal point under common assumptions and settings considered in the literature. For strong attacks, we demonstrate a reduction in the fraction of distorted gradients ranging from 16% to 99% as compared to the prior state-of-the-art. Our top-1 classification accuracy results on the CIFAR-10 data set demonstrate a 25% advantage in accuracy (averaged over strong and weak scenarios) under the most sophisticated attacks compared to state-of-the-art methods.


Non-Convex Structured Phase Retrieval

arXiv.org Machine Learning

Phase retrieval (PR), also sometimes referred to as quadratic sensing, is a problem that occurs in numerous signal and image acquisition domains, including optics, X-ray crystallography, Fourier ptychography, sub-diffraction imaging, and astronomy. In each of these domains, the physics of the acquisition system dictates that only the magnitude (intensity) of certain linear projections of the signal or image can be measured. Without any assumptions on the unknown signal, accurate recovery necessarily requires an over-complete set of measurements. The only way to reduce the measurement/sample complexity is to place extra assumptions on the unknown signal/image. A simple and practically valid set of assumptions is obtained by exploiting the structure inherently present in many natural signals or sequences of signals. Two commonly used structural assumptions are (i) sparsity of a given signal/image or (ii) a low rank model on the matrix formed by a set, e.g., a time sequence, of signals/images. Both have been explored for solving the PR problem in a sample-efficient fashion. This article describes this work, with a focus on non-convex approaches that come with sample complexity guarantees under simple assumptions. We also briefly describe other types of structural assumptions that have been used in recent literature.
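For reference, the magnitude-only measurement model described above is commonly written as follows. The notation is an assumption made here for illustration: $x$ denotes the unknown length-$n$ signal, $a_i$ the known measurement vectors, and $m$ the number of measurements.

```latex
y_i = \left| \langle a_i, x \rangle \right|, \qquad i = 1, 2, \dots, m
```

Since the sign (or phase) of each projection is lost, $x$ can at best be recovered up to a global sign or phase factor.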


Fast Robust Subspace Tracking via PCA in Sparse Data-Dependent Noise

arXiv.org Machine Learning

This work studies the robust subspace tracking (ST) problem. Robust ST can be simply understood as a (slow) time-varying subspace extension of robust PCA. It assumes that the true data lies in a low-dimensional subspace that is either fixed or changes slowly with time. The goal is to track the changing subspaces over time in the presence of additive sparse outliers and to do this quickly (with a short delay). We introduce a "fast" mini-batch robust ST solution that is provably correct under mild assumptions. Here "fast" means two things: (i) the subspace changes can be detected and the subspaces can be tracked with near-optimal delay, and (ii) the time complexity of doing this is the same as that of simple (non-robust) PCA. Our main result assumes piecewise constant subspaces (needed for identifiability), but we also provide a corollary for the case when there is a little change at each time. A second contribution is a novel non-asymptotic guarantee for PCA in linearly data-dependent noise. An important setting where this result is useful is for linearly data-dependent noise that is sparse with enough support changes over time. The subspace update step of our proposed robust ST solution uses this result.


Correlated-PCA: Principal Components' Analysis when Data and Noise are Correlated

Neural Information Processing Systems

Given a matrix of observed data, Principal Components Analysis (PCA) computes a small number of orthogonal directions that contain most of its variability. Provably accurate solutions for PCA have been in use for a long time. However, to the best of our knowledge, all existing theoretical guarantees for it assume that the data and the corrupting noise are mutually independent, or at least uncorrelated. This is valid in practice often, but not always. In this paper, we study the PCA problem in the setting where the data and noise can be correlated.
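As a point of reference for the definition above, the following is a minimal sketch of standard PCA computed through the SVD of the centered data matrix. It assumes small independent noise, so it does not model the correlated data-and-noise setting studied in the paper; the function name `pca_directions` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_directions(data, r):
    """Return the top-r principal directions of `data` (features x samples)
    and the fraction of total variance they capture."""
    centered = data - data.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data are the principal directions.
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = (s[:r] ** 2).sum() / (s ** 2).sum()
    return U[:, :r], explained


# Synthetic data: 500 samples in a 2-dimensional subspace of R^10,
# plus small independent Gaussian noise.
rng = np.random.default_rng(0)
n_features, n_samples, r = 10, 500, 2
basis, _ = np.linalg.qr(rng.standard_normal((n_features, r)))
coeffs = rng.standard_normal((r, n_samples)) * np.array([[3.0], [2.0]])
data = basis @ coeffs + 0.01 * rng.standard_normal((n_features, n_samples))

directions, explained = pca_directions(data, r)
# Distance between the estimated and true subspaces.
pca_dist = np.linalg.norm(
    (np.eye(n_features) - directions @ directions.T) @ basis, 2)
print(f"explained variance: {explained:.4f}, subspace distance: {pca_dist:.4f}")
```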


Phaseless Low Rank Matrix Recovery and Subspace Tracking

arXiv.org Machine Learning

This work introduces the first simple and provably correct solution for recovering a low-rank matrix from phaseless (magnitude-only) linear projections of each of its columns. This problem finds important applications in phaseless dynamic imaging, e.g., Fourier ptychographic imaging of live biological specimens. We demonstrate the practical advantage of our proposed approach, AltMinLowRaP, over existing work via extensive simulation, and some real-data, experiments. We also provide a solution for a dynamic extension of the above problem. This allows the low-dimensional subspace from which each image/signal is generated to change with time in a piecewise constant fashion.

I. INTRODUCTION

In recent years, there has been a resurgence of interest in the classical "phase retrieval (PR)" problem [1], [2]. These are commonly referred to as phaseless linear projections of the unknown signal. While practical PR methods have existed for a long time, e.g., see [1], [2], the focus of the recent work has been on obtaining correctness guarantees for these and newer algorithms. This line of work includes convex relaxation methods [3], [4] as well as non-convex methods [5], [6], [7], [8], [9]. It is easy to see that, without extra assumptions, PR requires $m \ge n$. The best known guarantees (see [7] and follow-up works) prove exact recovery with high probability (whp) with an order-optimal number of measurements/samples: $m \ge Cn$.


Subspace Tracking from Missing and Outlier Corrupted Data

arXiv.org Machine Learning

We study the related problems of subspace tracking in the presence of missing data (ST-miss) as well as robust subspace tracking with missing data (RST-miss). Here "robust" refers to robustness to sparse outliers. In recent work, we have studied the RST problem without missing data. In this work, we show that simple modifications of our solution approach for RST also provably solve ST-miss and RST-miss under weaker and similar assumptions respectively. To our knowledge, our result is the first complete guarantee for both ST-miss and RST-miss. This means we are able to show that, under assumptions on only the algorithm inputs (input data and/or initialization), the output subspace estimates are close to the true data subspaces at all times. Our guarantees hold under mild and easily interpretable assumptions and handle time-varying subspaces (unlike all previous work). We also show that our algorithm and its extensions are fast and have competitive experimental performance when compared with existing methods.