Goto

Collaborating Authors

 Country


Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms

arXiv.org Machine Learning

The statistical analysis of Randomized Numerical Linear Algebra (RandNLA) algorithms within the past few years has mostly focused on their performance as point estimators. However, this is insufficient for conducting statistical inference, e.g., constructing confidence intervals and hypothesis testing, since the distribution of the estimator is lacking. In this article, we develop an asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem. In particular, we derive the asymptotic distribution of a general sampling estimator with arbitrary sampling probabilities. The analysis is conducted in two complementary settings, i.e., when the objective of interest is to approximate the full sample estimator or is to infer the underlying ground truth model parameters. For each setting, we show that the sampling estimator is asymptotically normally distributed under mild regularity conditions. Moreover, the sampling estimator is asymptotically unbiased in both settings. Based on our asymptotic analysis, we use two criteria, the Asymptotic Mean Squared Error (AMSE) and the Expected Asymptotic Mean Squared Error (EAMSE), to identify optimal sampling probabilities. Several of these optimal sampling probability distributions are new to the literature, e.g., the root leverage sampling estimator and the predictor length sampling estimator. Our theoretical results clarify the role of leverage in the sampling process, and our empirical results demonstrate improvements over existing methods.


On Pruning Adversarially Robust Neural Networks

arXiv.org Machine Learning

In safety-critical but computationally resource-constrained applications, deep learning faces two key challenges: lack of robustness against adversarial attacks and large neural network size (often millions of parameters). While the research community has extensively explored the use of robust training and network pruning \emph{independently} to address one of these challenges, we show that integrating existing pruning techniques with multiple types of robust training techniques, including verifiably robust training, leads to poor robust accuracy even though such techniques can preserve high regular accuracy. We further demonstrate that making pruning techniques aware of the robust learning objective can lead to a large improvement in performance. We realize this insight by formulating the pruning objective as an empirical risk minimization problem which is then solved using SGD. We demonstrate the success of the proposed pruning technique across CIFAR-10, SVHN, and ImageNet dataset with four different robust training techniques: iterative adversarial training, randomized smoothing, MixTrain, and CROWN-IBP. Specifically, at 99\% connection pruning ratio, we achieve gains up to 3.2, 10.0, and 17.8 percentage points in robust accuracy under state-of-the-art adversarial attacks for ImageNet, CIFAR-10, and SVHN dataset, respectively. Our code and compressed networks are publicly available at https://github.com/inspire-group/compactness-robustness


Variational Hyper RNN for Sequence Modeling

arXiv.org Machine Learning

In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence. Our method uses temporal latent variables to capture information about the underlying data pattern and dynamically decodes the latent information into modifications of weights of the base decoder and recurrent model. The efficacy of the proposed method is demonstrated on a range of synthetic and real-world sequential data that exhibit large scale variations, regime shifts, and complex dynamics.


Interpolating Between Gradient Descent and Exponentiated Gradient Using Reparameterized Gradient Descent

arXiv.org Machine Learning

Continuous-time mirror descent (CMD) can be seen as the limit case of the discrete-time MD update when the step-size is infinitesimally small. In this paper, we focus on the geometry of the primal and dual CMD updates and introduce a general framework for reparameterizing one CMD update as another. Specifically, the reparameterized update also corresponds to a CMD, but on the composite loss w.r.t. the new variables, and the original variables are obtained via the reparameterization map. We employ these results to introduce a new family of reparameterizations that interpolate between the two commonly used updates, namely the continuous-time gradient descent (GD) and unnormalized exponentiated gradient (EGU), while extending to many other well-known updates. In particular, we show that for the underdetermined linear regression problem, these updates generalize the known behavior of GD and EGU, and provably converge to the minimum $\mathrm{L}_{2-\tau}$-norm solution for $\tau\in[0,1]$. Our new results also have implications for the regularized training of neural networks to induce sparsity.


Precise Tradeoffs in Adversarial Training for Linear Regression

arXiv.org Machine Learning

Despite breakthrough performance, modern learning models are known to be highly vulnerable to small adversarial perturbations in their inputs. While a wide variety of recent \emph{adversarial training} methods have been effective at improving robustness to perturbed inputs (robust accuracy), often this benefit is accompanied by a decrease in accuracy on benign inputs (standard accuracy), leading to a tradeoff between often competing objectives. Complicating matters further, recent empirical evidence suggest that a variety of other factors (size and quality of training data, model size, etc.) affect this tradeoff in somewhat surprising ways. In this paper we provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features. In particular, we characterize the fundamental tradeoff between the accuracies achievable by any algorithm regardless of computational power or size of the training data. Furthermore, we precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach in a high-dimensional regime where the number of data points and the parameters of the model grow in proportion to each other. Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization etc.) affect the tradeoff between these two competing accuracies.


Deep Nearest Neighbor Anomaly Detection

arXiv.org Machine Learning

Nearest neighbors is a successful and long-standing technique for anomaly detection. Significant progress has been recently achieved by self-supervised deep methods (e.g. RotNet). Self-supervised features however typically under-perform Imagenet pre-trained features. In this work, we investigate whether the recent progress can indeed outperform nearest-neighbor methods operating on an Imagenet pretrained feature space. The simple nearest-neighbor based-approach is experimentally shown to outperform self-supervised methods in: accuracy, few shot generalization, training time and noise robustness while making fewer assumptions on image distributions.


LogicGAN: Logic-guided Generative Adversarial Networks

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) are a revolutionary class of Deep Neural Networks (DNNs) that have been successfully used to generate realistic images, music, text, and other data. However, it is well known that GAN training can be notoriously resource-intensive and presents many challenges. Further, a potential weakness in GANs is that discriminator DNNs typically provide only one value (loss) of corrective feedback to generator DNNs (namely, the discriminator's assessment of the generated example). By contrast, we propose a new class of GAN we refer to as LogicGAN, that leverages recent advances in (logic-based) explainable AI (xAI) systems to provide a "richer" form of corrective feedback from discriminators to generators. Specifically, we modify the gradient descent process using xAI systems that specify the reason as to why the discriminator made the classification it did, thus providing the richer corrective feedback that helps the generator to better fool the discriminator. Using our approach, we show that LogicGANs learn much faster on MNIST data, achieving an improvement in data efficiency of 45% in single and 12.73% in multi-class setting over standard GANs while maintaining the same quality as measured by Fr\'echet Inception Distance. Further, we argue that LogicGAN enables users greater control over how models learn than standard GAN systems.


Learning Structured Distributions From Untrusted Batches: Faster and Simpler

arXiv.org Machine Learning

We revisit the problem of learning from untrusted batches introduced by Qiao and Valiant [QV17]. Recently, Jain and Orlitsky [JO19] gave a simple semidefinite programming approach based on the cut-norm that achieves essentially information-theoretically optimal error in polynomial time. Concurrently, Chen et al. [CLM19] considered a variant of the problem where $\mu$ is assumed to be structured, e.g. log-concave, monotone hazard rate, $t$-modal, etc. In this case, it is possible to achieve the same error with sample complexity sublinear in $n$, and they exhibited a quasi-polynomial time algorithm for doing so using Haar wavelets. In this paper, we find an appealing way to synthesize the techniques of [JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in polynomial time and can exploit structure in the underlying distribution to achieve sublinear sample complexity. Along the way, we simplify the approach of [JO19] by avoiding the need for SDP rounding and giving a more direct interpretation of it through the lens of soft filtering, a powerful recent technique in high-dimensional robust estimation.


Neural Message Passing on High Order Paths

arXiv.org Machine Learning

Graph neural network have achieved impressive results in predicting molecular properties, but they do not directly account for local and hidden structures in the graph such as functional groups and molecular geometry. At each propagation step, GNNs aggregate only over first order neighbours, ignoring important information contained in subsequent neighbours as well as the relationships between those higher order connections. In this work, we generalize graph neural nets to pass messages and aggregate across higher order paths. This allows for information to propagate over various levels and substructures of the graph. We demonstrate our model on a few tasks in molecular property prediction.


Clustering and Classification with Non-Existence Attributes: A Sentenced Discrepancy Measure Based Technique

arXiv.org Machine Learning

For some or all of the data instances a number of independentworld clustering issues suffer from incomplete data characterization due to losing or absent attributes. Typical clustering approaches cannot be applied directly to such data unless pre-processing by techniques like imputation or marginalization. We have overcome this drawback by utilizing a Sentenced Discrepancy Measure which we refer to as the Attribute Weighted Penalty based Discrepancy (AWPD). Using the AWPD measure, we modified the K-MEANS and Scalable K-MEANS for clustering algorithm and k Nearest Neighbor (kNN) for classification so as to make them directly applicable to datasets with nonexistence attributes. We have presented a detailed theoretical analysis which shows that the new AWPD based K-MEANS, Scalable K-MEANS and kNN algorithm merge into a local prime among the number of iterations is finite. We have reported in depth experiments on numerous benchmark datasets for various forms of Non-Existence showing that the projected clustering and classification techniques usually show better results in comparison to some of the renowned imputation methods that are generally used to process such insufficient data. This technique is designed to trace invaluable data to: directly apply our method on the datasets which have Non-Existence attributes and establish a method for detecting unstructured Non-Existence attributes with the best accuracy rate and minimum cost. Keywords: Clustering, Classification, Non-Existence Attributes, Unstructured Non-Existence, Sentenced Discrepancy Measure (SDM), Attribute Weighted Penalty based Discrepancy (AWPD).