Goto

Collaborating Authors

 Tsiligkaridis, Theodoros


Misclassification-Aware Gaussian Smoothing improves Robustness against Domain Shifts

arXiv.org Artificial Intelligence

Deep neural networks achieve high prediction accuracy when the train and test distributions coincide. However, in practice various types of corruptions can deviate from this setup and performance can be heavily degraded. There have been only a few methods to address generalization in presence of unexpected domain shifts observed during deployment. In this paper, a misclassification-aware Gaussian smoothing approach is presented to improve the robustness of image classifiers against a variety of corruptions while maintaining clean accuracy. The intuition behind our proposed misclassification-aware objective is revealed through bounds on the local loss deviation in the small-noise regime. When our method is coupled with additional data augmentations, it is empirically shown to improve upon the state-of-the-art in robustness and uncertainty calibration on several image classification tasks.


Second Order Optimization for Adversarial Robustness and Interpretability

arXiv.org Machine Learning

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique aimed at learning features robust to such attacks and is widely regarded as a very effective defense. However, the computational cost of such training can be prohibitive as the network size and input dimensions grow. Inspired by the relationship between robustness and curvature, we propose a novel regularizer which incorporates first and second order information via a quadratic approximation to the adversarial loss. The worst case quadratic loss is approximated via an iterative scheme. It is shown that using only a single iteration in our regularizer achieves stronger robustness than prior gradient and curvature regularization schemes, avoids gradient obfuscation, and, with additional iterations, achieves strong robustness with significantly lower training time than AT. Further, it retains the interesting facet of AT that networks learn features which are well-aligned with human perception. We demonstrate experimentally that our method produces higher quality human-interpretable features than other geometric regularization techniques. These robust features are then used to provide human-friendly explanations to model predictions.


Information Robust Dirichlet Networks for Predictive Uncertainty Estimation

arXiv.org Machine Learning

Precise estimation of uncertainty in predictions for AI systems is a critical factor in ensuring trust and safety. Conventional neural networks tend to be overconfident as they do not account for uncertainty during training. In contrast to Bayesian neural networks that learn approximate distributions on weights to infer prediction confidence, we propose a novel method, Information Robust Dirichlet networks, that learns the Dirichlet distribution on prediction probabilities by minimizing the expected $L_p$ norm of the prediction error and an information divergence loss that penalizes information flow towards incorrect classes, while simultaneously maximizing differential entropy of small adversarial perturbations to provide accurate uncertainty estimates. Properties of the new cost function are derived to indicate how improved uncertainty estimation is achieved. Experiments using real datasets show that our technique outperforms state-of-the-art neural networks, by a large margin, for estimating in-distribution and out-of-distribution uncertainty, and detecting adversarial examples.


Accelerated Reinforcement Learning Algorithms with Nonparametric Function Approximation for Opportunistic Spectrum Access

arXiv.org Machine Learning

We study the problem of throughput maximization by predicting spectrum opportunities using reinforcement learning. Our kernel-based reinforcement learning approach is coupled with a sparsification technique that efficiently captures the environment states to control dimensionality and finds the best possible channel access actions based on the current state. This approach allows learning and planning over the intrinsic state-action space and extends well to large state and action spaces. For stationary Markov environments, we derive the optimal policy for channel access, its associated limiting throughput, and propose a fast online algorithm for achieving the optimal throughput. We then show that the maximum-likelihood channel prediction and access algorithm is suboptimal in general, and derive conditions under which the two algorithms are equivalent. For reactive Markov environments, we derive kernel variants of Q-learning, R-learning and propose an accelerated R-learning algorithm that achieves faster convergence. We finally test our algorithms against a generic reactive network. Simulation results are shown to validate the theory and show the performance gains over current state-of-the-art techniques.


Asynchronous Decentralized 20 Questions for Adaptive Search

arXiv.org Machine Learning

This paper considers the problem of adaptively searching for an unknown target using multiple agents connected through a time-varying network topology. Agents are equipped with sensors capable of fast information processing, and we propose a decentralized collaborative algorithm for controlling their search given noisy observations. Specifically, we propose decentralized extensions of the adaptive query-based search strategy that combines elements from the 20 questions approach and social learning. Under standard assumptions on the time-varying network dynamics, we prove convergence to correct consensus on the value of the parameter as the number of iterations go to infinity. The convergence analysis takes a novel approach using martingale-based techniques combined with spectral graph theory. Our results establish that stability and consistency can be maintained even with one-way updating and randomized pairwise averaging, thus providing a scalable low complexity method with performance guarantees. We illustrate the effectiveness of our algorithm for random network topologies.


Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

Neural Information Processing Systems

We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters are unknown a-priori. We present an easily computable, closed form parametric expression for the conditional likelihood, in which hyperparameters are recursively updated as a function of the streaming data assuming conjugate priors. Motivated by large-sample asymptotics, we propose a noveladaptive low-complexity design for the Dirichlet process concentration parameter and show that the number of classes grow at most at a logarithmic rate. We further prove that in the large-sample limit, the conditional likelihood and datapredictive distribution become asymptotically Gaussian. We demonstrate through experiments on synthetic and real data sets that our approach is superior to otheronline state-of-the-art methods.


Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models

arXiv.org Machine Learning

We develop a sequential low-complexity inference procedure for Dirichlet process mixtures of Gaussians for online clustering and parameter estimation when the number of clusters are unknown a-priori. We present an easily computable, closed form parametric expression for the conditional likelihood, in which hyperparameters are recursively updated as a function of the streaming data assuming conjugate priors. Motivated by large-sample asymptotics, we propose a novel adaptive low-complexity design for the Dirichlet process concentration parameter and show that the number of classes grow at most at a logarithmic rate. We further prove that in the large-sample limit, the conditional likelihood and data predictive distribution become asymptotically Gaussian. We demonstrate through experiments on synthetic and real data sets that our approach is superior to other online state-of-the-art methods.


Covariance Estimation in High Dimensions via Kronecker Product Expansions

arXiv.org Machine Learning

This paper presents a new method for estimating high dimensional covariance matrices. The method, permuted rank-penalized least-squares (PRLS), is based on a Kronecker product series expansion of the true covariance matrix. Assuming an i.i.d. Gaussian random sample, we establish high dimensional rates of convergence to the true covariance as both the number of samples and the number of variables go to infinity. For covariance matrices of low separation rank, our results establish that PRLS has significantly faster convergence than the standard sample covariance matrix (SCM) estimator. The convergence rate captures a fundamental tradeoff between estimation error and approximation error, thus providing a scalable covariance estimation framework in terms of separation rank, similar to low rank approximation of covariance matrices. The MSE convergence rates generalize the high dimensional rates recently obtained for the ML Flip-flop algorithm for Kronecker product covariance estimation. We show that a class of block Toeplitz covariance matrices is approximatable by low separation rank and give bounds on the minimal separation rank $r$ that ensures a given level of bias. Simulations are presented to validate the theoretical bounds. As a real world application, we illustrate the utility of the proposed Kronecker covariance estimator for spatio-temporal linear least squares prediction of multivariate wind speed measurements.


Convergence Properties of Kronecker Graphical Lasso Algorithms

arXiv.org Machine Learning

This paper studies iteration convergence of Kronecker graphical lasso (KGLasso) algorithms for estimating the covariance of an i.i.d. Gaussian random sample under a sparse Kronecker-product covariance model and MSE convergence rates. The KGlasso model, originally called the transposable regularized covariance model by Allen ["Transposable regularized covariance models with an application to missing data imputation," Ann. Appl. Statist., vol. 4, no. 2, pp. 764-790, 2010], implements a pair of $\ell_1$ penalties on each Kronecker factor to enforce sparsity in the covariance estimator. The KGlasso algorithm generalizes Glasso, introduced by Yuan and Lin ["Model selection and estimation in the Gaussian graphical model," Biometrika, vol. 94, pp. 19-35, 2007] and Banerjee ["Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data," J. Mach. Learn. Res., vol. 9, pp. 485-516, Mar. 2008], to estimate covariances having Kronecker product form. It also generalizes the unpenalized ML flip-flop (FF) algorithm of Dutilleul ["The MLE algorithm for the matrix normal distribution," J. Statist. Comput. Simul., vol. 64, pp. 105-123, 1999] and Werner ["On estimation of covariance matrices with Kronecker product structure," IEEE Trans. Signal Process., vol. 56, no. 2, pp. 478-491, Feb. 2008] to estimation of sparse Kronecker factors. We establish that the KGlasso iterates converge pointwise to a local maximum of the penalized likelihood function. We derive high dimensional rates of convergence to the true covariance as both the number of samples and the number of variables go to infinity. Our results establish that KGlasso has significantly faster asymptotic convergence than Glasso and FF. Simulations are presented that validate the results of our analysis.


Kronecker Sum Decompositions of Space-Time Data

arXiv.org Machine Learning

In this paper we consider the use of the space vs. time Kronecker product decomposition in the estimation of covariance matrices for spatio-temporal data. This decomposition imposes lower dimensional structure on the estimated covariance matrix, thus reducing the number of samples required for estimation. To allow a smooth tradeoff between the reduction in the number of parameters (to reduce estimation variance) and the accuracy of the covariance approximation (affecting estimation bias), we introduce a diagonally loaded modification of the sum of kronecker products representation [1]. We derive a Cramer-Rao bound (CRB) on the minimum attainable mean squared predictor coefficient estimation error for unbiased estimators of Kronecker structured covariance matrices. We illustrate the accuracy of the diagonally loaded Kronecker sum decomposition by applying it to video data of human activity.