General Pruning Criteria for Fast SBL

Möderl, Jakob, Leitinger, Erik, Fleury, Bernard Henri

arXiv.org Machine Learning

Sparse Bayesian learning (SBL) associates with each weight in the underlying linear model a hyperparameter, by assuming that each weight is Gaussian distributed with zero mean and precision (inverse variance) equal to its associated hyperparameter. The method estimates the hyperparameters by marginalizing out the weights and performing (marginalized) maximum likelihood (ML) estimation. SBL drives many hyperparameter estimates to infinity, effectively setting the estimates of the corresponding weights to zero (i.e., pruning the corresponding weights from the model) and thereby yielding a sparse estimate of the weight vector. In this letter, we analyze the marginal likelihood as a function of a single hyperparameter, keeping the others fixed, when the Gaussian assumptions on the noise samples and the weight distribution that underlie the derivation of SBL are weakened. We derive sufficient conditions that lead, on the one hand, to finite hyperparameter estimates and, on the other, to infinite ones. Finally, we show that in the Gaussian case the two conditions are complementary and coincide with the pruning condition of fast SBL (F-SBL), thereby providing additional insights into this algorithm.
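
To make the pruning condition concrete, here is a minimal sketch of the Gaussian-case single-hyperparameter update that the letter's conditions reduce to, following Tipping and Faul's fast marginal likelihood maximization. The sparsity/quality factors s_i and q_i and the variable names come from that formulation, not from this abstract.

```python
import numpy as np

def fsbl_hyperparameter_update(phi_i, C_minus_i, y):
    """Single-hyperparameter update from fast SBL (Tipping & Faul, 2003).

    C_minus_i is the marginal covariance of y with basis function i removed,
    phi_i the corresponding dictionary column. Returns the maximizer of the
    marginal likelihood over alpha_i (np.inf means: prune weight i).
    """
    Cinv_phi = np.linalg.solve(C_minus_i, phi_i)
    s_i = phi_i @ Cinv_phi               # "sparsity" factor
    q_i = y @ Cinv_phi                   # "quality" factor
    if q_i**2 > s_i:                     # the F-SBL pruning condition
        return s_i**2 / (q_i**2 - s_i)   # finite estimate: keep the weight
    return np.inf                        # infinite estimate: prune the weight
```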


Flatness-aware Sequential Learning Generates Resilient Backdoors

Pham, Hoang, Ta, The-Anh, Tran, Anh, Doan, Khoa D.

arXiv.org Artificial Intelligence

Recently, backdoor attacks have become an emerging threat to the security of machine learning models. From the adversary's perspective, the implanted backdoors should be resistant to defensive algorithms, but some recently proposed fine-tuning defenses can remove these backdoors with notable efficacy. This is mainly due to the catastrophic forgetting (CF) property of deep neural networks. This paper counters CF of backdoors by leveraging continual learning (CL) techniques. We begin by investigating the connectivity between a backdoored and a fine-tuned model in the loss landscape. Our analysis confirms that fine-tuning defenses, especially the more advanced ones, can easily push a poisoned model out of the backdoor regions, making it forget all about the backdoors. Based on this finding, we re-formulate backdoor training through the lens of CL and propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors. This framework separates the backdoor poisoning process into two tasks: the first task learns a backdoored model, while the second task, based on the CL principles, moves it to a backdoored region resistant to fine-tuning. We additionally propose to seek flatter backdoor regions via a sharpness-aware minimizer in the framework, further strengthening the durability of the implanted backdoor. Finally, we demonstrate the effectiveness of our method through extensive empirical experiments on several benchmark datasets in the backdoor domain. The source code is available at https://github.com/mail-research/SBL-resilient-backdoors.
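
As an illustration of the flatness-seeking second task, the sketch below implements one generic sharpness-aware minimization (SAM) update step in PyTorch. The abstract only states that a sharpness-aware minimizer is used; the function name, the rho value, and the training interface here are assumptions, not the paper's exact procedure.

```python
import torch

def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
    """One sharpness-aware minimization (SAM) update: ascend to the worst
    point in a rho-ball around the current weights, then descend using the
    gradient taken there. Generic sketch, not SBL's exact optimizer."""
    inputs, targets = batch
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()        # gradient at current weights
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12
    with torch.no_grad():                             # perturb toward the sharpest
        for p, g in zip(model.parameters(), grads):   # direction inside the ball
            p.add_(rho * g / norm)
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()        # gradient at perturbed weights
    with torch.no_grad():                             # undo the perturbation
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / norm)
    optimizer.step()                                  # descend with the SAM gradient
```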


Covariance-Free Sparse Bayesian Learning

Lin, Alexander, Song, Andrew H., Bilgic, Berkin, Ba, Demba

arXiv.org Machine Learning

Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. However, the most popular inference algorithms for SBL become too expensive for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new SBL inference algorithm that avoids explicit computation of the covariance matrix, thereby saving significant time and space. Instead of performing costly matrix inversions, our covariance-free method solves multiple linear systems to obtain provably unbiased estimates of the posterior statistics needed by SBL. These systems can be solved in parallel, enabling further acceleration of the algorithm via graphics processing units. In practice, our method can be up to thousands of times faster than existing baselines, reducing hours of computation time to seconds. We showcase how our new algorithm enables SBL to tractably tackle high-dimensional signal recovery problems, such as deconvolution of calcium imaging data and multi-contrast reconstruction of magnetic resonance images. Finally, we open-source a toolbox containing all of our implementations to drive future research in SBL.
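
The following sketch illustrates the covariance-free idea with a Hutchinson-style probe estimator: the posterior statistics come from solutions of linear systems with A = beta * Phi^T Phi + diag(alpha) rather than from an explicit inverse. The probe-based diagonal estimator, the conjugate-gradient solver, and all parameter names are assumptions for illustration; the paper's actual unbiased estimator may differ.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def sbl_posterior_stats(Phi, y, alpha, beta, n_probes=32, rng=None):
    """Estimate the SBL posterior mean and variances without forming Sigma.

    Solves one system for the mean and n_probes systems with Rademacher
    probes z, using E[z * (A^{-1} z)] = diag(A^{-1}) to estimate the
    posterior variances. Each solve is independent, hence parallelizable."""
    rng = np.random.default_rng(rng)
    n = Phi.shape[1]
    matvec = lambda v: beta * (Phi.T @ (Phi @ v)) + alpha * v
    A = LinearOperator((n, n), matvec=matvec)
    mu, _ = cg(A, beta * (Phi.T @ y))           # posterior mean
    diag_est = np.zeros(n)
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)     # Rademacher probe vector
        x, _ = cg(A, z)
        diag_est += z * x
    return mu, diag_est / n_probes              # mean and estimated variances
```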


Handling Long-Tail Queries with Slice-Aware Conversational Systems

Wang, Cheng, Kim, Sun, Park, Taiwoo, Choudhary, Sajal, Park, Sunghyun, Kim, Young-Bum, Sarikaya, Ruhi, Lee, Sungjin

arXiv.org Artificial Intelligence

Conversational AI systems such as Siri and Alexa have become useful tools that directly impact our daily lives. These systems normally rely on machine learning models that evolve over time to provide a quality user experience. However, developing and improving the models is challenging because they need to support both high-usage (head) and low-usage (tail) scenarios, requiring fine-grained modeling strategies for specific data subsets or slices. In this paper, we explore the recent concept of slice-based learning (SBL) (Chen et al., 2019) to improve our baseline conversational skill routing system on tail, yet critical, query traffic. We first define a set of labeling functions to generate weak supervision data for the tail intents. We then extend the baseline model to a slice-aware architecture, which monitors and improves the model performance on the selected tail intents. Applied to de-identified live traffic from a commercial conversational AI system, our experiments show that the slice-aware model improves performance on the tail intents while maintaining the overall performance.
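
To illustrate the weak-supervision step, here are two hypothetical labeling functions for an assumed tail intent. The paper does not publish its functions or intent names; these keyword heuristics only show the shape of the approach, where each function votes for an intent or abstains.

```python
# Hypothetical labeling functions for an assumed tail intent; returning
# None means "abstain". Real functions would encode richer heuristics.
TAIL_INTENT = "PlayPodcastIntent"   # assumed intent name, for illustration

def lf_podcast_keyword(utterance: str):
    return TAIL_INTENT if "podcast" in utterance.lower() else None

def lf_play_episode(utterance: str):
    u = utterance.lower()
    return TAIL_INTENT if u.startswith("play") and "episode" in u else None

def weak_label(utterance: str, lfs=(lf_podcast_keyword, lf_play_episode)):
    """Majority vote over the labeling functions that did not abstain."""
    votes = [v for v in (lf(utterance) for lf in lfs) if v is not None]
    return max(set(votes), key=votes.count) if votes else None
```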


Building Blocks of Social Intelligence: Enabling Autonomy for Socially Intelligent and Assistive Robots

Mead, Ross Alan (University of Southern California) | Atrash, Amin (University of Southern California) | Kaszubski, Edward (University of Southern California) | Clair, Aaron St. (University of Southern California) | Greczek, Jillian (University of Southern California) | Clabaugh, Caitlyn (University of Southern California) | Kohan, Brian (University of Southern California) | Mataric, Maja J. (University of Southern California)

AAAI Conferences

We present an overview of the control, recognition, decision-making, and learning techniques utilized by the Interaction Lab (robotics.usc.edu/interaction) at the University of Southern California (USC) to enable autonomy in sociable and socially assistive robots. These techniques are implemented with two software libraries: 1) the Social Behavior Library (SBL) provides autonomous social behavior [...]. Vocalics is the study of the nonverbal aspects of speech, such as volume, pitch, and rate. Our contribution is a parametric vocalic behavior controller that autonomously adjusts the robot speaker volume based on models of how a human user will hear speech produced by the robot. These models vary with distance, orientation, and perceived environmental interference (Mead & Matarić 2014). Our future work will investigate adapting the pitch and rate of speech produced by a robot to improve user speech perception.
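
A minimal sketch of what such a vocalic volume controller could look like, assuming free-field spherical spreading and a crude off-axis penalty. Mead and Matarić (2014) fit their models empirically; every constant and the function signature below are assumptions for illustration only.

```python
import math

def speaker_gain_db(distance_m, facing_deg=0.0, noise_db=40.0,
                    target_db=60.0, ref_db=70.0):
    """Choose a speaker gain so speech arrives at roughly target_db.

    Assumes -6 dB per doubling of distance from a 1 m reference, a linear
    off-axis penalty, and a floor 15 dB above ambient noise. Hypothetical
    sketch, not the Interaction Lab's fitted model."""
    spreading_loss = 20.0 * math.log10(max(distance_m, 0.1))  # dB lost en route
    off_axis_loss = 3.0 * abs(facing_deg) / 90.0              # orientation term
    needed_at_user = max(target_db, noise_db + 15.0)          # stay above noise
    return needed_at_user + spreading_loss + off_axis_loss - ref_db
```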


From Streamlined Combinatorial Search to Efficient Constructive Procedures

Bras, Ronan Le (Cornell University) | Gomes, Carla (Cornell University) | Selman, Bart (Cornell University)

AAAI Conferences

In recent years, significant progress in the area of search, constraint satisfaction, and automated reasoning has been driven in part by the study of challenge problems from combinatorics and finite algebra. This work has led to the discovery of interesting discrete structures with intricate mathematical properties. While some of those results have resolved open questions and conjectures, a shortcoming is that they generally do not provide further mathematical insights from which one could derive more general observations. We propose an approach that integrates specialized combinatorial search, using so-called streamlining, with a human computation component. We use this approach to discover efficient constructive procedures for generating certain classes of combinatorial objects of any size. More specifically, using our framework, we discovered two complementary efficient constructions for generating so-called spatially balanced Latin squares (SBLSs) of any order N such that 2N+1 is prime. Previously, no constructions for SBLSs were known. Our approach also enabled us to derive a new lower bound for so-called weak Schur numbers, improving on a series of earlier results for Schur numbers.
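
For reference, the sketch below checks the defining property of a spatially balanced Latin square: every pair of symbols must have the same total column distance summed over all rows, and a counting argument forces that common value to be N(N+1)/3. This is only a verifier of the property, not the constructive procedure discovered in the paper.

```python
from itertools import combinations

def is_spatially_balanced(square):
    """Check spatial balance of a Latin square given as a list of rows.

    Compares 3 * (total distance per symbol pair) against N*(N+1) to avoid
    fractions. Assumes the input is already a valid Latin square."""
    n = len(square)
    pos = [{sym: col for col, sym in enumerate(row)} for row in square]
    return all(
        3 * sum(abs(p[a] - p[b]) for p in pos) == n * (n + 1)
        for a, b in combinations(square[0], 2)
    )
```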


Exploiting Correlation in Sparse Signal Recovery Problems: Multiple Measurement Vectors, Block Sparsity, and Time-Varying Sparsity

Zhang, Zhilin, Rao, Bhaskar D.

arXiv.org Machine Learning

A trend in compressed sensing (CS) is to exploit structure for improved reconstruction performance. In the basic CS model, exploiting the clustering structure among nonzero elements in the solution vector has drawn much attention, and many algorithms have been proposed. However, few algorithms explicitly consider correlation within a cluster. Meanwhile, in the multiple measurement vector (MMV) model correlation among multiple solution vectors is largely ignored. Although several recently developed algorithms consider the exploitation of the correlation, these algorithms need to know a priori the correlation structure, thus limiting their effectiveness in practical problems. Recently, we developed a sparse Bayesian learning (SBL) algorithm, namely T-SBL, and its variants, which adaptively learn the correlation structure and exploit such correlation information to significantly improve reconstruction performance. Here we establish their connections to other popular algorithms, such as the group Lasso, iterative reweighted $\ell_1$ and $\ell_2$ algorithms, and algorithms for time-varying sparsity. We also provide strategies to improve these existing algorithms.
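
For orientation, a sketch of the MMV model and the correlation-aware prior that T-SBL learns, with notation (Phi, gamma_i, B) borrowed from Zhang and Rao's papers rather than from this abstract:

```latex
% MMV model: each row of X (one source over multiple measurement vectors)
% shares a learned intra-row correlation matrix B, scaled by gamma_i.
\[
  \mathbf{Y} = \boldsymbol{\Phi}\mathbf{X} + \mathbf{V}, \qquad
  \mathbf{X}_{i\cdot} \sim \mathcal{N}\bigl(\mathbf{0},\, \gamma_i \mathbf{B}\bigr),
\]
\[
  \operatorname{vec}\!\bigl(\mathbf{X}^{T}\bigr) \sim
  \mathcal{N}\!\bigl(\mathbf{0},\; \boldsymbol{\Gamma} \otimes \mathbf{B}\bigr),
  \qquad \boldsymbol{\Gamma} = \operatorname{diag}(\gamma_1,\dots,\gamma_M).
\]
```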


Sparse Estimation Using General Likelihoods and Non-Factorial Priors

Wipf, David P., Nagarajan, Srikantan S.

Neural Information Processing Systems

Finding maximally sparse representations from overcomplete feature dictionaries frequently involves minimizing a cost function composed of a likelihood (or data fit) term and a prior (or penalty function) that favors sparsity. While typically the prior is factorial, here we examine non-factorial alternatives that have a number of desirable properties relevant to sparse estimation and are easily implemented using an efficient, globally-convergent reweighted $\ell_1$ minimization procedure. The first method under consideration arises from the sparse Bayesian learning (SBL) framework. Although based on a highly non-convex underlying cost function, in the context of canonical sparse estimation problems, we prove uniform superiority of this method over the Lasso in that (i) it can never do worse, and (ii) for any dictionary and sparsity profile, there will always exist cases where it does better. These results challenge the prevailing reliance on strictly convex penalty functions for finding sparse solutions. We then derive a new non-factorial variant with similar properties that exhibits further performance improvements in empirical tests. For both of these methods, as well as traditional factorial analogs, we demonstrate the effectiveness of reweighted $\ell_1$-norm algorithms in handling more general sparse estimation problems involving classification, group feature selection, and non-negativity constraints. As a byproduct of this development, a rigorous reformulation of sparse Bayesian classification (e.g., the relevance vector machine) is derived that, unlike the original, involves no approximation steps and descends a well-defined objective function.
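
The sketch below shows the generic loop structure of iterative reweighted $\ell_1$ minimization. The paper derives SBL-specific, non-factorial weight updates; the simple 1/(|x|+eps) reweighting used here is the classical factorial rule, swapped in as a stand-in, and the regularization constants are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def reweighted_l1(Phi, y, n_iter=10, eps=1e-3, lam=1e-2):
    """Iterative reweighted l1 minimization (generic loop, not the paper's
    SBL-derived weight update).

    Each weighted l1 problem is solved by rescaling the dictionary columns,
    running a standard Lasso, and undoing the rescaling."""
    n = Phi.shape[1]
    w = np.ones(n)
    x = np.zeros(n)
    for _ in range(n_iter):
        scaled = Phi / w                           # column j divided by w[j]
        model = Lasso(alpha=lam, max_iter=10000).fit(scaled, y)
        x = model.coef_ / w                        # solution of the weighted problem
        w = 1.0 / (np.abs(x) + eps)                # reweight toward sparsity
    return x
```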


Comparing the Effects of Different Weight Distributions on Finding Sparse Representations

Rao, Bhaskar D., Wipf, David P.

Neural Information Processing Systems

Given a redundant dictionary of basis vectors (or atoms), our goal is to find maximally sparse representations of signals. Previously, we have argued that a sparse Bayesian learning (SBL) framework is particularly well-suited for this task, showing that it has far fewer local minima than other Bayesian-inspired strategies. In this paper, we provide further evidence for this claim by proving a restricted equivalence condition, based on the distribution of the nonzero generating model weights, whereby the SBL solution will equal the maximally sparse representation. We also prove that if these nonzero weights are drawn from an approximate Jeffreys prior, then with probability approaching one, our equivalence condition is satisfied. Finally, we motivate the worst-case scenario for SBL and demonstrate that it is still better than the most widely used sparse representation algorithms.
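
As a small illustration of the equivalence condition's setting, the sketch below draws nonzero weight magnitudes from an approximate Jeffreys prior, p(w) proportional to 1/w truncated to [a, b], via inverse-CDF sampling (w = a * (b/a)**u with u uniform). The truncation interval and random signs are assumptions; the paper's exact construction may differ.

```python
import numpy as np

def approx_jeffreys_weights(k, a=1e-3, b=1e3, rng=None):
    """Draw k nonzero weights with magnitudes from an approximate Jeffreys
    prior on [a, b] and uniformly random signs.

    Inverse CDF of p(w) ~ 1/w on [a, b]: F^{-1}(u) = a * (b/a)**u."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=k)
    magnitudes = a * (b / a) ** u
    signs = rng.choice([-1.0, 1.0], size=k)
    return signs * magnitudes
```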