Goto

Collaborating Authors

 Banff


Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection

arXiv.org Artificial Intelligence

This paper proposes a new defense against neural network backdooring attacks that are maliciously trained to mispredict in the presence of attacker-chosen triggers. Our defense is based on the intuition that the feature extraction layers of a backdoored network embed new features to detect the presence of a trigger and the subsequent classification layers learn to mispredict when triggers are detected. Therefore, to detect backdoors, the proposed defense uses two synergistic anomaly detectors trained on clean validation data: the first is a novelty detector that checks for anomalous features, while the second detects anomalous mappings from features to outputs by comparing with a separate classifier trained on validation data. The approach is evaluated on a wide range of backdoored networks (with multiple variations of triggers) that successfully evade state-of-the-art defenses. Additionally, we evaluate the robustness of our approach on imperceptible perturbations, scalability on large-scale datasets, and effectiveness under domain shift. This paper also shows that the defense can be further improved using data augmentation.


Quantized Variational Inference

arXiv.org Machine Learning

We show how Optimal Voronoi Tesselation produces variance free gradients for Evidence Lower Bound (ELBO) optimization at the cost of introducing asymptotically decaying bias. Subsequently, we propose a Richardson extrapolation type method to improve the asymptotic bound. We show that using the Quantized Variational Inference framework leads to fast convergence for both score function and the reparametrized gradient estimator at a comparable computational cost. Finally, we propose several experiments to assess the performance of our method and its limitations.


Latent Causal Invariant Model

arXiv.org Machine Learning

Current supervised learning can learn spurious correlation during the data-fitting process, imposing issues regarding interpretability, out-of-distribution (OOD) generalization, and robustness. To avoid spurious correlation, we propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction. Specifically, we introduce latent variables that are separated into (a) output-causative factors and (b) others that are spuriously correlated to the output via confounders, to model the underlying causal factors. We further assume the generating mechanisms from latent space to observed data to be causally invariant. We give the identifiable claim of such invariance, particularly the disentanglement of output-causative factors from others, as a theoretical guarantee for precise inference and avoiding spurious correlation. We propose a Variational-Bayesian-based method for estimation and to optimize over the latent space for prediction. The utility of our approach is verified by improved interpretability, prediction power on various OOD scenarios (including healthcare) and robustness on security.


Learning Causal Semantic Representation for Out-of-Distribution Prediction

arXiv.org Artificial Intelligence

Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output. To address the problem, we propose a Causal Semantic Generative model (CSG) based on causality to model the two factors separately, and learn it on a single training domain for prediction without (OOD generalization) or with (domain adaptation) unsupervised data in a test domain. We prove that CSG identifies the semantic factor on the training domain, and the invariance principle of causality subsequently guarantees the boundedness of OOD generalization error and the success of adaptation. We design learning methods for both effective learning and easy prediction, by leveraging the graphical structure of CSG. Empirical study demonstrates the effect of our methods to improve test accuracy for OOD generalization and domain adaptation.


Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence

arXiv.org Machine Learning

Learning from different data types is a long-standing goal in machine learning research, as multiple information sources co-occur when describing natural phenomena. However, existing generative models that approximate a multimodal ELBO rely on difficult or inefficient training schemes to learn a joint distribution and the dependencies between modalities. In this work, we propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions. It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior. In addition, we theoretically prove that the new multimodal JS-divergence (mmJSD) objective optimizes an ELBO. In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.


Reinforcement Learning with Efficient Active Feature Acquisition

arXiv.org Artificial Intelligence

Solving real-life sequential decision making problems under partial observability involves an exploration-exploitation problem. To be successful, an agent needs to efficiently gather valuable information about the state of the world for making rewarding decisions. However, in real-life, acquiring valuable information is often highly costly, e.g., in the medical domain, information acquisition might correspond to performing a medical test on a patient. This poses a significant challenge for the agent to perform optimally for the task while reducing the cost for information acquisition. In this paper, we propose a model-based reinforcement learning framework that learns an active feature acquisition policy to solve the exploration-exploitation problem during its execution. Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states, which are then used by the policy to maximize the task reward in a cost efficient manner. We demonstrate the efficacy of our proposed framework in a control domain as well as using a medical simulator. In both tasks, our proposed method outperforms conventional baselines and results in policies with greater cost efficiency.


Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection

arXiv.org Machine Learning

Weakly Labelled learning has garnered lot of attention in recent years due to its potential to scale Sound Event Detection (SED) and is formulated as Multiple Instance Learning (MIL) problem. This paper proposes a Multi-Task Learning (MTL) framework for learning from Weakly Labelled Audio data which encompasses the traditional MIL setup. To show the utility of proposed framework, we use the input TimeFrequency representation (T-F) reconstruction as the auxiliary task. We show that the chosen auxiliary task de-noises internal T-F representation and improves SED performance under noisy recordings. Our second contribution is introducing two step Attention Pooling mechanism. By having 2-steps in attention mechanism, the network retains better T-F level information without compromising SED performance. The visualisation of first step and second step attention weights helps in localising the audio-event in T-F domain. For evaluating the proposed framework, we remix the DCASE 2019 task 1 acoustic scene data with DCASE 2018 Task 2 sounds event data under 0, 10 and 20 db SNR resulting in a multi-class Weakly labelled SED problem. The proposed total framework outperforms existing benchmark models over all SNRs, specifically 22.3 %, 12.8 %, 5.9 % improvement over benchmark model on 0, 10 and 20 dB SNR respectively. We carry out ablation study to determine the contribution of each auxiliary task and 2-step Attention Pooling to the SED performance improvement. The code is publicly released


Learning Latent Space Energy-Based Prior Model

arXiv.org Machine Learning

We propose to learn energy-based model (EBM) in the latent space of a generator model, so that the EBM serves as a prior model that stands on the top-down network of the generator model. Both the latent space EBM and the top-down network can be learned jointly by maximum likelihood, which involves short-run MCMC sampling from both the prior and posterior distributions of the latent vector. Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well. We show that the learned model exhibits strong performances in terms of image and text generation and anomaly detection. The one-page code can be found in supplementary materials.


FaceLeaks: Inference Attacks against Transfer Learning Models via Black-box Queries

arXiv.org Machine Learning

Transfer learning is a useful machine learning framework that allows one to build task-specific models (student models) without significantly incurring training costs using a single powerful model (teacher model) pre-trained with a large amount of data. The teacher model may contain private data, or interact with private inputs. We investigate if one can leak or infer such private information without interacting with the teacher model directly. We describe such inference attacks in the context of face recognition, an application of transfer learning that is highly sensitive to personal privacy. Under black-box and realistic settings, we show that existing inference techniques are ineffective, as interacting with individual training instances through the student models does not reveal information about the teacher. We then propose novel strategies to infer from aggregate-level information. Consequently, membership inference attacks on the teacher model are shown to be possible, even when the adversary has access only to the student models. We further demonstrate that sensitive attributes can be inferred, even in the case where the adversary has limited auxiliary information. Finally, defensive strategies are discussed and evaluated. Our extensive study indicates that information leakage is a real privacy threat to the transfer learning framework widely used in real-life situations.


Wide and Deep Graph Neural Networks with Distributed Online Learning

arXiv.org Machine Learning

Graph neural networks (GNNs) learn representations from network data with naturally distributed architectures, rendering them well-suited candidates for decentralized learning. Oftentimes, this decentralized graph support changes with time due to link failures or topology variations. These changes create a mismatch between the graphs on which GNNs were trained and the ones on which they are tested. Online learning can be used to retrain GNNs at testing time, overcoming this issue. However, most online algorithms are centralized and work on convex problems (which GNNs rarely lead to). This paper proposes the Wide and Deep GNN (WD-GNN), a novel architecture that can be easily updated with distributed online learning mechanisms. The WD-GNN comprises two components: the wide part is a bank of linear graph filters and the deep part is a GNN. At training time, the joint architecture learns a nonlinear representation from data. At testing time, the deep part (nonlinear) is left unchanged, while the wide part is retrained online, leading to a convex problem. We derive convergence guarantees for this online retraining procedure and further propose a decentralized alternative. Experiments on the robot swarm control for flocking corroborate theory and show potential of the proposed architecture for distributed online learning.