Accuracy
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation
Yin, Fan, Li, Yao, Hsieh, Cho-Jui, Chang, Kai-Wei
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks and has drawn increasing attention from the Natural Language Processing (NLP) community. Despite the surge of new AED methods, our studies show that existing methods heavily rely on a shortcut to achieve good performance. In other words, current search-based adversarial attacks in NLP stop once model predictions change, and thus most adversarial examples generated by those attacks are located near model decision boundaries. To surpass this shortcut and fairly evaluate AED methods, we propose to test AED methods with \textbf{F}ar \textbf{B}oundary (\textbf{FB}) adversarial examples. Existing methods show worse than random guess performance under this scenario. To overcome this limitation, we propose a new technique, \textbf{ADDMU}, \textbf{a}dversary \textbf{d}etection with \textbf{d}ata and \textbf{m}odel \textbf{u}ncertainty, which combines two types of uncertainty estimation for both regular and FB adversarial example detection. Our new method outperforms previous methods by 3.6 and 6.0 \emph{AUC} points under each scenario. Finally, our analysis shows that the two types of uncertainty provided by \textbf{ADDMU} can be leveraged to characterize adversarial examples and identify the ones that contribute most to model's robustness in adversarial training.
Unobserved Local Structures Make Compositional Generalization Hard
Bogin, Ben, Gupta, Shivanshu, Berant, Jonathan
While recent work has convincingly showed that sequence-to-sequence models struggle to generalize to new compositions (termed compositional generalization), little is known on what makes compositional generalization hard on a particular test instance. In this work, we investigate what are the factors that make generalization to certain test instances challenging. We first substantiate that indeed some examples are more difficult than others by showing that different models consistently fail or succeed on the same test instances. Then, we propose a criterion for the difficulty of an example: a test instance is hard if it contains a local structure that was not observed at training time. We formulate a simple decision rule based on this criterion and empirically show it predicts instance-level generalization well across 5 different semantic parsing datasets, substantially better than alternative decision rules. Last, we show local structures can be leveraged for creating difficult adversarial compositional splits and also to improve compositional generalization under limited training budgets by strategically selecting examples for the training set.
Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
Dighe, Pranay, Nayak, Prateeth, Rudovic, Oggi, Marchi, Erik, Niu, Xiaochuan, Tewfik, Ahmed
Accurate prediction of the user intent to interact with a voice assistant (VA) on a device (e.g. on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel approach to predict the user's intent (the user speaking to the device or not) directly from acoustic and textual information encoded at subword tokens which are obtained via an end-to-end ASR model. Modeling directly the subword tokens, compared to modeling of the phonemes and/or full words, has at least two advantages: (i) it provides a unique vocabulary representation, where each token has a semantic meaning, in contrast to the phoneme-level representations, (ii) each subword token has a reusable "sub"-word acoustic pattern (that can be used to construct multiple full words), resulting in a largely reduced vocabulary space than of the full words. To learn the subword representations for the audio-to-intent classification, we extract: (i) acoustic information from an E2E-ASR model, which provides frame-level CTC posterior probabilities for the subword tokens, and (ii) textual information from a pre-trained continuous bag-of-words model capturing the semantic meaning of the subword tokens. The key to our approach is the way it combines acoustic subword-level posteriors with text information using the notion of positional-encoding in order to account for multiple ASR hypotheses simultaneously. We show that our approach provides more robust and richer representations for audio-to-intent classification, and is highly accurate with correctly mitigating 93.3% of unintended user audio from invoking the smart assistant at 99% true positive rate.
Benchmarking GPU and TPU Performance with Graph Neural Networks
Ju, xiangyang, Wang, Yunsong, Murnane, Daniel, Choma, Nicholas, Farrell, Steven, Calafiura, Paolo
Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly optimized for dense data representations. However, sparse representations such as graphs are prevalent in many domains, including science. It is therefore important to characterize the performance of available AI accelerators on sparse data. This work analyzes and compares the GPU and TPU performance training a Graph Neural Network (GNN) developed to solve a real-life pattern recognition problem. Characterizing the new class of models acting on sparse data may prove helpful in optimizing the design of deep learning libraries and future AI accelerators.
Augmentation by Counterfactual Explanation -- Fixing an Overconfident Classifier
Singla, Sumedha, Murali, Nihal, Arabshahi, Forough, Triantafyllou, Sofia, Batmanghelich, Kayhan
A highly accurate but overconfident model is ill-suited for deployment in critical applications such as healthcare and autonomous driving. The classification outcome should reflect a high uncertainty on ambiguous in-distribution samples that lie close to the decision boundary. The model should also refrain from making overconfident decisions on samples that lie far outside its training distribution, far-out-of-distribution (far-OOD), or on unseen samples from novel classes that lie near its training distribution (near-OOD). This paper proposes an application of counterfactual explanations in fixing an over-confident classifier. Specifically, we propose to fine-tune a given pre-trained classifier using augmentations from a counterfactual explainer (ACE) to fix its uncertainty characteristics while retaining its predictive performance. We perform extensive experiments with detecting far-OOD, near-OOD, and ambiguous samples. Our empirical results show that the revised model have improved uncertainty measures, and its performance is competitive to the state-of-the-art methods.
Adaptive re-calibration of channel-wise features for Adversarial Audio Classification
Dongre, Vardhan, Reddy, Abhinav Thimma, Reddeddy, Nikhitha
DeepFake Audio, unlike DeepFake images and videos, has been relatively less explored from detection perspective, and the solutions which exist for the synthetic speech classification either use complex networks or dont generalize to different varieties of synthetic speech obtained using different generative and optimization-based methods. Through this work, we propose a channel-wise recalibration of features using attention feature fusion for synthetic speech detection and compare its performance against different detection methods including End2End models and Resnet-based models on synthetic speech generated using Text to Speech and Vocoder systems like WaveNet, WaveRNN, Tactotron, and WaveGlow. We also experiment with Squeeze Excitation (SE) blocks in our Resnet models and found that the combination was able to get better performance. In addition to the analysis, we also demonstrate that the combination of Linear frequency cepstral coefficients (LFCC) and Mel Frequency cepstral coefficients (MFCC) using the attentional feature fusion technique creates better input features representations which can help even simpler models generalize well on synthetic speech classification tasks. Our models (Resnet based using feature fusion) trained on Fake or Real (FoR) dataset and were able to achieve 95% test accuracy with the FoR data, and an average of 90% accuracy with samples we generated using different generative models after adapting this framework.
Detecting Unintended Social Bias in Toxic Language Datasets
Sahoo, Nihar, Gupta, Himanshu, Bhattacharyya, Pushpak
Warning: This paper has contents which may be offensive, or upsetting however this cannot be avoided owing to the nature of the work. With the rise of online hate speech, automatic detection of Hate Speech, Offensive texts as a natural language processing task is getting popular. However, very little research has been done to detect unintended social bias from these toxic language datasets. This paper introduces a new dataset ToxicBias curated from the existing dataset of Kaggle competition named "Jigsaw Unintended Bias in Toxicity Classification". We aim to detect Figure 1: An illustrative example of ToxicBias. During social biases, their categories, and targeted the annotation process, hate speech/offensive text groups. The dataset contains instances annotated is provided without context. Annotators are asked to for five different bias categories, viz., mark it as biased/neutral and to provide category, target, gender, race/ethnicity, religion, political, and and implication if it has biases.
Logical Reasoning with Span-Level Predictions for Interpretable and Robust NLI Models
Stacey, Joe, Minervini, Pasquale, Dubossarsky, Haim, Rei, Marek
Current Natural Language Inference (NLI) models achieve impressive results, sometimes outperforming humans when evaluating on in-distribution test sets. However, as these models are known to learn from annotation artefacts and dataset biases, it is unclear to what extent the models are learning the task of NLI instead of learning from shallow heuristics in their training data. We address this issue by introducing a logical reasoning framework for NLI, creating highly transparent model decisions that are based on logical rules. Unlike prior work, we show that improved interpretability can be achieved without decreasing the predictive accuracy. We almost fully retain performance on SNLI, while also identifying the exact hypothesis spans that are responsible for each model prediction. Using the e-SNLI human explanations, we verify that our model makes sensible decisions at a span level, despite not using any span labels during training. We can further improve model performance and span-level decisions by using the e-SNLI explanations during training. Finally, our model is more robust in a reduced data setting. When training with only 1,000 examples, out-of-distribution performance improves on the MNLI matched and mismatched validation sets by 13% and 16% relative to the baseline. Training with fewer observations yields further improvements, both in-distribution and out-of-distribution.
Bootstrapping NLP tools across low-resourced African languages: an overview and prospects
Computing and Internet access are substantially growing markets in Southern Africa, which brings with it increasing demands for local content and tools in indigenous African languages. Since most of those languages are low-resourced, efforts have gone into the notion of bootstrapping tools for one African language from another. This paper provides an overview of these efforts for Niger-Congo B (`Bantu') languages. Bootstrapping grammars for geographically distant languages has been shown to still have positive outcomes for morphology and rules or grammar-based natural language generation. Bootstrapping with data-driven approaches to NLP tasks is difficult to use meaningfully regardless geographic proximity, which is largely due to lexical diversity due to both orthography and vocabulary. Cladistic approaches in comparative linguistics may inform bootstrapping strategies and similarity measures might serve as proxy for bootstrapping potential as well, with both fertile ground for further research.
Machine Learning based Discrimination for Excited State Promoted Readout
A limiting factor for readout fidelity for superconducting qubits is the relaxation of the qubit to the ground state before the time needed for the resonator to reach its final target state. A technique known as excited state promoted (ESP) readout was proposed to reduce this effect and further improve the readout contrast on superconducting hardware. In this work, we use readout data from IBM's five-qubit quantum systems to measure the effectiveness of using deep neural networks, like feedforward neural networks, and various classification algorithms, like k-nearest neighbors, decision trees, and Gaussian naive Bayes, for single-qubit and multi-qubit discrimination. These methods were compared to standardly used linear and quadratic discriminant analysis algorithms based on their qubit-state-assignment fidelity performance, robustness to readout crosstalk, and training time.