Defending Adversarial Attacks via Semantic Feature Manipulation
Wang, Shuo, Chen, Tianle, Nepal, Surya, Rudolph, Carsten, Grobler, Marthie, Chen, Shangyu
Machine learning models have demonstrated vulnerability to adversarial attacks, more specifically misclassification of adversarial examples. In this paper, we propose a one-off and attack-agnostic Feature Manipulation (FM)-Defense to detect and purify adversarial examples in an interpretable and efficient manner. The intuition is that the classification result of a normal image is generally resistant to non-significant intrinsic feature changes, e.g., varying thickness of handwritten digits. In contrast, adversarial examples are sensitive to such changes since the perturbation lacks transferability. To enable manipulation of features, a combo-variational autoencoder is applied to learn disentangled latent codes that reveal semantic features. The resistance to classification change over the morphs, derived by varying and reconstructing latent codes, is used to detect suspicious inputs. Further, combo-VAE is enhanced to purify the adversarial examples with good quality by considering both class-shared and class-unique features. We empirically demonstrate the effectiveness of detection and the quality of the purified instances. Our experiments on three datasets show that FM-Defense can detect nearly $100\%$ of adversarial examples produced by different state-of-the-art adversarial attacks. It achieves more than $99\%$ overall purification accuracy on the suspicious instances that lie close to the manifold of normal examples.
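The detection idea in this abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: `encode`, `decode`, and `classify` are hypothetical stand-ins for the trained combo-VAE encoder, its decoder, and the protected classifier, and the latent step sizes and the threshold are arbitrary choices.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def label_stability(
    x: Vector,
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector], Vector],
    classify: Callable[[Vector], int],
    deltas: Sequence[float] = (-1.0, -0.5, 0.5, 1.0),
) -> float:
    """Fraction of morphs that keep the label predicted for x.

    Each morph shifts one latent dimension by one delta and is
    reconstructed with the decoder before being classified.
    """
    original_label = classify(x)
    z = encode(x)
    kept = 0
    total = 0
    for dim in range(len(z)):
        for delta in deltas:
            morphed = list(z)
            morphed[dim] += delta
            if classify(decode(morphed)) == original_label:
                kept += 1
            total += 1
    return kept / total if total else 1.0


def is_suspicious(stability: float, threshold: float = 0.8) -> bool:
    """Flag an input whose label changes across too many morphs."""
    return stability < threshold


# Toy stand-ins (hypothetical): an identity "autoencoder" and two classifiers.
def identity(v: Vector) -> Vector:
    return list(v)


def robust_classifier(v: Vector) -> int:
    return int(sum(v) > 0)  # label survives small latent shifts


def brittle_classifier(v: Vector) -> int:
    return int(v == [5.0, 5.0])  # label holds only at one exact point


clean_score = label_stability([5.0, 5.0], identity, identity, robust_classifier)
adversarial_score = label_stability([5.0, 5.0], identity, identity, brittle_classifier)
```

With these toy functions the first input keeps its label under every morph and the second loses it under every morph, which is the contrast the defense relies on.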
A neural network model that learns differences in diagnosis strategies among radiologists has an improved area under the curve for aneurysm status classification in magnetic resonance angiography image series
Tachibana, Yasuhiko, Nishimori, Masataka, Kitamura, Naoyuki, Umehara, Kensuke, Ota, Junko, Obata, Takayuki, Higashi, Tatsuya
Purpose: To construct a neural network model that can learn the different diagnosing strategies of radiologists to better classify aneurysm status in magnetic resonance angiography images. Materials and methods: This retrospective study included 3423 time-of-flight brain magnetic resonance angiography image series (subjects: male 1843 [mean age, 50.2 +/- 11.7 years], female 1580 [50.8 +/- 11.3 years]) recorded from November 2017 through January 2019. The image series were read independently for aneurysm status by one of four board-certified radiologists, who were assisted by an established deep learning-based computer-assisted diagnosis (CAD) system. The constructed neural networks were trained to classify the aneurysm status of zero to five aneurysm-suspicious areas suggested by the CAD system for each image series, and any additional aneurysm areas added by the radiologists, and this classification was compared with the judgment of the annotating radiologist. Image series were randomly allocated to training and testing data in an 8:2 ratio. The accuracy of the classification was compared by receiver operating characteristic analysis between the control model that accepted only image data as input and the proposed model that additionally accepted the information of who the annotating radiologist was. The DeLong test was used to compare areas under the curves (P < 0.05 was considered significant). Results: The area under the curve was larger in the proposed model (0.845) than in the control model (0.793), and the difference was significant (P < 0.0001). Conclusion: The proposed model improved classification accuracy by learning the diagnosis strategies of individual annotating radiologists.
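For readers less familiar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way on made-up scores; it is illustrative only and does not reproduce the study's data or the DeLong test.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: four cases, two of them labelled positive.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a rank-based formulation would be the usual choice for thousands of series.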
Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Black Box Simulators
Sreedharan, Sarath, Soni, Utkash, Verma, Mudit, Srivastava, Siddharth, Kambhampati, Subbarao
As more and more complex AI systems are introduced into our day-to-day lives, it becomes important that everyday users can work and interact with such systems with relative ease. Orchestrating such interactions requires the system to be capable of providing explanations and rationale for its decisions and be able to field queries about alternative decisions. A significant hurdle to allowing for such explanatory dialogue could be the mismatch between the complex representations that the systems use to reason about the task and the terms in which the user may be viewing the task. This paper introduces methods that can be leveraged to provide contrastive explanations in terms of user-specified concepts for deterministic sequential decision-making settings where the system dynamics may be best represented in terms of black box simulators. We do this by assuming that system dynamics can at least be partly captured in terms of symbolic planning models, and we provide explanations in terms of these models. We implement this method using a simulator for a popular Atari game (Montezuma's Revenge) and perform user studies to verify whether people would find explanations generated in this form useful.
Fake News Detection by means of Uncertainty Weighted Causal Graphs
Garrido-Merchán, Eduardo C., Puente, Cristina, Palacios, Rafael
Society is experiencing changes in information consumption, as new information channels such as social networks let people share news that is not necessarily trustworthy. Sometimes, these sources of information produce fake news deliberately with doubtful purposes, and the consumers of that information share it with other users thinking that the information is accurate. This transmission of information represents an issue in our society, as it can negatively influence the opinion of people about certain figures, groups or ideas. Hence, it is desirable to design a system that is able to detect and classify information as fake and categorize a source of information as trustworthy or not. Current systems experience difficulties performing this task, as it is complicated to design an automatic procedure that can classify this information independently of the context. In this work, we propose a mechanism to detect fake news through a classifier based on weighted causal graphs. These graphs are specific hybrid models that are built through causal relations retrieved from texts and consider the uncertainty of causal relations. We take advantage of this representation to use the probability distributions of this graph and build a fake news classifier based on the entropy and KL divergence of learned and new information. We believe that the problem of fake news is accurately tackled by this model due to its hybrid nature between a symbolic and quantitative methodology. We describe the methodology of this classifier and add empirical evidence of the usefulness of our proposed approach in the form of synthetic experiments and a real experiment involving lung cancer.
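The two quantities this abstract names, entropy and KL divergence, can be computed for discrete distributions as in the sketch below. How the classifier combines them is not stated in the abstract, so the code stops at the two measures; the example distributions are made up and the function names are illustrative.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; infinite if q has zero mass where p does not."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


# Hypothetical distributions over the strength of one causal relation.
learned = [0.5, 0.5]   # what previously seen texts support
incoming = [0.9, 0.1]  # what a new text asserts

learned_entropy = entropy(learned)
surprise = kl_divergence(incoming, learned)
```

A large divergence between the incoming and the learned distribution indicates that the new item disagrees with what was learned; whether that alone should mark it as fake is a modelling decision outside this sketch.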
Four Principles of Explainable AI as Applied to Biometrics and Facial Forensic Algorithms
Phillips, P. Jonathon, Przybocki, Mark
Traditionally, researchers in automatic face recognition and biometric technologies have focused on developing accurate algorithms. With this technology being integrated into operational systems, engineers and scientists are being asked: do these systems meet societal norms? The origin of this line of inquiry is 'trust' of artificial intelligence (AI) systems. In this paper, we concentrate on adapting explainable AI to face recognition and biometrics, and we present four principles of explainable AI for face recognition and biometrics. The principles are illustrated by four case studies, which show the challenges and issues in developing algorithms that can produce explanations.
CryptoSPN: Privacy-preserving Sum-Product Network Inference
Treiber, Amos, Molina, Alejandro, Weinert, Christian, Schneider, Thomas, Kersting, Kristian
AI algorithms, and machine learning (ML) techniques in particular, are increasingly important to individuals' lives, but have caused a range of privacy concerns addressed by, e.g., the European GDPR. Using cryptographic techniques, it is possible to perform inference tasks remotely on sensitive client data in a privacy-preserving way: the server learns nothing about the input data and the model predictions, while the client learns nothing about the ML model (which is often considered intellectual property and might contain traces of sensitive data). While such privacy-preserving solutions are relatively efficient, they are mostly targeted at neural networks, can degrade the predictive accuracy, and usually reveal the network's topology. Furthermore, existing solutions are not readily accessible to ML experts, as prototype implementations are not well-integrated into ML frameworks and require extensive cryptographic knowledge. In this paper, we present CryptoSPN, a framework for privacy-preserving inference of sum-product networks (SPNs). SPNs are a tractable probabilistic graphical model that allows a range of exact inference queries in linear time. Specifically, we show how to efficiently perform SPN inference via secure multi-party computation (SMPC) without accuracy degradation while hiding sensitive client and training information with provable security guarantees. Next to foundations, CryptoSPN encompasses tools to easily transform existing SPNs into privacy-preserving executables. Our empirical results demonstrate that CryptoSPN achieves highly efficient and accurate inference in the order of seconds for medium-sized SPNs.
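To make the linear-time inference claim concrete, the sketch below evaluates a tiny sum-product network over two binary variables in plaintext. It shows only the bottom-up evaluation that CryptoSPN would carry out under secure computation; nothing here is cryptographic, and the class names, structure, and weights are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Evidence = Dict[str, Optional[int]]


class Leaf:
    """Bernoulli leaf over one binary variable."""

    def __init__(self, var: str, p_one: float) -> None:
        self.var = var
        self.p_one = p_one

    def value(self, evidence: Evidence) -> float:
        observed = evidence.get(self.var)
        if observed is None:
            return 1.0  # unobserved variable is marginalised out
        return self.p_one if observed == 1 else 1.0 - self.p_one


class Product:
    """Product node: children cover disjoint sets of variables."""

    def __init__(self, children: List[object]) -> None:
        self.children = children

    def value(self, evidence: Evidence) -> float:
        result = 1.0
        for child in self.children:
            result *= child.value(evidence)
        return result


class Sum:
    """Sum node: weighted mixture of children over the same variables."""

    def __init__(self, weighted_children: List[Tuple[float, object]]) -> None:
        self.weighted_children = weighted_children

    def value(self, evidence: Evidence) -> float:
        return sum(w * child.value(evidence) for w, child in self.weighted_children)


# Mixture of two product distributions over variables A and B.
spn = Sum([
    (0.6, Product([Leaf("A", 0.9), Leaf("B", 0.2)])),
    (0.4, Product([Leaf("A", 0.1), Leaf("B", 0.7)])),
])

joint = spn.value({"A": 1, "B": 1})      # P(A=1, B=1)
marginal = spn.value({"A": 1})           # P(A=1), B summed out
normaliser = spn.value({})               # all variables summed out
```

Because this network is a tree, the recursion visits each node once per query. A network with shared sub-structure would need cached node values to keep the cost linear in its size.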
FAE: A Fairness-Aware Ensemble Framework
Iosifidis, Vasileios, Fetahu, Besnik, Ntoutsi, Eirini
Automated decision making based on big data and machine learning (ML) algorithms can result in discriminatory decisions against certain protected groups defined upon personal data like gender, race, sexual orientation, etc. Such algorithms designed to discover patterns in big data might not only pick up any encoded societal biases in the training data, but even worse, they might reinforce such biases, resulting in more severe discrimination. The majority of fairness-aware machine learning approaches proposed thus far focus solely on the pre-, in- or post-processing steps of the machine learning process, that is, input data, learning algorithms or derived models, respectively. However, the fairness problem cannot be isolated to a single step of the ML process. Rather, discrimination is often a result of complex interactions between big data and algorithms, and therefore, a more holistic approach is required. The proposed FAE (Fairness-Aware Ensemble) framework combines fairness-related interventions at both the pre- and post-processing steps of the data analysis process. In the pre-processing step, we tackle the problems of under-representation of the protected group (group imbalance) and of class imbalance by generating balanced training samples. In the post-processing step, we tackle the problem of class overlapping by shifting the decision boundary in the direction of fairness.
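One possible reading of "generating balanced training samples" is to draw equally many records from every combination of group and class. The sketch below does this by undersampling to the smallest combination. The record layout, the function name, and the choice of undersampling are assumptions made for illustration; the abstract does not specify how FAE builds its samples.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (protected-group flag, class label); features omitted


def balanced_sample(records: List[Record], seed: int = 0) -> List[Record]:
    """Draw the same number of records from every (group, class) cell."""
    if not records:
        return []
    rng = random.Random(seed)
    cells: Dict[Record, List[Record]] = defaultdict(list)
    for record in records:
        cells[(record[0], record[1])].append(record)
    # Undersample every cell to the size of the smallest one.
    size = min(len(members) for members in cells.values())
    sample: List[Record] = []
    for key in sorted(cells):
        sample.extend(rng.sample(cells[key], size))
    return sample


# Toy data: the protected group (flag 1) is rare, and rarer still in class 1.
toy_records = [(0, 1)] * 8 + [(0, 0)] * 6 + [(1, 1)] * 2 + [(1, 0)] * 4
balanced = balanced_sample(toy_records)
cell_counts = Counter(balanced)
```

Drawing several such samples with different seeds would give an ensemble a different balanced view of the data for each member, which fits the framework's name, though that step is likewise an assumption here.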
How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context
Liu, Qian, Chen, Bei, Guo, Jiaqi, Lou, Jian-Guang, Zhou, Bin, Zhang, Dongmei
Recently, semantic parsing in context has received considerable attention; the task is challenging because of the complex contextual phenomena involved. Previous works verified their proposed methods in limited scenarios, which motivates us to conduct an exploratory study on context modeling methods under real-world semantic parsing in context. We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it. We evaluate 13 context modeling methods on two large complex cross-domain datasets, and our best model achieves state-of-the-art performance on both datasets with significant improvements. Furthermore, we summarize the most frequent contextual phenomena, with a fine-grained analysis on representative models, which may shed light on potential research directions.
On the impact of modern deep-learning techniques to the performance and time-requirements of classification models in experimental high-energy physics
Beginning from a basic neural-network architecture, we test the potential benefits offered by a range of advanced techniques for machine learning and deep learning in the context of a typical classification problem encountered in the domain of high-energy physics, using a well-studied dataset: the 2014 Higgs ML Kaggle dataset. The advantages are evaluated in terms of both performance metrics and the time required to train and apply the resulting models. Techniques examined include domain-specific data-augmentation, learning rate and momentum scheduling, (advanced) ensembling in both model-space and weight-space, and alternative architectures and connection methods. Following the investigation, we arrive at a model which achieves equal performance to the winning solution of the original Kaggle challenge, whilst requiring about 1% of the training time and less than 5% of the inference time using much less specialised hardware. Additionally, a new wrapper library for PyTorch called LUMIN is presented, which incorporates all of the techniques studied.
Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform
Li, Jun, Fuxin, Li, Todorovic, Sinisa
Strictly enforcing orthonormality constraints on parameter matrices has been shown to be advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) a new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) an implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b) achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.
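The closed-form Cayley transform $Y = (I - \frac{\alpha}{2} W)^{-1}(I + \frac{\alpha}{2} W) X$ with skew-symmetric $W$ keeps the columns of $X$ orthonormal, and rearranging it gives the fixed-point relation $Y = X + \frac{\alpha}{2} W (X + Y)$, which can be iterated without a matrix inverse. The sketch below iterates that relation in plain Python. Whether this matches the paper's exact update is an assumption, since the abstract does not spell it out; the matrices and step size are made up, and in training $W$ would be built from the gradient rather than fixed by hand.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c: float, a: Matrix) -> Matrix:
    return [[c * x for x in row] for row in a]


def iterative_cayley(x: Matrix, w: Matrix, step: float, iterations: int = 60) -> Matrix:
    """Iterate Y <- X + (step / 2) * W @ (X + Y), starting from Y = X.

    Converges when the spectral norm of (step / 2) * W is below 1.
    """
    y = [row[:] for row in x]
    for _ in range(iterations):
        y = add(x, scale(step / 2.0, matmul(w, add(x, y))))
    return y


# A 3x2 matrix with orthonormal columns (a point on the Stiefel manifold).
x0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
# Hand-picked skew-symmetric matrix (W^T = -W); hypothetical update direction.
w0 = [[0.0, -1.0, 2.0], [1.0, 0.0, -0.5], [-2.0, 0.5, 0.0]]

y0 = iterative_cayley(x0, w0, step=0.2)
gram = matmul(transpose(y0), y0)  # should be close to the 2x2 identity
```

In practice only a handful of iterations would be run per update, trading a small orthonormality error for speed; the sixty iterations here simply drive the fixed point to machine precision so the invariant is easy to check.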