Banff
Deterministic Gaussian Averaged Neural Networks
Campbell, Ryan, Finlay, Chris, Oberman, Adam M
We present a deterministic method to compute the Gaussian average of neural networks used in regression and classification. Our method is based on an equivalence between training with a particular regularized loss, and the expected values of Gaussian averages. We use this equivalence to certify models which perform well on clean data but are not robust to adversarial perturbations. In terms of certified accuracy and adversarial robustness, our method is comparable to known stochastic methods such as randomized smoothing, but requires only a single model evaluation during inference.
Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption
Sun, Xu, Zhang, Zhiyuan, Ren, Xuancheng, Luo, Ruixuan, Li, Liangyou
We argue that the vulnerability of model parameters is of crucial value to the study of model robustness and generalization but little research has been devoted to understanding this matter. In this work, we propose an indicator to measure the robustness of neural network parameters by exploiting their vulnerability via parameter corruption. The proposed indicator describes the maximum loss variation in the non-trivial worst-case scenario under parameter corruption. For practical purposes, we give a gradient-based estimation, which is far more effective than random corruption trials that can hardly induce the worst accuracy degradation. Equipped with theoretical support and empirical validation, we are able to systematically investigate the robustness of different model parameters and reveal vulnerability of deep neural networks that has been rarely paid attention to before. Moreover, we can enhance the models accordingly with the proposed adversarial corruption-resistant training, which not only improves the parameter robustness but also translates into accuracy elevation.
Low Distortion Block-Resampling with Spatially Stochastic Networks
Hong, Sarah Jane, Arjovsky, Martin, Thompson, Ian, Barnhardt, Darryl
We formalize and attack the problem of generating new images from old ones that are as diverse as possible, only allowing them to change without restrictions in certain parts of the image while remaining globally consistent. This encompasses the typical situation found in generative modelling, where we are happy with parts of the generated data, but would like to resample others ("I like this generated castle overall, but this tower looks unrealistic, I would like a new one"). In order to attack this problem we build from the best conditional and unconditional generative models to introduce a new network architecture, training procedure, and a new algorithm for resampling parts of the image as desired.
Deep Lagrangian Constraint-based Propagation in Graph Neural Networks
Tiezzi, Matteo, Marra, Giuseppe, Melacci, Stefano, Maggini, Marco
Several real-world applications are characterized by data that exhibit a complex structure that can be represented using graphs. The popularity of deep learning techniques renewed the interest in neural architectures able to process these patterns, inspired by the Graph Neural Network (GNN) model. GNNs encode the state of the nodes of the graph by means of an iterative diffusion procedure that, during the learning stage, must be computed at every epoch, until the fixed point of a learnable state transition function is reached, propagating the information among the neighbouring nodes. We propose a novel approach to learning in GNNs, based on constrained optimization in the Lagrangian framework. Learning both the transition function and the node states is the outcome of a joint process, in which the state convergence procedure is implicitly expressed by a constraint satisfaction mechanism, avoiding iterative epoch-wise procedures and the network unfolding. Our computational structure searches for saddle points of the Lagrangian in the adjoint space composed of weights, nodes state variables and Lagrange multipliers. This process is further enhanced by multiple layers of constraints that accelerate the diffusion process. An experimental analysis shows that the proposed approach compares favourably with popular models on several benchmarks.
An Efficient $k$-modes Algorithm for Clustering Categorical Datasets
Dorman, Karin S., Maitra, Ranjan
Mining clusters from datasets is an important endeavor in many applications. The k-means algorithm is a popular and efficient, distribution-free approach for clustering numerical-valued data, but does not apply for categorical-valued observations. We provide a novel, computationally efficient implementation of k-modes, called OTQT. We prove that OTQT finds updates, undetectable to existing k-modes algorithms, that improve the objective function. Thus, although slightly slower per iteration owing to its algorithmic complexity, OTQT is always more accurate per iteration and almost always faster (and only barely slower on some datasets) to the final optimum. As a result, we recommend OTQT as the preferred, default algorithm for all k-modes implementations. We also examine five initialization methods and three types of K-selection methods, many of them novel or novel applications to k-modes. By examining performance on real and simulated datasets, we show that simple random initialization is the best initializer and that a novel K-selection method is more accurate than methods adapted from k-means. Identifying groups of similar observations in datasets is common in a wide array of applications, with many clustering methods developed in statistics, machine learning and the applied sciences [1]-[7]. The k-means algorithm [8]-[11] is arguably the most popular method for clustering numerical-valued observations. It scales to large datasets because it does not require calculation of all pairwise distances, and it is distribution-free. While distribution-free does not imply it is assumption-free [12], [13], it is a starting place for users wary of making assumptions about their data. Unfortunately, k-means does not provide an appropriate objective to minimize for datasets with categorical attributes.
UFO-BLO: Unbiased First-Order Bilevel Optimization
Likhosherstov, Valerii, Song, Xingyou, Choromanski, Krzysztof, Davis, Jared, Weller, Adrian
Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning. However, the approach suffers from time and memory complexity proportional to the length $r$ of its inner optimization loop, which has led to several modifications being proposed. One such modification is \textit{first-order} BLO (FO-BLO) which approximates outer-level gradients by zeroing out second derivative terms, yielding significant speed gains and requiring only constant memory as $r$ varies. Despite FO-BLO's popularity, there is a lack of theoretical understanding of its convergence properties. We make progress by demonstrating a rich family of examples where FO-BLO-based stochastic optimization does not converge to a stationary point of the BLO objective. We address this concern by proposing a new FO-BLO-based unbiased estimate of outer-level gradients, enabling us to theoretically guarantee this convergence, with no harm to memory and expected time complexity. Our findings are supported by experimental results on Omniglot and Mini-ImageNet, popular few-shot meta-learning benchmarks.
Explainable Artificial Intelligence: a Systematic Review
This has led to the development of a plethora of domain-dependent and context-specific methods for dealing with the interpretation of machine learning (ML) models and the formation of explanations for humans. Unfortunately, this trend is far from being over, with an abundance of knowledge in the field which is scattered and needs organisation. The goal of this article is to systematically review research works in the field of XAI and to try to define some boundaries in the field. From several hundreds of research articles focused on the concept of explainability, about 350 have been considered for review by using the following search methodology. In a first phase, Google Scholar was queried to find papers related to "explainable artificial intelligence", "explainable machine learning" and "interpretable machine learning". Subsequently, the bibliographic section of these articles was thoroughly examined to retrieve further relevant scientific studies. The first noticeable thing, as shown in figure 2 (a), is the distribution of the publication dates of selected research articles: sporadic in the 70s and 80s, receiving preliminary attention in the 90s, showing raising interest in 2000 and becoming a recognised body of knowledge after 2010. The first research concerned the development of an explanation-based system and its integration in a computer program designed to help doctors make diagnoses [3]. Some of the more recent papers focus on work devoted to the clustering of methods for explainability, motivating the need for organising the XAI literature [4, 5, 6].
Black-box Explanation of Object Detectors via Saliency Maps
Petsiuk, Vitali, Jain, Rajiv, Manjunatha, Varun, Morariu, Vlad I., Mehra, Ashutosh, Ordonez, Vicente, Saenko, Kate
We propose D-RISE, a method for generating visual explanations for the predictions of object detectors. D-RISE can be considered "black-box" in the software testing sense, it only needs access to the inputs and outputs of an object detector. Compared to gradient-based methods, D-RISE is more general and agnostic to the particular type of object detector being tested as it does not need to know about the inner workings of the model. We show that D-RISE can be easily applied to different object detectors including one-stage detectors such as YOLOv3 and two-stage detectors such as Faster-RCNN. We present a detailed analysis of the generated visual explanations to highlight the utilization of context and the possible biases learned by object detectors.
SimPool: Towards Topology Based Graph Pooling with Structural Similarity Features
Deep learning methods for graphs have seen rapid progress in recent years with much focus awarded to generalising Convolutional Neural Networks (CNN) to graph data. CNNs are typically realised by alternating convolutional and pooling layers where the pooling layers subsample the grid and exchange spatial or temporal resolution for increased feature dimensionality. Whereas the generalised convolution operator for graphs has been studied extensively and proven useful, hierarchical coarsening of graphs is still challenging since nodes in graphs have no spatial locality and no natural order. This paper proposes two main contributions, the first is a differential module calculating structural similarity features based on the adjacency matrix. These structural similarity features may be used with various algorithms however in this paper the focus and the second main contribution is on integrating these features with a revisited pooling layer DiffPool arXiv:1806.08804 to propose a pooling layer referred to as SimPool. This is achieved by linking the concept of network reduction by means of structural similarity in graphs with the concept of hierarchical localised pooling. Experimental results demonstrate that as part of an end-to-end Graph Neural Network architecture SimPool calculates node cluster assignments that functionally resemble more to the locality preserving pooling operations used by CNNs that operate on local receptive fields in the standard grid. Furthermore the experimental results demonstrate that these features are useful in inductive graph classification tasks with no increase to the number of parameters.
Understanding the Message Passing in Graph Neural Networks via Power Iteration
The mechanism of message passing in graph neural networks(GNNs) is still mysterious for the literature. No one, to our knowledge, has given another possible theoretical origin for GNNs apart from convolutional neural networks. Somewhat to our surprise, the message passing can be best understood in terms of the power iteration. By removing activation functions and layer weights of GNNs, we propose power iteration clustering (SPIC) models which are naturally interpretable and scalable. The experiment shows our models extend the existing GNNs and enhance its capability of processing random featured networks. Moreover, we demonstrate the redundancy of some state-of-the-art GNNs in designing and define a lower limit for model evaluation by randomly initializing the aggregator of message passing. All the findings in this paper push the boundaries of our understanding of neural networks.