Fiedler Regularization: Learning Neural Networks with Graph Sparsity
We introduce a novel regularization approach for deep learning that incorporates and respects the underlying graphical structure of the neural network. Existing regularization methods often focus on dropping/penalizing weights in a global manner that ignores the connectivity structure of the neural network. We propose to use the Fiedler value of the neural network's underlying graph as a tool for regularization. We provide theoretical support for this approach via spectral graph theory. We demonstrate the convexity of this penalty and provide an approximate, variational approach for fast computation in practical training of neural networks. We provide bounds on such approximations. We provide an alternative but equivalent formulation of this framework in the form of a structurally weighted L1 penalty, thus linking our approach to sparsity induction. We perform experiments on datasets to compare Fiedler regularization with traditional regularization methods such as dropout and weight decay. Results demonstrate the efficacy of Fiedler regularization.
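As a rough illustration of the quantity this abstract refers to, the sketch below computes the Fiedler value (the second-smallest eigenvalue of the graph Laplacian) for a small feedforward network. It assumes that neurons are nodes and that each edge weight is the absolute value of the corresponding network weight; the function name `fiedler_value` and the `(inputs, outputs)` weight layout are hypothetical choices for this example. It uses an exact eigendecomposition, not the approximate variational computation the abstract mentions.

```python
import numpy as np

def fiedler_value(layer_weights):
    """Second-smallest Laplacian eigenvalue of a feedforward network's graph.

    layer_weights: list of 2-D arrays, each of shape (inputs, outputs).
    Assumption for this sketch: edge weight = |network weight|.
    """
    sizes = [layer_weights[0].shape[0]] + [w.shape[1] for w in layer_weights]
    n = sum(sizes)
    adj = np.zeros((n, n))
    offset = 0
    for w in layer_weights:
        rows, cols = w.shape
        # Connect the nodes of this layer to the nodes of the next layer.
        adj[offset:offset + rows, offset + rows:offset + rows + cols] = np.abs(w)
        offset += rows
    adj = adj + adj.T                        # undirected graph
    lap = np.diag(adj.sum(axis=1)) - adj     # graph Laplacian L = D - A
    eigvals = np.linalg.eigvalsh(lap)        # ascending order
    return eigvals[1]

# Three neurons in a chain with unit weights form a path graph,
# whose Laplacian eigenvalues are 0, 1 and 3.
chain = [np.array([[1.0]]), np.array([[-1.0]])]
print(fiedler_value(chain))
```

A Fiedler value of zero means the graph is disconnected, so penalizing it pushes the network toward being easier to separate into weakly connected parts.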
Benchmarking Graph Neural Networks
Dwivedi, Vijay Prakash, Joshi, Chaitanya K., Laurent, Thomas, Bengio, Yoshua, Bresson, Xavier
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. They have been successfully applied to a myriad of domains including chemistry, physics, social sciences, knowledge graphs, recommendation, and neuroscience. As the field grows, it becomes critical to identify the architectures and key mechanisms which generalize across graph sizes, enabling us to tackle larger, more complex datasets and domains. Unfortunately, it has been increasingly difficult to gauge the effectiveness of new GNNs and compare models in the absence of a standardized benchmark with consistent experimental settings and large datasets. In this paper, we propose a reproducible GNN benchmarking framework, with the facility for researchers to add new datasets and models conveniently. We apply this benchmarking framework to novel medium-scale graph datasets from mathematical modeling, computer vision, chemistry and combinatorial problems to establish key operations when designing effective GNNs. Precisely, graph convolutions, anisotropic diffusion, residual connections and normalization layers are universal building blocks for developing robust and scalable GNNs.
Differential Privacy at Risk: Bridging Randomness and Privacy Budget
Dandekar, Ashish, Basu, Debabrota, Bressan, Stephane
The calibration of noise for a privacy-preserving mechanism depends on the sensitivity of the query and the prescribed privacy level. A data steward must make the non-trivial choice of a privacy level that balances the requirements of users and the monetary constraints of the business entity. We analyse the roles of the sources of randomness involved in the design of a privacy-preserving mechanism, namely the explicit randomness induced by the noise distribution and the implicit randomness induced by the data-generation distribution. The finer analysis enables us to provide stronger privacy guarantees with quantifiable risks. Thus, we propose privacy at risk, a probabilistic calibration of privacy-preserving mechanisms. We provide a composition theorem that leverages privacy at risk. We instantiate the probabilistic calibration for the Laplace mechanism by providing analytical results. We also propose a cost model that bridges the gap between the privacy level and the compensation budget estimated by a GDPR-compliant business entity. The convexity of the proposed cost model leads to a unique fine-tuning of the privacy level that minimises the compensation budget. We show its effectiveness by illustrating a realistic scenario that avoids overestimation of the compensation budget by using privacy at risk for the Laplace mechanism. We quantitatively show that composition using the cost-optimal privacy at risk provides a stronger privacy guarantee than the classical advanced composition.
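For readers unfamiliar with the baseline this abstract builds on, here is a minimal sketch of the standard Laplace mechanism, where the noise scale is the query sensitivity divided by the privacy level epsilon. This is the classical calibration only, not the privacy-at-risk calibration the authors propose; the function names `laplace_scale` and `laplace_mechanism` are hypothetical.

```python
import numpy as np

def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon of the classical Laplace mechanism."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return the query answer perturbed with Laplace(0, b) noise."""
    rng = np.random.default_rng() if rng is None else rng
    scale = laplace_scale(sensitivity, epsilon)
    return true_answer + rng.laplace(loc=0.0, scale=scale)

# A counting query has sensitivity 1; a smaller epsilon means more noise.
rng = np.random.default_rng(0)
noisy_count = laplace_mechanism(true_answer=100.0, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy_count)
```

Halving epsilon doubles the noise scale, which is the trade-off between privacy level and utility that the data steward has to weigh.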
Bayesian Neural Networks With Maximum Mean Discrepancy Regularization
Pomponi, Jary, Scardapane, Simone, Uncini, Aurelio
Bayesian Neural Networks (BNNs) are trained to optimize an entire distribution over their weights instead of a single set, having significant advantages in terms of, e.g., interpretability, multi-task learning, and calibration. Because of the intractability of the resulting optimization problem, most BNNs are either sampled through Monte Carlo methods, or trained by minimizing a suitable Evidence Lower BOund (ELBO) on a variational approximation. In this paper, we propose a variant of the latter, wherein we replace the Kullback-Leibler divergence in the ELBO term with a Maximum Mean Discrepancy (MMD) estimator, inspired by recent work in variational inference. After motivating our proposal based on the properties of the MMD term, we proceed to show a number of empirical advantages of the proposed formulation over the state-of-the-art. In particular, our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks. In addition, they are more robust to the selection of a prior over the weights, and they are better calibrated. As a second contribution, we provide a new formulation for estimating the uncertainty on a given prediction, showing it performs in a more robust fashion against adversarial attacks and the injection of noise over their inputs, compared to more classical criteria such as the differential entropy.
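To make the MMD term concrete, the sketch below shows a simple biased (V-statistic) estimate of the squared Maximum Mean Discrepancy between two sample sets under a Gaussian kernel. The kernel choice, the bandwidth, and the function names are assumptions made for this example; the abstract does not say which estimator or kernel the authors use.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x,x) + mean k(y,y) - 2 mean k(x,y)."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

rng = np.random.default_rng(0)
posterior_samples = rng.normal(0.0, 1.0, size=(200, 2))
prior_samples = rng.normal(3.0, 1.0, size=(200, 2))
print(mmd_squared(posterior_samples, prior_samples))
```

Because the estimate needs only samples from the two distributions, it can stand in for the Kullback-Leibler term whenever sampling is easy but densities are awkward to evaluate.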
Few-shot Learning with Weakly-supervised Object Localization
Few-shot learning (FSL) aims to learn novel visual categories from very few samples, which is a challenging problem in real-world applications. Many data generation methods have improved the performance of FSL models, but require lots of annotated images to train a specialized network (e.g., GAN) dedicated to hallucinating new samples. We argue that localization is a more efficient approach because it provides the most discriminative regions without using extra samples. In this paper, we propose a novel method to address the FSL task by achieving weakly-supervised object localization while performing few-shot classification. To this end, we design (i) a triplet-input module to obtain the initial object seeds and (ii) an Image-To-Class-Distance (ITCD) based localizer to activate the deep descriptors of the key objects, thus obtaining more discriminative representations for few-shot classification. Extensive experiments show our method outperforms the state-of-the-art methods on benchmark datasets under various settings. Besides, our method achieves superior performance over previous methods when training the model on miniImageNet and evaluating it on different datasets (e.g., Stanford Dogs), demonstrating its superior generalization capacity. Extra visualization shows the proposed method can localize the key objects accurately.
Addressing target shift in zero-shot learning using grouped adversarial learning
Chemmengath, Saneem Ahmed, Bharadwaj, Samarth, Paul, Soumava, Samanta, Suranjana, Sankaranarayanan, Karthik
In this paper, we present a new paradigm for zero-shot learning (ZSL) that is trained by utilizing additional information (such as attribute-class mapping) for a specific set of unseen classes. We conjecture that such additional information about unseen classes is more readily available than unsupervised image sets. Further, on close examination of the underlying attribute predictors of popular ZSL algorithms, we find that they often leverage attribute correlations to make predictions. While attribute correlations that remain intact in the unseen classes (test) benefit the prediction of difficult attributes, a change in correlations can have an adverse effect on ZSL performance. For example, detecting the attribute 'brown' may be the same as detecting 'fur' over an animal image dataset captured in the tropics. However, such a model might fail on unseen images of Arctic animals. To address this effect, termed target-shift in ZSL, we utilize our proposed framework to design grouped adversarial learning. We introduce grouping of attributes to enable the model to continue to benefit from useful correlations, while restricting cross-group correlations that may be harmful for generalization. Our analysis shows that it is possible not only to constrain the model from leveraging unwanted correlations, but also to adjust them to a specific test setting using only the additional information (the already available attribute-class mapping). We show empirical results for zero-shot predictions on standard benchmark datasets, namely aPY, AwA2, SUN and CUB. We further introduce to the research community a new experimental train-test split that maximizes target-shift to further study its effects.
A General Framework for Symmetric Property Estimation
Charikar, Moses, Shiragur, Kirankumar, Sidford, Aaron
Symmetric property estimation is a fundamental and well-studied problem in machine learning and statistics. In this problem, we are given n i.i.d. samples from an unknown distribution p and asked to estimate f(p), where f is a symmetric property (i.e., it does not depend on the labels of the symbols). Over the past few years, the computational and sample complexities for estimating many symmetric properties have been extensively studied. Estimators with optimal sample complexities have been obtained for several properties including entropy [VV11b, WY16a, JVHW15], distance to uniformity [VV11a, JHW16], and support [VV11b, WY15]. All aforementioned estimators were property-specific and therefore a natural question is to design a universal estimator. In [ADOS16], the authors showed that the distribution that maximizes the profile likelihood, i.e., the likelihood of the multiset of frequencies of elements in the sample, referred to as the profile maximum likelihood (PML) distribution, can be used as a universal plug-in estimator.
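The notion of a profile can be illustrated with a short sketch: the profile of a sample is the multiset of its symbol frequencies, so it is unchanged when symbols are relabeled. The function name `profile` is hypothetical, and computing the PML distribution itself is well beyond this example.

```python
from collections import Counter

def profile(sample):
    """Map each frequency to the number of distinct symbols having that frequency."""
    symbol_counts = Counter(sample)          # symbol -> frequency
    return Counter(symbol_counts.values())   # frequency -> number of symbols

# In "abracadabra": a appears 5 times, b and r twice, c and d once.
print(profile("abracadabra"))
# Relabeling the symbols leaves the profile unchanged.
print(profile("xyzxqxwxyzx") == profile("abracadabra"))
```

Any symmetric property of the empirical distribution, such as its entropy or support size, can be computed from the profile alone.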
Realistic River Image Synthesis using Deep Generative Adversarial Networks
Gautam, Akshat, Sit, Muhammed, Demir, Ibrahim
In this paper, we investigate an application of image generation for river satellite imagery. Specifically, we propose a generative adversarial network (GAN) model capable of generating high-resolution and realistic river images that can be used to support models in surface water estimation, river meandering, wetland loss and other hydrological research studies. First, we summarize an augmented, diverse repository of overhead river images to be used in training. Second, we incorporate the Progressive Growing GAN (PGGAN), a network architecture that iteratively trains smaller-resolution GANs to gradually build up to a very high resolution, to generate 256x256 river satellite imagery. With conventional GAN architectures, difficulties soon arise in terms of exponential increase of training time and vanishing/exploding gradient issues, which the PGGAN implementation seems to significantly reduce. Our preliminary results show great promise in capturing the detail of river flow and green areas present in river satellite images that can be used for supporting hydroinformatics studies.
Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems
Chatziafratis, Vaggos, Nagarajan, Sai Ganesh, Panageas, Ioannis
Deep Neural Networks (NNs) with many hidden layers are now at the core of modern machine learning applications and can achieve remarkable performance that was previously unattainable using shallow networks. But why are deeper networks better than shallow ones? Perhaps intuitively, one can understand that the nature of computation done by deep and shallow networks is different; simple one-hidden-layer NNs extract independent features of the input and return their weighted sum, while deeper NNs can compute features of features, making the features computed by deeper layers no longer independent. Another line of intuition (Poole et al., 2016) is that highly complicated manifolds in input space can actually turn into flattened manifolds in hidden space, thus helping with downstream tasks (e.g., classification). To make the above intuitions formal and understand the benefits of depth, researchers try to understand the expressivity of NNs and prove depth separation results. Early results in this area, sometimes referred to as universality theorems (Cybenko, 1989; Hornik et al., 1989), state that NNs with just one hidden layer, equipped with standard activation units (e.g., sigmoids, ReLUs), are "dense" in the space of continuous functions, meaning that any continuous function can be approximated arbitrarily well by an appropriate combination of these activation units.
Analysis via Orthonormal Systems in Reproducing Kernel Hilbert $C^*$-Modules and Applications
Hashimoto, Yuka, Ishikawa, Isao, Ikeda, Masahiro, Komura, Fuyuta, Katsura, Takeshi, Kawahara, Yoshinobu
Kernel methods have been among the most popular techniques in machine learning, where learning tasks are solved using the properties of reproducing kernel Hilbert spaces (RKHSs). In this paper, we propose a novel data analysis framework with reproducing kernel Hilbert $C^*$-modules (RKHMs), a generalization of RKHS distinct from vector-valued RKHS (vv-RKHS). Analysis with RKHMs enables us to deal with structures among variables more explicitly than vv-RKHS. We show the theoretical validity of the construction of orthonormal systems in Hilbert $C^*$-modules, and derive concrete procedures for orthonormalization in RKHMs that retain those theoretical properties in numerical computations. Moreover, we apply them to generalize kernel principal component analysis and the analysis of dynamical systems with Perron-Frobenius operators to RKHMs. The empirical performance of our methods is also investigated using synthetic and real-world data.