Banff
When are Non-Parametric Methods Robust?
Bhattacharjee, Robi, Chaudhuri, Kamalika
A growing body of research has shown that many classifiers are susceptible to adversarial examples -- small strategic modifications to test inputs that lead to misclassification. In this work, we study general non-parametric methods, with a view towards understanding when they are robust to these modifications. We establish general conditions under which non-parametric methods are r-consistent -- in the sense that they converge to optimally robust and accurate classifiers in the large sample limit. Concretely, our results show that when data is well-separated, nearest neighbors and kernel classifiers are r-consistent, while histograms are not. For general data distributions, we prove that preprocessing by Adversarial Pruning (Yang et al., 2019) -- which makes data well-separated -- followed by nearest neighbors or kernel classifiers also leads to r-consistency.
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs
Sun, Zequn, Zhang, Qingheng, Hu, Wei, Wang, Chengming, Chen, Muhao, Akrami, Farahnaz, Li, Chengkai
Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advancement in KG embedding impels the advent of embedding-based entity alignment, which encodes entities in a continuous embedding space and measures entity similarities based on the learned embeddings. In this paper, we conduct a comprehensive experimental study of this emerging field. This study surveys 23 recent embedding-based entity alignment approaches and categorizes them based on their techniques and characteristics. We further observe that current approaches use different datasets in evaluation, and the degree distributions of entities in these datasets are inconsistent with real KGs. Hence, we propose a new KG sampling algorithm, with which we generate a set of dedicated benchmark datasets with various heterogeneity and distributions for a realistic evaluation. This study also produces an open-source library, which includes 12 representative embedding-based entity alignment approaches. We extensively evaluate these approaches on the generated datasets, to understand their strengths and limitations. Additionally, for several directions that have not been explored in current approaches, we perform exploratory experiments and report our preliminary findings for future studies. The benchmark datasets, open-source library and experimental results are all accessible online and will be duly maintained.
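The core idea of embedding-based entity alignment described above (encode entities as vectors, then match by similarity) can be sketched as follows. This is a minimal illustration only: the toy embeddings, the entity names, the choice of cosine similarity, and the greedy nearest-neighbor matching are assumptions made for the example and are not taken from the surveyed approaches or the study's library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def align(source_emb, target_emb):
    """Map each source entity to its most similar target entity."""
    return {
        s: max(target_emb, key=lambda t: cosine(s_vec, target_emb[t]))
        for s, s_vec in source_emb.items()
    }

# Hypothetical 2-d embeddings for two tiny knowledge graphs.
kg1 = {"Paris@kg1": [0.9, 0.1], "Berlin@kg1": [0.1, 0.9]}
kg2 = {"Paris@kg2": [0.8, 0.2], "Berlin@kg2": [0.2, 0.8]}
alignment = align(kg1, kg2)
```

Greedy matching of this kind can assign several source entities to the same target; it is used here only because it is the simplest way to show the similarity step.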
Amortized variance reduction for doubly stochastic objectives
Boustati, Ayman, Vakili, Sattar, Hensman, James, John, ST
Approximate inference in complex probabilistic models such as deep Gaussian processes requires the optimisation of doubly stochastic objective functions. These objectives incorporate randomness both from mini-batch subsampling of the data and from Monte Carlo estimation of expectations. If the gradient variance is high, the stochastic optimisation problem becomes difficult with a slow rate of convergence. Control variates can be used to reduce the variance, but past approaches do not take into account how mini-batch stochasticity affects sampling stochasticity, resulting in sub-optimal variance reduction. We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional model gradient computations. We illustrate the properties of this proposal and test its performance on logistic regression and deep Gaussian processes.
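For readers unfamiliar with control variates, the following sketch shows the classical scalar version of the technique on a toy Monte Carlo problem. It is not the amortized, recognition-network approach proposed in the abstract above; the target function exp(U), the control U with known mean 0.5, and the function name are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def control_variate_samples(f_vals, g_vals, g_mean):
    """Return f - c * (g - E[g]) with c estimated as Cov(f, g) / Var(g)."""
    f_bar = statistics.fmean(f_vals)
    g_bar = statistics.fmean(g_vals)
    cov = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals))
    var = sum((g - g_bar) ** 2 for g in g_vals)
    c = cov / var  # estimated from the same samples, which adds a small bias
    return [f - c * (g - g_mean) for f, g in zip(f_vals, g_vals)]

rng = random.Random(0)
u = [rng.random() for _ in range(10_000)]        # U ~ Uniform(0, 1)
plain = [math.exp(x) for x in u]                 # samples of exp(U)
adjusted = control_variate_samples(plain, u, 0.5)

plain_var = statistics.variance(plain)
adjusted_var = statistics.variance(adjusted)
estimate = statistics.fmean(adjusted)            # estimates E[exp(U)] = e - 1
```

Because exp(U) and U are strongly correlated on the unit interval, the adjusted samples have a much smaller variance than the plain ones while targeting the same expectation.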
Learning discrete state abstractions with deep variational inference
Biza, Ondrej, Platt, Robert, van de Meent, Jan-Willem, Wong, Lawson L. S.
Abstraction is crucial for effective sequential decision making in domains with large state spaces. In this work, we propose a variational information bottleneck method for learning approximate bisimulations, a type of state abstraction. We use a deep neural net encoder to map states onto continuous embeddings. The continuous latent space is then compressed into a discrete representation using an action-conditioned hidden Markov model, which is trained end-to-end with the neural network. Our method is suited for environments with high-dimensional states and learns from a stream of experience collected by an agent acting in a Markov decision process. Through a learned discrete abstract model, we can efficiently plan for unseen goals in a multi-goal Reinforcement Learning setting. We test our method in simplified robotic manipulation domains with image states. We also compare it against previous model-based approaches to finding bisimulations in discrete grid-world-like environments.
Graphs, Convolutions, and Neural Networks
Gama, Fernando, Isufi, Elvin, Leus, Geert, Ribeiro, Alejandro
Network data can be conveniently modeled as a graph signal, where data values are assigned to nodes of a graph that describes the underlying network topology. Successful learning from network data is built upon methods that effectively exploit this graph structure. In this work, we overview graph convolutional filters, which are linear, local and distributed operations that adequately leverage the graph structure. We then discuss graph neural networks (GNNs), built upon graph convolutional filters, that have been shown to be powerful nonlinear learning architectures. We show that GNNs are permutation equivariant and stable to changes in the underlying graph topology, allowing them to scale and transfer. We also introduce GNN extensions using edge-varying and autoregressive moving average graph filters, and discuss their properties. Finally, we study the use of GNNs in learning decentralized controllers for robot swarms and in addressing the recommender system problem. I. INTRODUCTION Data generated by networks are increasingly common in power grids, robotics, biological, social and economic networks, and recommender systems among others. The irregular and complex nature of these network data poses unique challenges, making successful learning possible only by incorporating the structure into the inner-working mechanisms of the model [1]. Work in this paper is supported by NSF CCF 1717120, ARO W911NF1710438, ARL DCIST CRA W911NF-17-2-0181, ISTC-WAS and Intel DevCloud. E. Isufi is with the Multimedia Computing Group and G. Leus is with the Circuits and Systems Group, Delft Univ. of Technology, The Netherlands.
Knowledge Graphs
Hogan, Aidan, Blomqvist, Eva, Cochez, Michael, d'Amato, Claudia, de Melo, Gerard, Gutierrez, Claudio, Gayo, José Emilio Labra, Kirrane, Sabrina, Neumaier, Sebastian, Polleres, Axel, Navigli, Roberto, Ngomo, Axel-Cyrille Ngonga, Rashid, Sabbir M., Rula, Anisa, Schmelzeisen, Lukas, Sequeda, Juan, Staab, Steffen, Zimmermann, Antoine
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.
Analyzing Accuracy Loss in Randomized Smoothing Defenses
Gao, Yue, Rosenberg, Harrison, Fawaz, Kassem, Jha, Somesh, Hsu, Justin
Recent advances in machine learning (ML) algorithms, especially deep neural networks (DNNs), have demonstrated remarkable success (sometimes exceeding human-level performance) on several tasks, including face and speech recognition. However, ML algorithms are vulnerable to adversarial attacks, such as test-time, training-time, and backdoor attacks. In test-time attacks, an adversary crafts adversarial examples: perturbations imperceptible to humans which, when added to an input example, force a machine learning model to misclassify that example. Adversarial examples are a concern when deploying ML algorithms in critical contexts, such as information security and autonomous driving. Researchers have responded with a plethora of defenses. One promising defense is randomized smoothing, in which a classifier's prediction is smoothed by adding random noise to the input example we wish to classify. In this paper, we theoretically and empirically explore randomized smoothing. We investigate the effect of randomized smoothing on the feasible hypothesis space, and show that for some noise levels the set of feasible hypotheses shrinks due to smoothing, giving one reason why the natural accuracy drops after smoothing. To perform our analysis, we introduce a model for randomized smoothing which abstracts away specifics, such as the exact distribution of the noise. We complement our theoretical results with extensive experiments.
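The smoothing step described above (classify many noisy copies of the input and return the majority label) can be sketched as below, together with a toy case where a large noise level changes the prediction on a clean input. This is an illustration under stated assumptions, not the abstract model introduced in the paper: the Gaussian noise, the 1-d base classifier `narrow_band`, and all parameter values are hypothetical.

```python
import random
from collections import Counter

def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Majority vote of base_classifier over Gaussian-perturbed copies of x."""
    votes = Counter(
        base_classifier([xi + rng.gauss(0.0, sigma) for xi in x])
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]

def narrow_band(x):
    """Hypothetical base classifier: label 1 only in a thin band around 0."""
    return 1 if abs(x[0]) < 0.1 else 0

rng = random.Random(0)
clean_label = narrow_band([0.0])
# Small noise rarely leaves the band, so the base prediction is preserved.
low_noise_label = smoothed_predict(narrow_band, [0.0], 0.01, 1000, rng)
# Large noise mostly leaves the band, so the smoothed label flips.
high_noise_label = smoothed_predict(narrow_band, [0.0], 1.0, 1000, rng)
```

With the larger noise level, the thin decision region of the base classifier cannot be represented by the smoothed classifier, which gives a concrete picture of how smoothing can cost natural accuracy.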
Certification of Semantic Perturbations via Randomized Smoothing
Fischer, Marc, Baader, Maximilian, Vechev, Martin
Deep neural networks are vulnerable to adversarial examples (Szegedy et al., 2014) - semantics-preserving changes such as $\ell_p$-noise, geometric perturbations (e.g., rotations and translations) (Engstrom et al., 2017), and Wasserstein perturbations (Wong et al., 2019) - which can affect the output of the network in undesirable ways. This is especially problematic when these models are used in safety-critical tasks such as medical diagnosis (Amato et al., 2013) or autonomous driving (Bojarski et al., 2016). As a result, recent work (e.g., Gehr et al. (2018); Weng et al. (2018)) started investigating robustness certification methods which guarantee the absence of adversarial examples. However, even with training methods tailored to produce networks amenable to l -certification (Wong & Kolter, 2018; Mirman et al., 2018), current verification techniques still cannot scale to realistic models and datasets. Recently, a promising approach called randomized smoothing was proposed by Cohen et al. (2019) - it works by constructing a probabilistic classifier with probabilistic certificates and produces state-of-the-art results for $\ell_2$-norm bounded noise on ImageNet.
Knowledge Cores in Large Formal Contexts
Knowledge computation tasks are often infeasible for large data sets. This is particularly true when deriving knowledge bases in formal concept analysis (FCA). Hence, it is essential to come up with techniques to cope with this problem. Many successful methods are based on random processes to reduce the size of the investigated data set. This, however, makes them hardly interpretable with respect to the discovered knowledge. Other approaches restrict themselves to highly supported subsets and omit rare and interesting patterns. An essentially different approach, called $k$-cores, is used in network science. These are able to reflect rare patterns if they are well connected in the data set. In this work, we study $k$-cores in the realm of FCA by exploiting the natural correspondence to bipartite graphs. This structurally motivated approach leads to a comprehensible extraction of knowledge cores from large formal context data sets.
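The correspondence mentioned above can be made concrete: a formal context is a set of (object, attribute) incidences, which is exactly the edge set of a bipartite graph, and a $k$-core is obtained by repeatedly deleting vertices of degree below $k$. The sketch below applies the standard $k$-core peeling procedure to such an edge set. The function name, the toy context, and the use of a single threshold for both objects and attributes are assumptions for illustration and may differ from the construction used in the paper.

```python
def k_core(incidence, k):
    """Return the incidences that survive k-core peeling.

    incidence: iterable of (object, attribute) pairs of a formal context.
    Objects and attributes with fewer than k incidences are removed
    repeatedly until every remaining vertex has degree at least k.
    """
    pairs = set(incidence)
    while True:
        degree = {}
        for obj, attr in pairs:
            # Tag vertices by side so equal names on both sides do not clash.
            degree[("obj", obj)] = degree.get(("obj", obj), 0) + 1
            degree[("attr", attr)] = degree.get(("attr", attr), 0) + 1
        kept = {
            (o, a) for o, a in pairs
            if degree[("obj", o)] >= k and degree[("attr", a)] >= k
        }
        if kept == pairs:
            return pairs
        pairs = kept

# Hypothetical context: g3 is only loosely connected and gets peeled away.
context = {
    ("g1", "a"), ("g1", "b"),
    ("g2", "a"), ("g2", "b"),
    ("g3", "b"), ("g3", "c"),
}
core = k_core(context, 2)
```

In this toy context, attribute c is removed first because it has a single incidence, which in turn drops g3 below the threshold; the remaining four incidences form the 2-core.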
Variational Inference and Bayesian CNNs for Uncertainty Estimation in Multi-Factorial Bone Age Prediction
Eggenreich, Stefan, Payer, Christian, Urschler, Martin, Štern, Darko
In addition to its extensive use in clinical medicine, biological age (BA) is used in legal medicine to assess unknown chronological age (CA) in applications where identification documents are not available. Automatic methods for age estimation proposed in the literature predict point estimates, which can be misleading without quantification of the predictive uncertainty. In our multi-factorial age estimation method from MRI data, we used the Variational Inference approach to estimate the uncertainty of a Bayesian CNN model. Distinguishing model uncertainty from data uncertainty, we interpreted data uncertainty as biological variation, i.e. the range of possible CA of subjects having the same BA.