Banff
Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections
Muzellec, Boris, Cuturi, Marco
Sliced Wasserstein (SW) metrics between probability measures solve the optimal transport (OT) problem on univariate projections, and average such maps across projections. The recent interest in the SW distance shows that much can be gained by looking at optimal maps between measures in smaller subspaces, as opposed to the curse-of-dimensionality price one has to pay in higher dimensions. Any transport estimated in a subspace remains, however, an object that can only be used in that subspace. We propose in this work two methods to extrapolate, from a transport map that is optimal on a subspace, one that is nearly optimal in the entire space. We prove that the best transport plan that takes such "subspace detours" is a generalization of the Knothe-Rosenblatt transport. We show that these plans can be explicitly formulated when comparing Gaussian measures (between which the Wasserstein distance is usually referred to as the Bures or Fréchet distance). Building from there, we provide an algorithm to select optimal subspaces given pairs of Gaussian measures, and study scenarios in which that mediating subspace can be selected using prior information. We consider applications to NLP and evaluation of image quality (FID scores).
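The closed-form Bures (Fréchet) distance between Gaussians can be illustrated with a short sketch. Assuming NumPy and SciPy are available, the hypothetical helpers `bures_wasserstein` and `project_gaussian` below compute the squared 2-Wasserstein distance between two Gaussians, and the same quantity after pushing both Gaussians onto a subspace spanned by the orthonormal columns of `V`. This only illustrates distances measured on a projection; it does not implement the subspace-detour plans themselves.

```python
import numpy as np
from scipy.linalg import sqrtm


def bures_wasserstein(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2):
    ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    root1 = sqrtm(S1)
    cross = sqrtm(root1 @ S2 @ root1)
    # sqrtm can return tiny imaginary parts from round-off; keep the real part.
    bures_sq = np.trace(S1 + S2 - 2.0 * np.real(cross))
    return float(np.sum((m1 - m2) ** 2) + bures_sq)


def project_gaussian(m, S, V):
    """Push N(m, S) through x -> V.T @ x, where V has orthonormal columns."""
    return V.T @ m, V.T @ S @ V


# Usage: compare two 2-D Gaussians in full space and on the first axis only.
m = np.zeros(2)
S1, S2 = np.diag([1.0, 4.0]), np.diag([4.0, 9.0])
V = np.array([[1.0], [0.0]])
full = bures_wasserstein(m, S1, m, S2)
on_axis = bures_wasserstein(*project_gaussian(m, S1, V), *project_gaussian(m, S2, V))
```

For diagonal covariances the trace term reduces to the sum of (sqrt(a_i) - sqrt(b_i))^2, so here `full` should be 2 and `on_axis` should be 1, up to round-off.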
Stability Properties of Graph Neural Networks
Gama, Fernando, Bruna, Joan, Ribeiro, Alejandro
Data stemming from networks exhibit an irregular support, whereby each data element is related by arbitrary pairwise relationships determined by the network. Graph neural networks (GNNs) have emerged as information processing architectures that exploit the particularities of this underlying support. The use of nonlinearities in GNNs, coupled with the fact that filters are learned from data, raises mathematical challenges that have precluded the development of theoretical results that would give insight into the reasons for the remarkable performance of GNNs. In this work, we prove the property of stability, which states that a small change in the support of the data leads to a small (bounded) change in the output of the GNN. More specifically, we prove that the bound on the difference between the outputs of the GNN computed on one graph or another is proportional to the difference between the graphs and the design parameters of the GNN, as long as the trained filters are integral Lipschitz. We exploit this result to provide some insights into the crucial effect that nonlinearities have in obtaining an architecture that is both stable and selective, a feat that is impossible to achieve if using only linear filters.
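To make "a small change in the support leads to a small change in the output" concrete, the sketch below assumes the common polynomial graph filter y = sum_k h_k S^k x, where S is a graph shift operator (here a normalized weighted adjacency matrix) and `h` holds hypothetical filter taps, followed by a pointwise ReLU. It perturbs S slightly and measures the resulting output change; it does not check the integral Lipschitz condition or reproduce the paper's bound. Because ReLU is 1-Lipschitz, `diff_gnn` can never exceed `diff_filter`.

```python
import numpy as np


def graph_filter(S, x, h):
    """Polynomial graph filter y = sum_k h[k] S^k x (S: graph shift operator)."""
    y = np.zeros(x.shape, dtype=float)
    z = x.astype(float)          # z holds S^k x, starting at k = 0
    for hk in h:
        y += hk * z
        z = S @ z
    return y


def gnn_layer(S, x, h):
    """One single-feature GNN layer: graph filter followed by a ReLU."""
    return np.maximum(graph_filter(S, x, h), 0.0)


# Usage: a random weighted graph and a small symmetric perturbation of it.
rng = np.random.default_rng(0)
n = 20
A = rng.random((n, n))
A = (A + A.T) / 2
S = A / np.linalg.norm(A, 2)             # normalize so the spectral norm is 1
E = rng.normal(scale=1e-3, size=(n, n))
E = (E + E.T) / 2                         # keep the perturbed graph symmetric
x = rng.normal(size=n)
h = [0.5, 0.3, 0.2]                       # hypothetical filter taps

diff_filter = np.linalg.norm(graph_filter(S, x, h) - graph_filter(S + E, x, h))
diff_gnn = np.linalg.norm(gnn_layer(S, x, h) - gnn_layer(S + E, x, h))
```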
A Fundamental Performance Limitation for Adversarial Classification
Makdah, Abed AlRahman Al, Katewa, Vaibhav, Pasqualetti, Fabio
Despite the widespread use of machine learning algorithms to solve problems of technological, economic, and social relevance, provable guarantees on the performance of these data-driven algorithms are critically lacking, especially when the data originates from unreliable sources and is transmitted over unprotected and easily accessible channels. In this paper, we take an important step to bridge this gap and formally show that, in a quest to optimize their accuracy, binary classification algorithms, including those based on machine-learning techniques, inevitably become more sensitive to adversarial manipulation of the data. Further, for a given class of algorithms with the same complexity (i.e., number of classification boundaries), the fundamental tradeoff curve between accuracy and sensitivity depends solely on the statistics of the data, and cannot be improved by tuning the algorithm.
An Approach to Characterize Graded Entailment of Arguments through a Label-based Framework
Budán, Maximiliano C. D., Simari, Gerardo I., Viglizzo, Ignacio, Simari, Guillermo R.
Argumentation theory is a powerful paradigm that formalizes a type of commonsense reasoning that aims to simulate the human ability to resolve a specific problem in an intelligent manner. A classical argumentation process takes into account only the properties related to the intrinsic logical soundness of an argument in order to determine its acceptability status. However, these properties are not always the only ones that matter for establishing the argument's acceptability: there exist other qualities, such as strength, weight, social votes, trust degree, relevance level, and certainty degree, among others.
MaCow: Masked Convolutional Generative Flow
Flow-based generative models, conceptually attractive due to the tractability of both exact log-likelihood computation and latent-variable inference, and the efficiency of both training and sampling, have led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. Despite their computational efficiency, the density estimation performance of flow-based generative models falls significantly behind that of state-of-the-art autoregressive models. In this work, we introduce masked convolutional generative flow (MaCow), a simple yet effective architecture of generative flow using masked convolution. By restricting the local connectivity to a small kernel, MaCow enjoys fast and stable training and efficient sampling, while achieving significant improvements over Glow for density estimation on standard image benchmarks, considerably narrowing the gap to autoregressive models.
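The basic ingredient, a convolution whose kernel is masked so that each output depends only on inputs earlier in raster order, can be sketched in NumPy for a single channel. The helpers `causal_mask` and `masked_conv2d` are hypothetical and written for clarity rather than speed; MaCow's actual masking pattern, multi-channel layout, and flow coupling are not reproduced here. With the centre excluded, the input-to-output Jacobian is strictly lower triangular in raster order, the kind of structure autoregressive-style flow layers exploit for cheap log-determinants.

```python
import numpy as np


def causal_mask(k, include_center=False):
    """Raster-order mask for a k x k kernel: 1 above/left of the centre, 0 after."""
    mask = np.zeros((k, k))
    c = k // 2
    mask[:c, :] = 1.0            # every row above the centre row
    mask[c, :c] = 1.0            # positions left of the centre in the centre row
    if include_center:
        mask[c, c] = 1.0
    return mask


def masked_conv2d(x, w, include_center=False):
    """Single-channel 'same' cross-correlation with a masked kernel (zero padding)."""
    k = w.shape[0]
    wm = w * causal_mask(k, include_center)
    p = k // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # The window is centred on x[i, j]; masked weights hide later pixels.
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * wm)
    return out


# Usage: a random 5 x 5 image and a random 3 x 3 kernel.
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 5))
w = rng.normal(size=(3, 3))
y = masked_conv2d(x, w)
```

Changing the last pixel of `x` leaves `y` unchanged, since no output position is allowed to see it.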
Binarized Knowledge Graph Embeddings
Kishimoto, Koki, Hayashi, Katsuhiko, Akai, Genki, Shimbo, Masashi, Komatani, Kazunori
Tensor factorization has become an increasingly popular approach to knowledge graph completion (KGC), the task of automatically predicting missing facts in a knowledge graph. However, even with a simple model like CANDECOMP/PARAFAC (CP) tensor decomposition, KGC on existing knowledge graphs is impractical in resource-limited environments, as a large amount of memory is required to store parameters represented as 32-bit or 64-bit floating-point numbers. This limitation is expected to become more stringent as existing knowledge graphs, which are already huge, continue to grow steadily in scale. To reduce the memory requirement, we present a method for binarizing the parameters of the CP tensor decomposition by introducing a quantization function into the optimization problem. This method replaces floating-point-valued parameters with binary ones after training, which drastically reduces the model size at run time. We investigate the trade-off between the quality and size of tensor factorization models for several KGC benchmark datasets. In our experiments, the proposed method reduced the model size by more than an order of magnitude while maintaining task performance. Moreover, a fast score-computation technique can be built on bitwise operations.
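The bitwise scoring idea can be sketched as follows, assuming (hypothetically) that the quantization function is a plain sign function and that a triple is scored with the usual trilinear CP form, the sum over i of e_i * r_i * o_i. For +1/-1 vectors, a term is -1 exactly when an odd number of its three factors is -1, so the score equals the dimension minus twice the population count of the XOR of the packed sign bits. All function names are illustrative, not the paper's code.

```python
import numpy as np


def binarize(v):
    """Sign quantization (assumed): map each entry to +1 or -1, with 0 -> +1."""
    return np.where(v >= 0, 1, -1)


def pack_negatives(b):
    """Pack a +1/-1 vector into an int whose bit i is set when b[i] == -1."""
    bits = 0
    for i, val in enumerate(b):
        if val == -1:
            bits |= 1 << i
    return bits


def cp_score_float(e, r, o):
    """Trilinear CP score of a triple: sum_i e_i * r_i * o_i."""
    return float(np.sum(e * r * o))


def cp_score_bitwise(pe, pr, po, dim):
    """Same score from packed sign bits: a term is -1 exactly when an odd
    number of its three factors is -1, i.e. when the XOR of the bits is 1."""
    negative_terms = bin(pe ^ pr ^ po).count("1")
    return dim - 2 * negative_terms


# Usage: binarize three random embeddings and score them both ways.
rng = np.random.default_rng(0)
dim = 64
e, r, o = (binarize(rng.normal(size=dim)) for _ in range(3))
score_float = cp_score_float(e, r, o)
score_bits = cp_score_bitwise(pack_negatives(e), pack_negatives(r), pack_negatives(o), dim)
```

The two scores agree exactly; the bitwise version needs only an XOR and a population count per machine word once the bits are packed.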
MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders
Ma, Xuezhe, Zhou, Chunting, Hovy, Eduard
Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations. However, recent studies demonstrate that, when equipped with expressive generative distributions (a.k.a. decoders), a VAE suffers from learning uninformative latent representations, a phenomenon known as KL vanishing, in which case the VAE collapses into an unconditional generative model. In this work, we introduce mutual posterior-divergence regularization, a novel regularization that is able to control the geometry of the latent space to accomplish meaningful representation learning, while achieving comparable or superior density estimation. Experiments on three image benchmark datasets demonstrate that, when equipped with powerful decoders, our model performs well on both density estimation and representation learning.
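One way to read "mutual posterior divergence" is as the average KL divergence between the approximate posteriors of different inputs. Under that assumption, and assuming diagonal Gaussian posteriors given by means and log-variances, the hypothetical helper below computes this quantity over a batch. If the posterior collapses to the same distribution for every input, as under KL vanishing, the quantity is zero, which is why a divergence between posteriors can act as a signal against collapse. The regularizer actually used in MAE may be defined and weighted differently.

```python
import numpy as np


def kl_diag_gauss(mu1, logvar1, mu2, logvar2):
    """KL( N(mu1, diag(exp(logvar1))) || N(mu2, diag(exp(logvar2))) )."""
    var1, var2 = np.exp(logvar1), np.exp(logvar2)
    return 0.5 * float(np.sum(logvar2 - logvar1 + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0))


def mean_pairwise_posterior_kl(mu, logvar):
    """Average KL between q(z|x_i) and q(z|x_j) over ordered pairs i != j.
    mu and logvar have shape (batch, latent_dim)."""
    n = mu.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += kl_diag_gauss(mu[i], logvar[i], mu[j], logvar[j])
    return total / (n * (n - 1))


# Usage: a batch of 8 hypothetical encoder outputs in a 16-D latent space.
rng = np.random.default_rng(0)
mu = rng.normal(size=(8, 16))
logvar = rng.normal(scale=0.1, size=(8, 16))
mpd = mean_pairwise_posterior_kl(mu, logvar)
```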
Unary and Binary Classification Approaches and their Implications for Authorship Verification
Halvani, Oren, Winter, Christian, Graner, Lukas
Retrieving indexed documents not by their topical content but by their writing style opens the door to a number of applications in information retrieval (IR). One application is to retrieve textual content of a certain author X, where the queried IR system is provided beforehand with a set of reference texts of X. Authorship verification (AV), a research subject in the field of digital text forensics, is suitable for this purpose. The task of AV is to determine whether two documents (i.e., an indexed and a reference document) have been written by the same author X. Even though AV represents a unary classification problem, a number of existing approaches treat it as a binary classification task. However, the underlying classification model of an AV method has a number of serious implications regarding its prerequisites, evaluability, and applicability. In our comprehensive literature review, we observed several misunderstandings regarding the differentiation of unary and binary AV approaches that require consideration. The objective of this paper is, therefore, to clarify these by proposing clear criteria and new properties that aim to improve the characterization of existing and future AV approaches. Building on these criteria and properties, we investigate the applicability of eleven existing unary and binary AV methods as well as four generic unary classification algorithms on two self-compiled corpora. Furthermore, we highlight an important issue concerning the evaluation of AV methods based on fixed decision criteria, which has not received attention in previous AV studies.
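To make "fixed decision criterion" concrete, here is a deliberately simple, hypothetical verifier that uses only texts of the claimed author: it compares character trigram profiles of the reference and questioned texts by cosine similarity and answers "same author" when the score reaches a threshold fixed in advance. It is not one of the eleven methods studied; the trigram features and the 0.5 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Bag of lower-cased character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * q[g] for g, count in p.items() if g in q)
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def verify(reference, questioned, threshold=0.5):
    """Answer 'same author' iff the similarity reaches a threshold fixed in advance."""
    score = cosine(char_ngrams(reference), char_ngrams(questioned))
    return score >= threshold
```

Because `threshold` is chosen once, the same cut-off is applied to every corpus and every pair, which is the kind of fixed decision criterion whose evaluation the abstract flags as an issue.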
Robustness of Adaptive Quantum-Enhanced Phase Estimation
Palittapongarnpim, Pantita, Sanders, Barry C.
Because all physical adaptive quantum-enhanced metrology (AQEM) schemes operate under noisy conditions with only partially understood noise characteristics, a practical control policy must be robust even to unknown noise. We aim to devise a test to evaluate the robustness of AQEM policies and to assess the resources used by the policies. The robustness test is performed on adaptive phase estimation by simulating the scheme under four phase-noise models: normal-distribution noise, random telegraph noise, skew-normal-distribution noise, and log-normal-distribution noise. The control policies are devised either by a reinforcement-learning algorithm in the same noise condition, albeit ignorant of its properties, or by a Bayesian feedback method that assumes no noise. Our robustness test and resource comparison can be used to determine the efficacy of a policy and to select a suitable one.
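The four phase-noise models can be sampled as in the following sketch. The function name `phase_noise` is introduced here for illustration, and all parameter values (the spread `scale`, the telegraph switching probability, the skewness parameter, and the log-normal `sigma`) are hypothetical choices, not values from the study. How the noisy phase then enters the measurement and feedback loop is not modeled.

```python
import numpy as np
from scipy.stats import skewnorm


def phase_noise(model, size, scale=0.1, rng=None):
    """Draw `size` phase-noise samples (radians) from one of four models.
    All shape parameters below are hypothetical choices for illustration."""
    rng = np.random.default_rng() if rng is None else rng
    if model == "normal":
        return rng.normal(0.0, scale, size)
    if model == "telegraph":
        # Two-level noise that flips sign with a fixed probability per shot.
        flip_prob = 0.1
        out = np.empty(size)
        level = scale if rng.random() < 0.5 else -scale
        for t in range(size):
            if rng.random() < flip_prob:
                level = -level
            out[t] = level
        return out
    if model == "skew_normal":
        return skewnorm.rvs(4.0, scale=scale, size=size, random_state=rng)
    if model == "log_normal":
        # Median equal to `scale`; always positive, hence a biased noise.
        return rng.lognormal(mean=np.log(scale), sigma=0.5, size=size)
    raise ValueError(f"unknown noise model: {model}")


# Usage: corrupt a true phase with each model over 100 simulated shots.
rng = np.random.default_rng(0)
true_phase = 0.7
noisy = {m: true_phase + phase_noise(m, 100, rng=rng)
         for m in ("normal", "telegraph", "skew_normal", "log_normal")}
```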
Geodesic Clustering in Deep Generative Models
Yang, Tao, Arvanitidis, Georgios, Fu, Dongmei, Li, Xiaogang, Hauberg, Søren
Deep generative models are tremendously successful in learning low-dimensional latent representations that describe the data well. These representations, however, tend to distort relationships between points considerably, i.e., pairwise distances tend not to reflect semantic similarities well. This renders unsupervised tasks, such as clustering, difficult when working with the latent representations. We demonstrate that taking the geometry of the generative model into account is sufficient to make simple clustering algorithms work well over latent representations. Leaning on the recent finding that deep generative models constitute stochastically immersed Riemannian manifolds, we propose an efficient algorithm for computing geodesics (shortest paths) and distances in the latent space, while taking its distortion into account. We further propose a new architecture for modeling uncertainty in variational autoencoders, which is essential for understanding the geometry of deep generative models. Experiments show that the geodesic distance is very likely to reflect the internal structure of the data.
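A common generic way to compute latent geodesics is to discretize a curve between two latent points and minimize its energy measured in data space through the decoder; minimizers of this energy are constant-speed shortest paths. The sketch below does this with SciPy's L-BFGS-B and a hypothetical deterministic toy decoder (a Gaussian bump over the plane). It ignores decoder uncertainty, which the abstract identifies as essential, and is not the authors' efficient algorithm.

```python
import numpy as np
from scipy.optimize import minimize


def decoder(z):
    """Hypothetical toy decoder R^2 -> R^3: the plane lifted by a Gaussian bump."""
    bump = 2.0 * np.exp(-np.sum(z ** 2, axis=-1))
    return np.stack([z[..., 0], z[..., 1], bump], axis=-1)


def straight_line(z0, z1, n):
    """n interior points plus both endpoints, evenly spaced in latent space."""
    t = np.linspace(0.0, 1.0, n + 2)[:, None]
    return (1.0 - t) * z0 + t * z1


def path_energy(pts):
    """Discrete curve energy in data space: (#segments) * sum of squared steps."""
    steps = np.diff(decoder(pts), axis=0)
    return len(steps) * float(np.sum(steps ** 2))


def path_length(pts):
    """Length of the decoded polyline in data space."""
    return float(np.sum(np.linalg.norm(np.diff(decoder(pts), axis=0), axis=1)))


def geodesic(z0, z1, n=16):
    """Minimize the energy over the interior points, keeping the endpoints fixed."""
    init = straight_line(z0, z1, n)[1:-1]

    def objective(flat):
        return path_energy(np.vstack([z0, flat.reshape(n, 2), z1]))

    res = minimize(objective, init.ravel(), method="L-BFGS-B")
    return np.vstack([z0, res.x.reshape(n, 2), z1])


# Usage: two latent points whose straight connection passes over the bump.
z0, z1 = np.array([-2.0, 0.3]), np.array([2.0, 0.5])
straight = straight_line(z0, z1, 16)
geo = geodesic(z0, z1, n=16)
```

Comparing `path_length(straight)` with `path_length(geo)` shows how much the latent straight line is distorted by the decoder; a geodesic distance between latent codes would replace the Euclidean distance inside a simple clustering algorithm.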