Goto

Collaborating Authors

 Banff


Diverse Exploration via InfoMax Options

arXiv.org Artificial Intelligence

In this paper, we study the problem of autonomously discovering temporally abstracted actions, or options, for exploration in reinforcement learning. For learning diverse options suitable for exploration, we introduce the infomax termination objective defined as the mutual information between options and their corresponding state transitions. We derive a scalable optimization scheme for maximizing this objective via the termination condition of options, yielding the InfoMax Option Critic (IMOC) algorithm. Through illustrative experiments, we empirically show that IMOC learns diverse options and utilizes them for exploration. Moreover, we show that IMOC scales well to continuous control tasks.


Probabilistic 3D surface reconstruction from sparse MRI information

arXiv.org Artificial Intelligence

Surface reconstruction from magnetic resonance (MR) imaging data is indispensable in medical image analysis and clinical research. A reliable and effective reconstruction tool should: be fast in prediction of accurate well localised and high resolution models, evaluate prediction uncertainty, work with as little input data as possible. Current deep learning state of the art (SOTA) 3D reconstruction methods, however, often only produce shapes of limited variability positioned in a canonical position or lack uncertainty evaluation. In this paper, we present a novel probabilistic deep learning approach for concurrent 3D surface reconstruction from sparse 2D MR image data and aleatoric uncertainty prediction. Our method is capable of reconstructing large surface meshes from three quasi-orthogonal MR imaging slices from limited training sets whilst modelling the location of each mesh vertex through a Gaussian distribution. Prior shape information is encoded using a built-in linear principal component analysis (PCA) model. Extensive experiments on cardiac MR data show that our probabilistic approach successfully assesses prediction uncertainty while at the same time qualitatively and quantitatively outperforms SOTA methods in shape prediction. Compared to SOTA, we are capable of properly localising and orientating the prediction via the use of a spatially aware neural network.


Adversarial Boot Camp: label free certified robustness in one epoch

arXiv.org Machine Learning

Machine learning models are vulnerable to adversarial attacks. One approach to addressing this vulnerability is certification, which focuses on models that are guaranteed to be robust for a given perturbation size. A drawback of recent certified models is that they are stochastic: they require multiple computationally expensive model evaluations with random noise added to a given input. In our work, we present a deterministic certification approach which results in a certifiably robust model. This approach is based on an equivalence between training with a particular regularized loss, and the expected values of Gaussian averages. We achieve certified models on ImageNet-1k by retraining a model with this loss for one epoch without the use of label information.


Learning Structured Latent Factors from Dependent Data:A Generative Model Framework from Information-Theoretic Perspective

arXiv.org Machine Learning

Learning controllable and generalizable representation of multivariate data with desired structural properties remains a fundamental problem in machine learning. In this paper, we present a novel framework for learning generative models with various underlying structures in the latent space. We represent the inductive bias in the form of mask variables to model the dependency structure in the graphical model and extend the theory of multivariate information bottleneck to enforce it. Our model provides a principled approach to learn a set of semantically meaningful latent factors that reflect various types of desired structures like capturing correlation or encoding invariance, while also offering the flexibility to automatically estimate the dependency structure from data. We show that our framework unifies many existing generative models and can be applied to a variety of tasks including multi-modal data modeling, algorithmic fairness, and invariant risk minimization.


Improved High Dimensional Discrete Bayesian Network Inference using Triplet Region Construction

Journal of Artificial Intelligence Research

Performing efficient inference on high dimensional discrete Bayesian Networks (BNs) is challenging. When using exact inference methods the space complexity can grow exponentially with the tree-width, thus making computation intractable. This paper presents a general purpose approximate inference algorithm, based on a new region belief approximation method, called Triplet Region Construction (TRC). TRC reduces the cluster space complexity for factorized models from worst-case exponential to polynomial by performing graph factorization and producing clusters of limited size. Unlike previous generations of region-based algorithms, TRC is guaranteed to converge and effectively addresses the region choice problem that bedevils other region-based algorithms used for BN inference. Our experiments demonstrate that it also achieves significantly more accurate results than competing algorithms.


Learning Optimal Representations with the Decodable Information Bottleneck

arXiv.org Machine Learning

We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest (e.g. linear classifier). We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family. As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees. Empirically, we show that the framework can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization ability of neural networks.


Softmax Tempering for Training Neural Machine Translation Models

arXiv.org Artificial Intelligence

Neural machine translation (NMT) models are typically trained using a softmax cross-entropy loss where the softmax distribution is compared against smoothed gold labels. In low-resource scenarios, NMT models tend to over-fit because the softmax distribution quickly approaches the gold label distribution. To address this issue, we propose to divide the logits by a temperature coefficient, prior to applying softmax, during training. In our experiments on 11 language pairs in the Asian Language Treebank dataset and the WMT 2019 English-to-German translation task, we observed significant improvements in translation quality by up to 3.9 BLEU points. Furthermore, softmax tempering makes the greedy search to be as good as beam search decoding in terms of translation quality, enabling 1.5 to 3.5 times speed-up. We also study the impact of softmax tempering on multilingual NMT and recurrently stacked NMT, both of which aim to reduce the NMT model size by parameter sharing thereby verifying the utility of temperature in developing compact NMT models. Finally, an analysis of softmax entropies and gradients reveal the impact of our method on the internal behavior of NMT models.


A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning

arXiv.org Artificial Intelligence

Structured belief states are crucial for user goal tracking and database query in task-oriented dialog systems. However, training belief trackers often requires expensive turn-level annotations of every user utterance. In this paper we aim at alleviating the reliance on belief state labels in building end-to-end dialog systems, by leveraging unlabeled dialog data towards semi-supervised learning. We propose a probabilistic dialog model, called the LAtent BElief State (LABES) model, where belief states are represented as discrete latent variables and jointly modeled with system responses given user inputs. Such latent variable modeling enables us to develop semi-supervised learning under the principled variational learning framework. Furthermore, we introduce LABES-S2S, which is a copy-augmented Seq2Seq model instantiation of LABES. In supervised experiments, LABES-S2S obtains strong results on three benchmark datasets of different scales. In utilizing unlabeled dialog data, semi-supervised LABES-S2S significantly outperforms both supervised-only and semi-supervised baselines. Remarkably, we can reduce the annotation demands to 50% without performance loss on MultiWOZ.


Defending SVMs against Poisoning Attacks: the Hardness and DBSCAN Approach

arXiv.org Machine Learning

Adversarial machine learning has attracted a great amount of attention in recent years. In a poisoning attack, the adversary can inject a small number of specially crafted samples into the training data which make the decision boundary severely deviate and cause unexpected misclassification. Due to the great importance and popular use of support vector machines (SVM), we consider defending SVM against poisoning attacks in this paper. We study two commonly used strategies for defending: designing robust SVM algorithms and data sanitization. Though several robust SVM algorithms have been proposed before, most of them either are in lack of adversarial-resilience, or rely on strong assumptions about the data distribution or the attacker's behavior. Moreover, the research on their complexities is still quite limited. We are the first, to the best of our knowledge, to prove that even the simplest hard-margin one-class SVM with outliers problem is NP-complete, and has no fully PTAS unless P$=$NP (that means it is hard to achieve an even approximate algorithm). For the data sanitization defense, we link it to the intrinsic dimensionality of data; in particular, we provide a sampling theorem in doubling metrics for explaining the effectiveness of DBSCAN (as a density-based outlier removal method) for defending against poisoning attacks. In our empirical experiments, we compare several defenses including the DBSCAN and robust SVM methods, and investigate the influences from the intrinsic dimensionality and data density to their performances.


Question Directed Graph Attention Network for Numerical Reasoning over Text

arXiv.org Artificial Intelligence

Although NumNet achieves superior performance than Numerical reasoning over texts, such as addition, other numerically-aware models (Hu et al., 2019a; Andor subtraction, sorting and counting, is a et al., 2019; Geva et al., 2020; Chen et al., 2020), we challenging machine reading comprehension argue that NumNet is insufficient for sophisticated numerical task, since it requires both natural language understanding reasoning, since it lacks two critical ingredients and arithmetic computation. To for numerical reasoning: address this challenge, we propose a heterogeneous 1. Number Type and Entity Mention. The number graph representation for the context of comparison graph in NumNet is not able to identify the passage and question needed for such reasoning, different number types, and lacks the information of and design a question directed graph entities mentioned in the document that connect the attention network to drive multi-step numerical number nodes.