Oceania
Fast and Provable ADMM for Learning with Generative Priors
Gómez, Fabian Latorre, Eftekhari, Armin, Cevher, Volkan
In this work, we propose a (linearized) Alternating Direction Method of Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint. We focus on the special case where this constraint arises from the requirement that a variable lie in the range of a neural network. This is motivated by recent successful applications of Generative Adversarial Networks (GANs) in tasks like compressive sensing, denoising and robustness against adversarial examples. The derived rates for our algorithm are characterized in terms of certain geometric properties of the generator network, which we show hold for feedforward architectures under mild assumptions. Unlike gradient descent (GD), our algorithm can efficiently handle non-smooth objectives and exploit efficient partial minimization procedures, making it faster in many practical scenarios.
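As a rough illustration of the problem shape described in this abstract, the sketch below runs a linearized ADMM loop on a toy instance of min 0.5*||x - y||^2 subject to x = G(z). It is not the authors' algorithm or their step-size rule: the function name `linearized_admm`, the parameters `rho` and `step`, and the linear stand-in generator are all assumptions made for the example, whereas the paper targets neural-network generators.

```python
def linearized_admm(y, G, JT, z0, rho=1.0, step=0.1, iters=200):
    """Toy linearized ADMM for: min_x 0.5*||x - y||^2  s.t.  x = G(z).

    G(z) maps a latent list to an output list; JT(z, w) returns J(z)^T w,
    the transposed Jacobian of G applied to w. Both are supplied by the caller.
    """
    z = list(z0)
    x = G(z)
    lam = [0.0] * len(y)  # dual variable for the constraint x - G(z) = 0
    for _ in range(iters):
        g = G(z)
        # z-update: a single gradient step on the augmented Lagrangian in z
        # (this is the "linearized" part; no inner minimization over z).
        w = [l + rho * (xi - gi) for l, xi, gi in zip(lam, x, g)]
        z = [zi + step * di for zi, di in zip(z, JT(z, w))]
        g = G(z)
        # x-update: exact minimization, closed form for this quadratic f.
        x = [(yi - l + rho * gi) / (1.0 + rho) for yi, l, gi in zip(y, lam, g)]
        # dual update on the constraint residual x - G(z).
        lam = [l + rho * (xi - gi) for l, xi, gi in zip(lam, x, g)]
    return x, z, lam


# Hypothetical stand-in generator: G(z) = a * z with a = (1, 2).
A = [1.0, 2.0]


def toy_G(z):
    return [A[0] * z[0], A[1] * z[0]]


def toy_JT(z, w):
    return [A[0] * w[0] + A[1] * w[1]]


x_hat, z_hat, _ = linearized_admm([3.0, 1.0], toy_G, toy_JT, [0.0])
```

For this linear toy case the iterates approach the projection of y onto the range of G, which is z = 1 and x = (1, 2). A nonconvex generator would need step sizes chosen with more care than this example takes.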
On Conforming and Conflicting Values
Chhogyal, Kinzang, Nayak, Abhaya, Ghose, Aditya, Orgun, Mehmet, Dam, Hoa
Values are things that are important to us. Actions activate values - they either go against our values or they promote our values. Values themselves can either be conforming or conflicting depending on the action that is taken. In this short paper, we argue that values may be classified as one of two types - conflicting and inherently conflicting values. They are distinguished by the fact that the latter in some sense can be thought of as being independent of actions. This allows us to do two things: i) check whether a set of values is consistent and ii) check whether it is in conflict with other sets of values.
Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss
Jehl, Laura, Lawrence, Carolin, Riezler, Stefan
In many machine learning scenarios, supervision by gold labels is not available and consequently neural models cannot be trained directly by maximum likelihood estimation (MLE). In a weak supervision scenario, metric-augmented objectives can be employed to assign feedback to model outputs, which can be used to extract a supervision signal for training. We present several objectives for two separate weakly supervised tasks, machine translation and semantic parsing. We show that objectives should actively discourage negative outputs in addition to promoting a surrogate gold structure. This notion of bipolarity is naturally present in ramp loss objectives, which we adapt to neural models. We show that bipolar ramp loss objectives outperform other non-bipolar ramp loss objectives and minimum risk training (MRT) on both weakly supervised tasks, as well as on a supervised machine translation task. Additionally, we introduce a novel token-level ramp loss objective, which is able to outperform even the best sequence-level ramp loss on both weakly supervised tasks.
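The idea of bipolarity can be illustrated with a classical structured ramp loss, which promotes a "hope" output and discourages a "fear" output at the same time. The sketch below is a generic formulation under stated assumptions, not one of the specific objectives from the paper: `ramp_loss`, the candidate tuples, and the way hope and fear are selected are all chosen for illustration.

```python
def ramp_loss(candidates):
    """Generic ramp loss over a candidate list.

    candidates: list of (output, model_score, metric) tuples, where
    model_score is e.g. a log-probability and metric is external feedback
    in [0, 1] (higher is better). Returns (loss, hope_output, fear_output).
    """
    # Hope: scores well under the model AND receives good feedback.
    hope = max(candidates, key=lambda c: c[1] + c[2])
    # Fear: scores well under the model BUT receives poor feedback.
    fear = max(candidates, key=lambda c: c[1] - c[2])
    # Minimizing this raises the hope score and lowers the fear score.
    return fear[1] - hope[1], hope[0], fear[0]


# Hypothetical candidates: (output, log-probability, feedback metric).
example = [("a", -1.0, 0.9), ("b", -0.5, 0.1), ("c", -2.0, 0.5)]
loss, hope_out, fear_out = ramp_loss(example)
```

In this example the model currently prefers "b", which gets poor feedback, so "b" becomes the fear output and "a" the hope output. The loss is positive until the model scores "a" above "b".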
Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction
Senoussaoui, Mohammed, Cardinal, Patrick, Koerich, Alessandro Lameiras
In this paper we present a novel approach for extracting a Bag-of-Words (BoW) representation based on a Neural Network codebook. The conventional BoW model is based on a dictionary (codebook) built from elementary representations which are selected randomly or by using a clustering algorithm on a training dataset. A metric is then used to assign unseen elementary representations to the closest dictionary entries in order to produce a histogram. In the proposed approach, an autoencoder (AE) takes on both roles: dictionary creation and the assignment metric. The dimension of the encoded layer of the AE corresponds to the size of the dictionary and the output of its neurons represents the assignment metric. Experimental results for the continuous emotion prediction task on the AVEC 2017 audio dataset show an improvement of the Concordance Correlation Coefficient (CCC) from 0.225 to 0.322 for the arousal dimension and from 0.244 to 0.368 for the valence dimension, relative to the conventional BoW version implemented in a baseline system.
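The conventional BoW pipeline and the CCC metric mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance as the assignment metric and a tiny hand-written codebook; the function names `bow_histogram` and `ccc` are hypothetical, and the autoencoder variant proposed in the paper is not shown.

```python
def bow_histogram(frames, codebook):
    """Assign each frame to its nearest codeword and return a normalized histogram."""
    counts = [0] * len(codebook)
    for frame in frames:
        # Squared Euclidean distance to every codeword (assumed metric).
        dists = [sum((a - b) ** 2 for a, b in zip(frame, word)) for word in codebook]
        counts[dists.index(min(dists))] += 1
    total = len(frames)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def ccc(x, y):
    """Concordance Correlation Coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalizes both low correlation and a shift in mean or scale.
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical 2-D features and a two-entry codebook.
hist = bow_histogram([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]])
```

Here two of the three frames fall nearest the first codeword, so the histogram is roughly (2/3, 1/3). The CCC equals 1 only when predictions match the targets exactly, which is why it is stricter than plain correlation.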
Generalization Bounds for Unsupervised Cross-Domain Mapping with WGANs
Galanti, Tomer, Benaim, Sagie, Wolf, Lior
The recent empirical success of unsupervised cross-domain mapping algorithms, between two domains that share common characteristics, is not well-supported by theoretical justifications. This lacuna is especially troubling, given the clear ambiguity in such mappings. We work with the adversarial training method called the Wasserstein GAN and derive a novel generalization bound, which limits the risk between the learned mapping $h$ and the target mapping $y$, by a sum of two terms: (i) the risk between $h$ and the most distant alternative mapping that was learned by the same cross-domain mapping algorithm, and (ii) the minimal Wasserstein GAN divergence between the target domain and the domain obtained by applying a hypothesis $h^*$ on the samples of the source domain, where $h^*$ is a hypothesis selected by the same algorithm. The bound is directly related to Occam's razor and encourages the selection of the minimal architecture that supports a small Wasserstein GAN divergence. The bound leads to multiple algorithmic consequences, including a method for hyperparameter selection and for early stopping in cross-domain mapping GANs. We also demonstrate a novel capability for unsupervised learning: estimating confidence in the mapping of every specific sample. Lastly, we show how non-minimal architectures can be effectively trained by an inverted knowledge distillation, in which a minimal architecture is used to train a larger one, leading to higher quality outputs.
Evaluating the distribution learning capabilities of GANs
Rege, Amit, Monteleoni, Claire
We find that by and large GANs fail to faithfully recreate point datasets which contain discontinuous support or sharp bends with noise. Additionally, on image datasets, we find that GANs do not seem to learn to count the number of objects of the same kind in an image. We also highlight [...] datasets. To our knowledge, the only instance of synthetic image datasets used for GAN evaluation has been to learn manifolds of convex polygons (specifically triangles) (Lucic et al., 2018). Although we also use polygons as a testbed for our experiments, we focus on learning a manifold with multiple polygons where their number is fixed.
Probabilistic CCA with Implicit Distributions
Shi, Yaxin, Pan, Yuangang, Xu, Donna, Tsang, Ivor
Canonical Correlation Analysis (CCA) is a classic technique for multi-view data analysis. To overcome the deficiency of linear correlation in practical multi-view learning tasks, various CCA variants were proposed to capture nonlinear dependency. However, it is non-trivial to have an in-principle understanding of these variants due to their inherently restrictive assumptions on the data and latent code distributions. Although some works have studied probabilistic interpretations of CCA, these models still require the explicit form of the distributions to achieve a tractable solution for the inference. In this work, we study a probabilistic interpretation of CCA based on implicit distributions. We present Conditional Mutual Information (CMI) as a new criterion for CCA to consider both linear and nonlinear dependency for arbitrarily distributed data. To avoid direct estimation of CMI, which still requires an explicit form of the distributions, we derive an objective that provides an estimate of CMI with efficient inference methods. To facilitate Bayesian inference of multi-view analysis, we propose Adversarial CCA (ACCA), which achieves consistent encoding for multi-view data with a consistency constraint imposed on the marginalization of the implicit posteriors. Such a model would achieve superiority in the alignment of the multi-view data with implicit distributions. It is interesting to note that most of the existing CCA variants can be connected with our proposed CCA model by assigning specific forms for the posterior and likelihood distributions. Extensive experiments on nonlinear correlation analysis and cross-view generation on benchmark and real-world datasets demonstrate the superiority of our model.
Gathering Cyber Threat Intelligence from Twitter Using Novelty Classification
Le, Ba Dung, Wang, Guanhua, Nasim, Mehwish, Babar, Ali
Protecting organizations from Cyber exploits requires timely intelligence about Cyber vulnerabilities and attacks, referred to as threats. Cyber threat intelligence can be extracted from various sources including social media platforms where users publish threat information in real time. Gathering Cyber threat intelligence from social media sites is a time-consuming task for security analysts, which can delay the response to emerging Cyber threats. We propose a framework for automatically gathering Cyber threat intelligence from Twitter by using a novelty detection model. Our model learns the features of Cyber threat intelligence from the threat descriptions published in public repositories such as Common Vulnerabilities and Exposures (CVE) and classifies a new unseen tweet as either normal or anomalous to Cyber threat intelligence. We evaluate our framework using a purpose-built data set of tweets from 50 influential Cyber security related accounts over twelve months (in 2018). Our classifier achieves an F1-score of 0.643 for classifying Cyber threat tweets and outperforms several baselines including binary classification models. Our analysis of the classification results suggests that Cyber threat relevant tweets on Twitter often do not include the CVE identifier of the related threats. Hence, it would be valuable to collect these tweets and associate them with the related CVE identifier for Cyber security applications.
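The novelty-classification setup can be pictured as a one-class model: fit a profile on reference threat descriptions only, then flag a tweet as normal if it resembles that profile and anomalous otherwise. The sketch below uses a plain term-count centroid with cosine similarity as a stand-in. The abstract does not specify the actual features or detector, so the functions, the threshold of 0.2, and the example texts are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_profile(descriptions):
    """Sum term counts over all reference threat descriptions."""
    profile = Counter()
    for text in descriptions:
        profile.update(tokenize(text))
    return profile


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def is_threat_like(tweet, profile, threshold=0.2):
    """True if the tweet is 'normal' with respect to the threat profile."""
    return cosine(Counter(tokenize(tweet)), profile) >= threshold


# Hypothetical reference descriptions, written in the style of CVE entries.
profile = build_profile([
    "buffer overflow allows remote code execution",
    "sql injection allows remote attackers to execute code",
])
```

With this profile, `is_threat_like("new remote code execution bug found", profile)` returns `True`, while an unrelated tweet that shares no vocabulary with the descriptions returns `False`. Only positive examples are needed for training, which is the main appeal of a one-class formulation over a binary classifier.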
Circuit-Based Intrinsic Methods to Detect Overfitting
Chatterjee, Sat, Mishchenko, Alan
The focus of this paper is on intrinsic methods to detect overfitting. These rely only on the model and the training data, as opposed to traditional extrinsic methods that rely on performance on a test set or on bounds from model complexity. We propose a family of intrinsic methods called Counterfactual Simulation (CFS) which analyze the flow of training examples through the model by identifying and perturbing rare patterns. By applying CFS to logic circuits we get a method that has no hyper-parameters and works uniformly across different types of models such as neural networks, random forests and lookup tables. Experimentally, CFS can separate models with different levels of overfit using only their logic circuit representations without any access to the high level structure. By comparing lookup tables, neural networks, and random forests using CFS, we get insight into why neural networks generalize. In particular, we find that stochastic gradient descent in neural nets does not lead to "brute force" memorization, but finds common patterns (whether we train with actual or randomized labels), and neural networks are not unlike forests in this regard. Finally, we identify a limitation with our proposal that makes it unsuitable in an adversarial setting, but points the way to future work on robust intrinsic methods.
Gaussian Mixture Marginal Distributions for Modelling Remaining Pipe Wall Thickness of Critical Water Mains in Non-Destructive Evaluation
Nguyen, Linh, Miro, Jaime Valls, Shi, Lei, Vidal-Calleja, Teresa
Rapidly estimating the remaining wall thickness (RWT) is paramount for the non-destructive condition assessment evaluation of large critical metallic pipelines. A robotic vehicle with embedded magnetism-based sensors has been developed to traverse the inside of a pipeline and conduct inspections at the location of a break. However, its sensing speed is constrained by the magnetic principle of operation, thus slowing down the overall operation in seeking dense RWT mapping. To ameliorate this drawback, this work proposes partially scanning the pipe and then employing Gaussian Processes (GPs) to infer RWT at the unseen pipe sections. Since GP prediction assumes normally distributed input data - which does correspond with real RWT measurements - Gaussian mixture (GM) models are shown in this work to be fitting marginal distributions that effectively capture the probability of any RWT value in the inspected data. The effectiveness of the proposed approach is extensively validated on real-world data collected in collaboration with a water utility from a cast iron water main pipeline in Sydney, Australia.
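To make the notion of a Gaussian mixture marginal concrete, the sketch below evaluates the density and cumulative probability of a value under a mixture with given weights, means and standard deviations. It only evaluates a mixture whose parameters are already known. Fitting those parameters to RWT data and coupling the marginal to a GP are not shown, and the function names and example parameters are hypothetical.

```python
import math


def gm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def gm_cdf(x, weights, means, stds):
    """Probability that a draw from the mixture is at most x."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, stds)
    )


# Hypothetical two-component mixture (weights must sum to 1).
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
density_at_zero = gm_pdf(0.0, weights, means, stds)
prob_below_zero = gm_cdf(0.0, weights, means, stds)
```

For this symmetric example the cumulative probability at zero is one half. A mixture with more components or unequal weights can represent skewed or multimodal thickness distributions that a single Gaussian cannot.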