Directed Networks
Partitioned integrators for thermodynamic parameterization of neural networks
Leimkuhler, Benedict, Matthews, Charles, Vlaar, Tiffany
Stochastic Gradient Langevin Dynamics, the "unadjusted Langevin algorithm", and Adaptive Langevin Dynamics (also known as Stochastic Gradient Nosé-Hoover dynamics) are examples of existing thermodynamic parameterization methods in use for machine learning, but these can be substantially improved. We find that by partitioning the parameters based on natural layer structure we obtain schemes with rapid convergence for data sets with complicated loss landscapes. We describe easy-to-implement hybrid partitioned numerical algorithms, based on discretized stochastic differential equations, which are adapted to feed-forward neural networks, including LaLa (a multi-layer Langevin algorithm), AdLaLa (combining the adaptive Langevin and Langevin algorithms) and LOL (combining Langevin and Overdamped Langevin); we examine the convergence of these methods using numerical studies and compare their performance among themselves and in relation to standard alternatives such as stochastic gradient descent and ADAM. We present evidence that thermodynamic parameterization methods can be (i) faster, (ii) more accurate, and (iii) more robust than standard algorithms incorporated into machine learning frameworks, in particular for data sets with complicated loss landscapes. Moreover, we show in numerical studies that methods based on sampling excite many degrees of freedom. The equipartition property, which is a consequence of their ergodicity, means that these methods keep in play an ensemble of low-loss states during the training process. By drawing parameter states from a sufficiently rich distribution of nearby candidate states, we show that the thermodynamic schemes produce smoother classifiers, improve generalization and reduce overfitting compared to traditional optimizers.
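The idea of partitioning parameters by layer can be illustrated with a minimal sketch. It applies a plain overdamped (unadjusted) Langevin update, theta <- theta - h * grad + sqrt(2 * h * T) * xi, with a separate step size h and temperature T for each layer. This is not the LaLa, AdLaLa or LOL scheme of the paper; the function name, the list-of-lists parameter layout and the per-layer settings are assumptions made purely for illustration.

```python
import math
import random


def partitioned_langevin_step(layers, grads, step_sizes, temperatures, rng):
    """Apply one overdamped Langevin update to each layer separately.

    layers, grads: list of per-layer parameter / gradient lists (hypothetical layout).
    step_sizes, temperatures: one value per layer, so each partition can be
    driven with its own discretization step and noise level.
    """
    new_layers = []
    for theta, grad, h, temp in zip(layers, grads, step_sizes, temperatures):
        noise_scale = math.sqrt(2.0 * h * temp)  # standard deviation of the injected noise
        new_layers.append([
            t - h * g + noise_scale * rng.gauss(0.0, 1.0)
            for t, g in zip(theta, grad)
        ])
    return new_layers


def quadratic_grads(layers):
    """Gradient of the toy loss 0.5 * sum(theta**2), used only for the demo."""
    return [list(theta) for theta in layers]


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [[1.0, -2.0], [0.5]]  # two "layers" of a toy model
    for _ in range(200):
        layers = partitioned_langevin_step(
            layers, quadratic_grads(layers),
            step_sizes=[0.1, 0.05], temperatures=[0.01, 0.001], rng=rng,
        )
    print(layers)  # parameters fluctuate around the minimum at zero
```

Setting a layer's temperature to zero reduces its update to ordinary gradient descent, which is one way to check the sketch against a known baseline.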
A Queuing Approach to Parking: Modeling, Verification, and Prediction
Tavafoghi, Hamidreza, Poolla, Kameshwar, Varaiya, Pravin
We present a queuing model of parking dynamics and a model-based prediction method to provide real-time probabilistic forecasts of future parking occupancy. The queuing model has a non-homogeneous arrival rate and time-varying service time distribution. All statistical assumptions of the model are verified using data from 29 truck parking locations, each with between 55 and 299 parking spots. For each location and each spot the data specifies the arrival and departure times of a truck, for 16 months of operation. The modeling framework presented in this paper provides empirical support for queuing models adopted in many theoretical studies and policy designs. We discuss how our framework can be used to study parking problems in different environments. Based on the queuing model, we propose two prediction methods, a microscopic method and a macroscopic method, that provide a real-time probabilistic forecast of parking occupancy for an arbitrary forecast horizon. These model-based methods convert a probabilistic forecast problem into a parameter estimation problem that can be tackled using classical estimation methods such as regressions or pure machine learning algorithms. We characterize a lower bound for an arbitrary real-time prediction algorithm. We evaluate the performance of these methods using the truck data, comparing the outcomes of their implementations with other model-based and model-free methods proposed in the literature.
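A queue with a non-homogeneous arrival rate and time-varying service times can be simulated in a few lines, which may help make the model concrete. The sketch below draws arrivals by thinning a homogeneous Poisson process and assigns each parked truck an exponentially distributed stay whose mean depends on the arrival time. The exponential stay, the rule that arrivals are turned away when the lot is full, and all names and numbers are assumptions for illustration; they are not taken from the paper.

```python
import math
import random


def simulate_occupancy(rate_fn, rate_max, mean_stay_fn, capacity, horizon, rng):
    """Simulate one parking lot and return (time, occupancy) at every arrival.

    rate_fn(t): arrival rate at time t; must satisfy rate_fn(t) <= rate_max.
    mean_stay_fn(t): mean parking duration for a truck arriving at time t.
    """
    t = 0.0
    departures = []  # departure times of trucks currently parked
    trace = []
    while True:
        # Candidate arrival from a homogeneous Poisson process of rate rate_max.
        t += rng.expovariate(rate_max)
        if t > horizon:
            break
        # Thinning: keep the candidate with probability rate_fn(t) / rate_max.
        if rng.random() > rate_fn(t) / rate_max:
            continue
        # Release the spots of trucks that have already left.
        departures = [d for d in departures if d > t]
        # Assumed loss behaviour: a truck finding the lot full is turned away.
        if len(departures) < capacity:
            departures.append(t + rng.expovariate(1.0 / mean_stay_fn(t)))
        trace.append((t, len(departures)))
    return trace


if __name__ == "__main__":
    rng = random.Random(1)
    trace = simulate_occupancy(
        rate_fn=lambda t: 5.0 + 4.0 * math.sin(2.0 * math.pi * t / 24.0),  # daily cycle
        rate_max=9.0,
        mean_stay_fn=lambda t: 8.0,  # hours; constant here for simplicity
        capacity=55,
        horizon=72.0,
        rng=rng,
    )
    print(trace[-1])  # last recorded (time, occupancy) pair
```

Repeating the simulation with different seeds gives an empirical distribution of occupancy at a chosen future time, which is the kind of quantity a probabilistic forecast targets.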
Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model
Bando, Yoshiaki, Sasaki, Yoko, Yoshii, Kazuyoshi
This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals. Conventional neural separation methods require a lot of supervised data to achieve excellent performance. Although multichannel methods based on spatial information can work without such training data, they are often sensitive to parameter initialization and degrade when the sources are located close to each other. The proposed method uses a cost function based on a spatial model called a complex Gaussian mixture model (cGMM). This model has the time-frequency (TF) masks and directions of arrival (DoAs) of sources as latent variables and is used for training separation and localization networks that respectively estimate these variables. This joint training solves the frequency permutation ambiguity of the spatial model in a unified deep Bayesian framework. In addition, the pre-trained network can be used not only for conducting monaural separation but also for efficiently initializing a multichannel separation algorithm. Experimental results with simulated speech mixtures showed that our method outperformed a conventional initialization method.
Deep Neural Network Ensembles against Deception: Ensemble Diversity, Accuracy and Robustness
Liu, Ling, Wei, Wenqi, Chow, Ka-Ho, Loper, Margaret, Gursoy, Emre, Truex, Stacey, Wu, Yanzhao
We develop a three-step diversity ensemble creation algorithm: (1) creating a pool of candidate ensemble member models, or so-called base models; (2) creating a pool of candidate ensemble teams with diversity scores higher than the pre-defined minimum diversity threshold; and (3) developing robust ensemble consensus methods, which can effectively combine, rank and integrate predictions from members of an ensemble committee to produce high-accuracy ensemble prediction output against adversarial examples. Different ensemble creation methods tend to have varying levels of diversity. A. Creating Ensembles of Type 1 Diversity. We want to construct a pool of N redundant DNN models trained on the same learning task as the base classifiers. Preferably, the best ensemble committee members are those base classifiers that are relatively diverse and have high individual test accuracy. The type 1 diversity ensemble creation algorithm requires that every base model in the pool meets the type 1 diversity and has high benign test accuracy comparable to that of the target model under protection. One approach is to add one member model to the pool at a time. Assume that we initialize the pool with a privately trained DNN model. We only allow the next model to be added to the pool if it is trained independently using different hyper-parameters or different neural network structures or algorithms and it meets the high benign test accuracy requirement.
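The three steps can be sketched on precomputed predictions. The example below admits models to the pool only if their benign accuracy clears a threshold, keeps teams whose diversity score clears a second threshold, and combines a team by majority vote. Mean pairwise disagreement as the diversity score and majority voting as the consensus method are assumptions standing in for whatever the paper actually uses; all function names, thresholds and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a, b):
    """Fraction of examples on which two models predict differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def team_diversity(team_preds):
    """Mean pairwise disagreement (an assumed diversity score)."""
    pairs = list(combinations(team_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


def build_teams(model_preds, labels, min_accuracy, min_diversity, team_size):
    # Step 1: pool of base models with sufficiently high benign accuracy.
    pool = {name: p for name, p in model_preds.items()
            if accuracy(p, labels) >= min_accuracy}
    # Step 2: candidate teams whose diversity exceeds the minimum threshold.
    teams = []
    for names in combinations(sorted(pool), team_size):
        if team_diversity([pool[n] for n in names]) >= min_diversity:
            teams.append(names)
    return teams


def majority_vote(team_preds):
    # Step 3: a simple consensus rule, the most common label per example.
    return [Counter(column).most_common(1)[0][0] for column in zip(*team_preds)]


if __name__ == "__main__":
    labels = [0, 1, 1, 0, 1, 0]
    model_preds = {
        "a": [0, 1, 1, 0, 1, 1],
        "b": [0, 1, 1, 1, 1, 0],
        "c": [1, 1, 1, 0, 1, 0],
        "d": [1, 0, 0, 0, 1, 0],  # too inaccurate to enter the pool
    }
    teams = build_teams(model_preds, labels, 0.8, 0.3, team_size=3)
    print(teams)
    print(majority_vote([model_preds[n] for n in teams[0]]))
```

In the toy data each admitted model makes a different single mistake, so the vote recovers every label even though no individual model does.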
Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective
Liu, Guan-Horng, Theodorou, Evangelos A.
Attempts from different disciplines to provide a fundamental understanding of deep learning have advanced rapidly in recent years, yet a unified framework remains relatively limited. In this article, we provide one possible way to align existing branches of deep learning theory through the lens of dynamical systems and optimal control. By viewing deep neural networks as discrete-time nonlinear dynamical systems, we can analyze how information propagates through layers using mean field theory. When optimization algorithms are further recast as controllers, the ultimate goal of training processes can be formulated as an optimal control problem. In addition, we can reveal convergence and generalization properties by studying the stochastic dynamics of optimization algorithms. This viewpoint features a wide range of theoretical study from information bottleneck to statistical physics. It also provides a principled way for hyper-parameter tuning when optimal control theory is introduced. Our framework fits nicely with supervised learning and can be extended to other learning problems, such as Bayesian learning, adversarial training, and specific forms of meta learning, without effort. The review aims to shed light on the importance of dynamics and optimal control when developing deep learning theory.
Bayes EMbedding (BEM): Refining Representation by Integrating Knowledge Graphs and Behavior-specific Networks
Ye, Yuting, Wang, Xuwu, Yao, Jiangchao, Jia, Kunyang, Zhou, Jingren, Xiao, Yanghua, Yang, Hongxia
Low-dimensional embeddings of knowledge graphs and behavior graphs have proved remarkably powerful in a variety of tasks, from predicting unobserved edges between entities to content recommendation. The two types of graphs can contain distinct and complementary information for the same entities/nodes. However, previous works focus either on knowledge graph embedding or behavior graph embedding, while few works consider both in a unified way. Here we present BEM, a Bayesian framework that incorporates the information from knowledge graphs and behavior graphs. To be more specific, BEM takes as prior the pre-trained embeddings from the knowledge graph, and integrates them with the pre-trained embeddings from the behavior graphs via a Bayesian generative model. BEM is able to mutually refine the embeddings from both sides while preserving their own topological structures. To show the superiority of our method, we conduct a range of experiments on three benchmark datasets: node classification, link prediction, triplet classification on two small datasets related to Freebase, and item recommendation on a large-scale e-commerce dataset.
On the overestimation of widely applicable Bayesian information criterion
A widely applicable Bayesian information criterion (Watanabe, 2013) is applicable for both regular and singular models in the model selection problem. This criterion tends to overestimate the log marginal likelihood. We identify an overestimating term of a widely applicable Bayesian information criterion. Adjustment of the term gives an asymptotically unbiased estimator of the leading two terms of asymptotic expansion of the log marginal likelihood. In numerical experiments on regular and singular models, the adjustment resulted in smaller bias than the original criterion.
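For orientation, the criterion being adjusted can be written out. The block below states the standard definition of WBIC from Watanabe (2013) as recalled here, in notation that is an assumption of this note rather than of the abstract: $X_1,\dots,X_n$ are the data, $p(x \mid w)$ the model, $\varphi(w)$ the prior, and $\beta$ the inverse temperature. The abstract's overestimating term and its adjustment are not reproduced, since they are not given in this text.

```latex
% Standard WBIC definition (Watanabe, 2013); notation assumed for illustration.
\[
  L_n(w) = -\frac{1}{n} \sum_{i=1}^{n} \log p(X_i \mid w),
  \qquad
  \mathrm{WBIC} = \mathbb{E}_w^{\beta}\!\left[\, n L_n(w) \,\right],
  \qquad
  \beta = \frac{1}{\log n},
\]
\[
  \mathbb{E}_w^{\beta}[f(w)]
  = \frac{\int f(w) \prod_{i=1}^{n} p(X_i \mid w)^{\beta}\, \varphi(w)\, dw}
         {\int \prod_{i=1}^{n} p(X_i \mid w)^{\beta}\, \varphi(w)\, dw}.
\]
```

In this convention WBIC approximates the free energy, that is, the negative log marginal likelihood, so statements about over- or underestimation depend on which sign convention is in use.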
Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification
Xiang, Zhen, Miller, David J., Kesidis, George
Recently, a special type of data poisoning (DP) attack targeting Deep Neural Network (DNN) classifiers, known as a backdoor, was proposed. These attacks do not seek to degrade classification accuracy, but rather to have the classifier learn to classify to a target class whenever the backdoor pattern is present in a test example. Launching backdoor attacks does not require knowledge of the classifier or its training process - it only needs the ability to poison the training set with (a sufficient number of) exemplars containing a sufficiently strong backdoor pattern (labeled with the target class). Here we address post-training detection of backdoor attacks in DNN image classifiers, seldom considered in existing works, wherein the defender does not have access to the poisoned training set, but only to the trained classifier itself, as well as to clean examples from the classification domain. This is an important scenario because a trained classifier may be the basis of e.g. a phone app that will be shared with many users. Detecting backdoors post-training may thus reveal a widespread attack. We propose a purely unsupervised anomaly detection (AD) defense against imperceptible backdoor attacks that: i) detects whether the trained DNN has been backdoor-attacked; ii) infers the source and target classes involved in a detected attack; iii) we even demonstrate it is possible to accurately estimate the backdoor pattern. We test our AD approach, in comparison with alternative defenses, for several backdoor patterns, data sets, and attack settings and demonstrate its favorability. Our defense essentially requires setting a single hyperparameter (the detection threshold), which can e.g. be chosen to fix the system's false positive rate.
Model Selection With Graphical Neighbour Information
Accurate model selection is a fundamental requirement for statistical analysis (1-5). In many real-world applications of graphical modelling, correct model structure identification is the ultimate objective. Standard model validation procedures such as information theoretic scores and cross validation have demonstrated poor performance when . Specialised methods such as EBIC, StARS and RIC have been developed for the explicit purpose of high-dimensional Gaussian graphical model selection. We present a novel model score criterion, Graphical Neighbour Information. This method demonstrates oracle performance in high-dimensional model selection, outperforming the current state-of-the-art in our simulations. The Graphical Neighbour Information criterion has the additional advantage of efficient, closed-form computability, sparing the costly inference of multiple models on data subsamples. We provide a theoretic analysis of the method and benchmark simulations versus the current state of the art.