Country
Multi-MotifGAN (MMGAN): Motif-targeted Graph Generation and Prediction
Gamage, Anuththari, Chien, Eli, Peng, Jianhao, Milenkovic, Olgica
Classical stochastic models, such as the Erd os-R enyi, Barabasi-Albert, and the stochastic block model generate graphs based on a predefined set of parameters, such as the probability of edge formation within and between communities [1]. In contrast, modern approaches to graph generation based on deep learning, including NetGAN [2], GraphGAN [3], and GraphRNN [4], are flexible enough to learn multiple different properties of an input graph simultaneously. The graphs generated by these architectures may be used for downstream learning tasks such as data augmentation [5], recommendation [6], and link prediction [7]. Many real-world networks consist of entities with complex mutual interrelations. Such networks cannot be modeled effectively as graphs with simple pairwise relations, despite the fact that pairwise relations provide a wealth of information for learning. Studying higher-order relationships in a graph is fundamental for our understanding of the network behavior and function. Higher-order relationships are usually termed hyperedges (collections of more than two nodes) [8, 9] or network motifs (recurrent node connectivity patterns that are statistically significant compared to some ground truth random graph model) [10]. These higher-order structures are the actual building blocks of complex networks, as they capture fundamental functional properties.
Bayesian Graph Convolutional Neural Networks using Node Copying
Pal, Soumyasundar, Regol, Florence, Coates, Mark
Graph convolutional neural networks (GCNN) have numerous applications in different graph based learning tasks. Although the techniques obtain impressive results, they often fall short in accounting for the uncertainty associated with the underlying graph structure. In the recently proposed Bayesian GCNN (BGCN) framework, this issue is tackled by viewing the observed graph as a sample from a parametric random graph model and targeting joint inference of the graph and the GCNN weights. In this paper, we introduce an alternative generative model for graphs based on copying nodes and incorporate it within the BGCN framework. Our approach has the benefit that it uses information provided by the node features and training labels in the graph topology inference. Experiments show that the proposed algorithm compares favorably to the state-of-the-art in benchmark node classification tasks.
Relevance Vector Machines for harmonization of MRI brain volumes using image descriptors
Meyer, Maria Ines, de la Rosa, Ezequiel, Van Leemput, Koen, Sima, Diana M.
With the increased need for multi-center magnetic resonance imaging studies, problems arise related to differences in hardware and software between centers. Namely, current algorithms for brain volume quantification are unreliable for the longitudinal assessment of volume changes in this type of setting. Currently most methods attempt to decrease this issue by regressing the scanner- and/or center-effects from the original data. In this work, we explore a novel approach to harmonize brain volume measurements by using only image descriptors. First, we explore the relationships between volumes and image descriptors. Then, we train a Relevance Vector Machine (RVM) model over a large multi-site dataset of healthy subjects to perform volume harmonization. Finally, we validate the method over two different datasets: i) a subset of unseen healthy controls; and ii) a test-retest dataset of multiple sclerosis (MS) patients. The method decreases scanner and center variability while preserving measurements that did not require correction in MS patient data. We show that image descriptors can be used as input to a machine learning algorithm to improve the reliability of longitudinal volumetric studies.
Maximum a-Posteriori Estimation for the Gaussian Mixture Model via Mixed Integer Nonlinear Programming
Flaherty, Patrick, Wiratchotisatian, Pitchaya, Lee, Ji Ah, Trapp, Andrew C.
We present a global optimization approach for solving the classical maximum a-posteriori (MAP) estimation problem for the Gaussian mixture model. Our approach formulates the MAP estimation problem as a mixed-integer nonlinear optimization problem (MINLP). Our method provides a certificate of global optimality, can accommodate side constraints, and is extendable to other finite mixture models. We propose an approximation to the MINLP hat transforms it into a mixed integer quadratic program (MIQP) which preserves global optimality within desired accuracy and improves computational aspects. Numerical experiments compare our method to standard estimation approaches and show that our method finds the globally optimal MAP for some standard data sets, providing a benchmark for comparing estimation methods.
Intrusion Detection for Industrial Control Systems: Evaluation Analysis and Adversarial Attacks
Zizzo, Giulio, Hankin, Chris, Maffeis, Sergio, Jones, Kevin
--Neural networks are increasingly used in security applications for intrusion detection on industrial control systems. In this work we examine two areas that must be considered for their effective use. Firstly, is their vulnerability to adversarial attacks when used in a time series setting. Secondly, is potential overestimation of performance arising from data leakage artefacts. T o investigate these areas we implement a long short-term memory (LSTM) based intrusion detection system (IDS) which effectively detects cyber-physical attacks on a water treatment testbed representing a strong baseline IDS. The first attacker is able to manipulate sensor readings on a subset of the Secure Water Treatment (SWaT) system. By creating a stream of adversarial data the attacker is able to hide the cyber-physical attacks from the IDS. For the cyber-physical attacks which are detected by the IDS, the attacker required on average 2.48 out of 12 total sensors to be compromised for the cyber-physical attacks to be hidden from the IDS. The second attacker model we explore is an L bounded attacker who can send fake readings to the IDS, but to remain imperceptible, limits their perturbations to the smallest L value needed. Additionally, we examine data leakage problems arising from tuning for F 1 score on the whole SWaT attack set and propose a method to tune detection parameters that does not utilise any attack data. If attack aftereffects are accounted for then our new parameter tuning method achieved an F 1 score of 0.811 0.0103. I NTRODUCTION Deep learning systems are known to be vulnerable to adversarial attacks at test time. By applying small changes to an input an attacker can cause a machine learning system to mis-classify with a high degree of success. There has been much work on both developing more powerful attacks [1] as well as defences [2]. However, the majority of adversarial machine learning research is focused on the image domain, with consideration of the different challenges that arise within other fields needed [3]. This phenomenon of adversarial examples becomes particularly pertinent when aiming to defend machine learn-Pre-print.
Provable Computational and Statistical Guarantees for Efficient Learning of Continuous-Action Graphical Games
In this paper, we study the problem of learning the set of pure strategy Nash equilibria and the exact structure of a continuous-action graphical game with quadratic payoffs by observing a small set of perturbed equilibria. A continuous-action graphical game can possibly have an uncountable set of Nash euqilibria. We propose a $\ell_{12}-$ block regularized method which recovers a graphical game, whose Nash equilibria are the $\epsilon$-Nash equilibria of the game from which the data was generated (true game). Under a slightly stringent condition on the parameters of the true game, our method recovers the exact structure of the graphical game. Our method has a logarithmic sample complexity with respect to the number of players. It also runs in polynomial time.
On the Relationship between Self-Attention and Convolutional Layers
Cordonnier, Jean-Baptiste, Loukas, Andreas, Jaggi, Martin
A BSTRACT Recent trends of incorporating attention mechanisms in vision have led researchers to reconsider the supremacy of convolutional layers as a primary building block. Beyond helping CNNs to handle long-range dependencies, Ramachan-dran et al. (2019) showed that attention can completely replace convolution and achieve state-of-the-art performance on vision tasks. This raises the question: do learned attention layers operate similarly to convolutional layers? This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with sufficient number of heads is at least as expressive as any convolutional layer. Our numerical experiments then show that the phenomenon also occurs in practice, corroborating our analysis. Our code is publicly available 1 . 1 I NTRODUCTION Recent advances in Natural Language Processing (NLP) are largely attributed to the rise of the transformer (V aswani et al., 2017).
Community-preserving Graph Convolutions for Structural and Functional Joint Embedding of Brain Networks
Liu, Jiahao, Ma, Guixiang, Jiang, Fei, Lu, Chun-Ta, Yu, Philip S., Ragin, Ann B.
--Brain networks have received considerable attention given the critical significance for understanding human brain organization, for investigating neurological disorders and for clinical diagnostic applications. Most existing works in brain network analysis focus on either structural or functional connectivity, which cannot leverage the complementary information from each other . Although multi-view learning methods have been proposed to learn from both networks (or views), these methods aim to reach a consensus among multiple views, and thus distinct intrinsic properties of each view may be ignored. How to jointly learn representations from structural and functional brain networks while preserving their inherent properties is a critical problem. In this paper, we propose a framework of Siamese community-preserving graph convolutional network (SCP-GCN) to learn the structural and functional joint embedding of brain networks. Specifically, we use graph convolutions to learn the structural and functional joint embedding, where the graph structure is defined with structural connectivity and node features are from the functional connectivity. Moreover, we propose to preserve the community structure of brain networks in the graph convolutions by considering the intra-community and inter-community properties in the learning process. Furthermore, we use Siamese architecture which models the pairwise similarity learning to guide the learning process. T o evaluate the proposed approach, we conduct extensive experiments on two real brain network datasets. The experimental results demonstrate the superior performance of the proposed approach in structural and functional joint embedding for neurological disorder analysis, indicating its promising value for clinical applications. This work was done when the author was at the University of Illinois at Chicago.
Degrees of freedom for off-the-grid sparse estimation
Clarice Poon, Gabriel Peyr e † November 12, 2019 Abstract A central question in modern machine learning and imaging sciences is to quantify the number of effective parameters of vastly over-parameterized models. The degrees of freedom is a mathematically convenient way to define this number of parameters. Its computation and properties are well understood when dealing with discretized linear models, possibly regularized using sparsity. In this paper, we argue that this way of thinking is plagued when dealing with models having very large parameter spaces. In this case it makes more sense to consider "off-the-grid" approaches, using a continuous parameter space. This type of approach is the one favoured when training multi-layer perceptrons, and is also becoming popular to solve super-resolution problems in imaging. Training these off-the-grid models with a sparsity inducing prior can be achieved by solving a convex optimization problem over the space of measures, which is often called the Beurling Lasso (Blasso), and is the continuous counterpart of the celebrated Lasso parameter selection method. In previous works [41, 19], the degrees of freedom for the Lasso was shown to coincide with the size of the smallest solution support. Our main contribution is a proof of a continuous counterpart to this result for the Blasso. While in dimension d, each of the k nonzero recovered atom in the recovered measure carries over d 1 parameters ( d for the position and 1 for the weight), a surprising implication of our new formula it that the degrees of freedom for these off-the-grid models is in general strictly smaller ( d 1)k . Our findings thus suggest that discretized methods actually vastly overestimate the number of intrinsic continuous degrees of freedom. Our second contribution is a detailed study of the case of sampling Fourier coefficients in 1D, which corresponds to a super-resolution problem. We show that our formula for the degrees of freedom is valid outside of a set of measure zero of observations, which in turn justifies its use to compute an unbiased estimator of the prediction risk using the Stein Unbiased Risk Estimator (SURE). We also report numerical results for both the case of Fourier sampling and the learning of a multilayers perceptron with a single hidden layer.
DZip: improved general-purpose lossless compression based on novel neural network modeling
Goyal, Mohit, Tatwawadi, Kedar, Chandak, Shubham, Ochoa, Idoia
We consider lossless compression based on statistical data modeling followed by prediction-based encoding, where an accurate statistical model for the input data leads to substantial improvements in compression. We propose DZip, a general-purpose compressor for sequential data that exploits the well-known modeling capabilities of neural networks (NNs) for prediction, followed by arithmetic coding. Dzip uses a novel hybrid architecture based on adaptive and semi-adaptive training. Unlike most NN based compressors, DZip does not require additional training data and is not restricted to specific data types, only needing the alphabet size of the input data. The proposed compressor outperforms general-purpose compressors such as Gzip (on average 26% reduction) on a variety of real datasets, achieves near-optimal compression on synthetic datasets, and performs close to specialized compressors for large sequence lengths, without any human input. The main limitation of DZip in its current implementation is the encoding/decoding time, which limits its practicality. Nevertheless, the results showcase the potential of developing improved general-purpose compressors based on neural networks and hybrid modeling.