Systems & Languages
Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity
Functional brain networks are well described and estimated from data with Gaussian Graphical Models (GGMs), e.g., using sparse inverse covariance estimators. Comparing functional connectivity of subjects in two populations calls for comparing these estimated GGMs. Our goal is to identify differences in GGMs known to have similar structure. We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on parameters of a sparse estimator. Sparse penalties enable statistical guarantees and interpretable models even in high-dimensional and low-sample settings. Characterizing the distributions of sparse models is inherently challenging as the penalties produce a biased estimator.
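A minimal sketch of the setting described above, assuming scikit-learn and NumPy are available: each group's GGM is estimated with a sparse inverse covariance estimator (here the graphical lasso) and the raw edge-wise differences of the two precision matrices are inspected. This is not the paper's debiased confidence-interval procedure; data, sizes, and function names are illustrative only.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

def estimate_precision(X):
    """Fit a sparse inverse covariance (GGM) estimator to one group's data."""
    model = GraphicalLassoCV().fit(X)
    return model.precision_

rng = np.random.default_rng(0)
X_group_a = rng.standard_normal((200, 10))   # placeholder data: subjects x brain regions
X_group_b = rng.standard_normal((180, 10))

theta_a = estimate_precision(X_group_a)
theta_b = estimate_precision(X_group_b)

# Raw edge-wise differences; the paper's contribution is attaching confidence
# intervals to such differences despite the bias introduced by the sparse penalty.
edge_diff = theta_a - theta_b
print(np.abs(edge_diff).max())
```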
Stochastic Variational Deep Kernel Learning
Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure exploiting algebra. We show improved performance over stand-alone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.
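A minimal sketch of the joint-training idea only (exact GP marginal likelihood, regression case), not the paper's stochastic variational or kernel-interpolation machinery: an RBF base kernel is applied to features produced by a small network, and the network weights plus kernel hyperparameters are optimized together against the GP negative log marginal likelihood. Data, network shape, and hyperparameters are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
X = torch.randn(128, 5)                                   # placeholder inputs
y = torch.sin(X.sum(dim=1)) + 0.1 * torch.randn(128)      # placeholder targets

# Deep feature extractor feeding the base kernel.
net = torch.nn.Sequential(torch.nn.Linear(5, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2))
log_lengthscale = torch.zeros((), requires_grad=True)
log_noise = torch.tensor(-2.0, requires_grad=True)
opt = torch.optim.Adam(list(net.parameters()) + [log_lengthscale, log_noise], lr=1e-2)

def rbf_kernel(Z):
    d2 = torch.cdist(Z, Z).pow(2)
    return torch.exp(-0.5 * d2 / log_lengthscale.exp() ** 2)

for step in range(200):
    opt.zero_grad()
    Z = net(X)                                            # base kernel acts on deep features
    K = rbf_kernel(Z) + log_noise.exp() * torch.eye(len(X))
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y.unsqueeze(1), L)
    # Negative log marginal likelihood of the GP (up to an additive constant).
    nll = 0.5 * (y.unsqueeze(1).T @ alpha).squeeze() + torch.log(torch.diagonal(L)).sum()
    nll.backward()
    opt.step()
```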
Adapting Neural Architectures Between Domains (Supplementary Material) Yanxi Li
This supplementary material consists of three parts: the proofs of all lemmas, theorems, and corollaries (Section A), details of the experimental setting (Section B), and some additional experimental results (Section C).
A.1 Proof of Lemma 1. Lemma 1 [2]. Let R be a representation function R: X → Z, and D ...
A.2 Proof of Theorem 2. Theorem 2. Let m be the size of Ũ ... By taking a union bound of Eq. 7 over all h ∈ H ... By combining Theorem 2 and Lemma 3, we can derive the proof of Corollary 4. Let Ũ ... Finally, by applying the bound between the expected domain distance and the empirical domain distance according to [6], we obtain Eq. ...
B.1 NAS Search Space. Following many previous works [3, 5, 7, 9, 10], we use the NASNet search space [10]. There are two kinds of cells in the search space: normal cells and reduction cells. Normal cells use stride 1 and maintain the size of feature maps.
EZNAS: Evolving Zero-Cost Proxies For Neural Architecture Scoring J. Pablo Muñoz
Neural Architecture Search (NAS) has significantly improved productivity in the design and deployment of neural networks (NN). As NAS typically evaluates multiple models by training them partially or completely, the improved productivity comes at the cost of a significant carbon footprint. To alleviate this expensive training routine, zero-shot/zero-cost proxies analyze an NN at initialization to generate a score, which correlates highly with its true accuracy. Zero-cost proxies are currently designed by experts conducting multiple cycles of empirical testing on possible algorithms, datasets, and neural architecture design spaces. This experimentation lowers productivity and is an unsustainable approach towards zero-cost proxy design as deep learning use-cases diversify in nature. Additionally, existing zero-cost proxies fail to generalize across neural architecture design spaces. In this paper, we propose a genetic programming framework to automate the discovery of zero-cost proxies for neural architecture scoring. Our methodology efficiently discovers an interpretable and generalizable zero-cost proxy that gives state-of-the-art score-accuracy correlation on all datasets and search spaces of NAS-Bench-201 and Network Design Spaces (NDS). We believe that this research indicates a promising direction towards automatically discovering zero-cost proxies that can work across network architecture design spaces, datasets, and tasks.
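A toy genetic-programming loop in the spirit of the framework described above, not EZNAS's actual operator set, statistics, or search spaces: candidate proxies are tiny expressions over per-network statistics, and fitness is the rank correlation between proxy scores and accuracies. All statistics, accuracies, and operators here are synthetic placeholders.

```python
import random
import numpy as np
from scipy.stats import spearmanr

random.seed(0); rng = np.random.default_rng(0)

# Placeholder "statistics at initialization" for 50 architectures, plus synthetic accuracies.
stats = {"grad_norm": rng.random(50), "weight_norm": rng.random(50), "num_params": rng.random(50)}
accuracy = 0.7 * stats["grad_norm"] + 0.3 * rng.random(50)

OPS = [("add", lambda a, b: a + b), ("mul", lambda a, b: a * b), ("sub", lambda a, b: a - b)]

def random_expr():
    """Sample a one-operator expression tree over two statistics."""
    return (random.choice(OPS), random.choice(list(stats)), random.choice(list(stats)))

def fitness(expr):
    """Score-accuracy rank correlation of a candidate proxy."""
    (name, fn), x, y = expr
    rho, _ = spearmanr(fn(stats[x], stats[y]), accuracy)
    return 0.0 if np.isnan(rho) else abs(rho)

population = [random_expr() for _ in range(20)]
for gen in range(10):
    population.sort(key=fitness, reverse=True)
    # Keep the fittest half; "mutation" here is simply resampling fresh expressions.
    population = population[:10] + [random_expr() for _ in range(10)]

best = max(population, key=fitness)
print("best proxy:", best[0][0], best[1], best[2], "| Spearman:", round(fitness(best), 3))
```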
BayesPCN: A Continually Learnable Predictive Coding Associative Memory
Associative memory plays an important role in human intelligence and its mechanisms have been linked to attention in machine learning. While the machine learning community's interest in associative memories has recently been rekindled, most work has focused on memory recall (read) over memory learning (write). In this paper, we present BayesPCN, a hierarchical associative memory capable of performing continual one-shot memory writes without meta-learning. Moreover, BayesPCN is able to gradually forget past observations (forget) to free its memory. Experiments show that BayesPCN can recall corrupted i.i.d.
Estimating graphical models for count data with applications to single-cell gene network
Graphical models such as Gaussian graphical models have been widely applied for direct interaction inference in many different areas. In many modern applications, such as single-cell RNA sequencing (scRNA-seq) studies, the observed data are counts and often contain many small counts. Traditional graphical models for continuous data are inappropriate for network inference of count data. We consider the Poisson log-normal (PLN) graphical model for count data, in which the precision matrix of the latent normal distribution represents the network. We propose a two-step method, PLNet, to estimate the precision matrix. PLNet first estimates the latent covariance matrix using the maximum marginal likelihood estimator (MMLE) and then estimates the precision matrix by minimizing the lasso-penalized D-trace loss function. We establish the convergence rate of the MMLE of the covariance matrix and further establish the convergence rate and the sign consistency of the proposed PLNet estimator of the precision matrix in the high-dimensional setting. Importantly, although the PLN model is not sub-Gaussian, we show that the PLNet estimator is consistent even if the model dimension grows exponentially with the sample size. The performance of PLNet is evaluated and compared with available methods using simulation and gene regulatory network analysis of real scRNA-seq data.
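A simplified stand-in for the two-step structure described above, not the paper's estimator: step 1 here uses a crude log-transform moment estimate of the latent covariance in place of the MMLE, and step 2 uses the graphical lasso in place of the lasso-penalized D-trace loss. Data, dimensions, and the regularization level are assumptions for illustration.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(0)
counts = rng.poisson(lam=np.exp(rng.standard_normal((300, 8))))  # placeholder cells x genes

# Step 1 (stand-in): estimate the latent covariance from log(1 + counts).
latent = np.log1p(counts)
sigma_hat = np.cov(latent, rowvar=False)

# Step 2 (stand-in): sparse precision matrix from the covariance estimate.
_, precision_hat = graphical_lasso(sigma_hat, alpha=0.1)

# Nonzero off-diagonal entries of the precision matrix are the inferred network edges.
edges = np.abs(precision_hat) > 1e-6
np.fill_diagonal(edges, False)
print("number of inferred edges:", edges.sum() // 2)
```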
The Multiple Quantile Graphical Model
Alnur Ali, J. Zico Kolter, Ryan J. Tibshirani
We introduce the Multiple Quantile Graphical Model (MQGM), which extends the neighborhood selection approach of Meinshausen and Bühlmann for learning sparse graphical models. The latter is defined by the basic subproblem of modeling the conditional mean of one variable as a sparse function of all others. Our approach models a set of conditional quantiles of one variable as a sparse function of all others, and hence offers a much richer, more expressive class of conditional distribution estimates. We establish that, under suitable regularity conditions, the MQGM identifies the exact conditional independencies with probability tending to one as the problem size grows, even outside of the usual homoskedastic Gaussian data model. We develop an efficient algorithm for fitting the MQGM using the alternating direction method of multipliers. We also describe a strategy for sampling from the joint distribution that underlies the MQGM estimate. Lastly, we present detailed experiments that demonstrate the flexibility and effectiveness of the MQGM in modeling heteroskedastic non-Gaussian data.
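A minimal sketch of the quantile-based neighborhood-selection idea, not the paper's ADMM solver or basis-expanded model: each variable is regressed on all others at several quantile levels with an l1 penalty, and the union of nonzero coefficients across quantiles gives that variable's estimated neighborhood. The data, quantile levels, and penalty strength are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 6))            # placeholder data: samples x variables
X[:, 0] += 0.8 * X[:, 1]                     # induce one dependence for illustration

quantiles = [0.1, 0.5, 0.9]
p = X.shape[1]
adjacency = np.zeros((p, p), dtype=bool)

for j in range(p):
    others = np.delete(np.arange(p), j)
    for tau in quantiles:
        # l1-penalized quantile regression of variable j on all other variables.
        reg = QuantileRegressor(quantile=tau, alpha=0.05, solver="highs")
        reg.fit(X[:, others], X[:, j])
        adjacency[j, others] |= np.abs(reg.coef_) > 1e-6

print("estimated neighbors of variable 0:", np.where(adjacency[0])[0])
```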
Hierarchical Neural Architecture Search for Deep Stereo Matching - Supplementary Materials
In this supplemental material, we briefly introduce three widely used stereo matching benchmarks, provide details of the separate search (Sec. 3.3 of the main manuscript) of the Feature Net and the Matching Net, and show more qualitative results of our method on various datasets along with screenshots of the benchmarks. KITTI 2012 and 2015 datasets: These two real-world datasets were both collected from a driving car. KITTI 2012 contains 194 training image pairs and 195 test image pairs. KITTI 2015 contains 200 stereo pairs for training and 200 for testing. The typical resolution of KITTI images is 376 × 1240. For KITTI 2012, the semi-dense ground-truth disparity maps are generated by a Velodyne HDL-64E LiDAR, while for KITTI 2015, 3D CAD models for cars are manually inserted [1].
Unsupervised Graph Neural Architecture Search with Disentangled Self-supervision
The existing graph neural architecture search (GNAS) methods heavily rely on supervised labels during the search process, failing to handle ubiquitous scenarios where supervision is not available. In this paper, we study the problem of unsupervised graph neural architecture search, which remains unexplored in the literature. The key problem is to discover the latent graph factors that drive the formation of graph data as well as the underlying relations between the factors and the optimal neural architectures. Handling this problem is challenging given that the latent graph factors together with architectures are highly entangled due to the nature of the graph and the complexity of the neural architecture search process. To address the challenge, we propose a novel Disentangled Self-supervised Graph Neural Architecture Search (DSGAS) model, which is able to discover the optimal architectures capturing various latent graph factors in a self-supervised fashion based on unlabeled graph data. Specifically, we first design a disentangled graph super-network capable of incorporating multiple architectures with factor-wise disentanglement, which are optimized simultaneously. Then, we estimate the performance of architectures under different factors by our proposed self-supervised training with joint architecture-graph disentanglement. Finally, we propose a contrastive search with architecture augmentations to discover architectures with factor-specific expertise. Extensive experiments on 11 real-world datasets demonstrate that the proposed DSGAS model is able to achieve state-of-the-art performance against several baseline methods in an unsupervised manner.
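A highly simplified sketch of the factor-wise disentangled super-network idea only, not the DSGAS model itself: the hidden representation is split into K factor-specific channel groups, each with its own softmax-weighted mixture over candidate operations, so K architectures can be optimized simultaneously. The number of factors, the candidate operations, and the feature shapes are all assumptions for illustration.

```python
import torch

K, d = 4, 32                                  # assumed number of latent factors, channels per factor
candidate_ops = [
    torch.nn.Linear(d, d),
    torch.nn.Identity(),
    torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.ReLU()),
]
# One architecture-mixing weight vector per factor (shared candidate ops for brevity).
alpha = torch.nn.Parameter(torch.zeros(K, len(candidate_ops)))

def supernet_layer(h):
    # h: (num_nodes, K * d); each factor's slice is transformed by its own op mixture.
    outs = []
    for k in range(K):
        h_k = h[:, k * d:(k + 1) * d]
        weights = torch.softmax(alpha[k], dim=0)
        outs.append(sum(w * op(h_k) for w, op in zip(weights, candidate_ops)))
    return torch.cat(outs, dim=1)

h = torch.randn(10, K * d)                    # placeholder node features
print(supernet_layer(h).shape)                # torch.Size([10, 128])
```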
A Related work (expanded)
GNNs Recently, GNNs [Gilmer et al., 2017, Scarselli et al., 2009] emerged as the most prominent graph representation learning architecture. Notable instances of this architecture include, e.g., [Duvenaud et al., 2015, Hamilton et al., 2017, Veličković et al., 2018], which can be subsumed under the message-passing framework introduced in [Gilmer et al., 2017]. Surveys of recent advancements in GNN techniques can be found, e.g., in Chami et al. [2020], Wu et al. [2019], Zhou et al. [2018]. Specifically, [Morris et al., 2019, Xu et al., 2019] showed that the expressive power of any possible GNN architecture is limited by the 1-WL in terms of distinguishing non-isomorphic graphs. Triggered by the above results, a large set of papers proposed architectures to overcome the expressivity limitations of the 1-WL. Morris et al. [2019] introduced k-dimensional GNNs, which rely on a message-passing scheme between subgraphs of cardinality k. Similar to [Morris et al., 2017], the paper employed a local, set-based (neural) variant of the 1-WL. Later, this was refined in [Azizian and Lelarge, 2020, Maron et al., 2019] by introducing k-order folklore graph neural networks, which are equivalent to the folklore or oblivious variant of the k-WL [Grohe, 2021, Morris et al., 2021] in terms of distinguishing non-isomorphic graphs. Subsequently, Morris et al. [2020b] introduced neural architectures based on a local version of the k-WL, which only considers a subset of the original neighborhood, taking the sparsity of the underlying graph into account (to some extent). Chen et al. [2019b] connected the theory of universal approximations of permutation-invariant functions and the graph isomorphism viewpoint and introduced a variation of the 2-WL. Geerts and Reutter [2022] introduced a higher-order message-passing framework that allows us to obtain upper bounds on the expressive power of GNN extensions in terms of the k-WL. See Morris et al. [2021] for an in-depth survey on this topic.