Lim, Derek
Graph Inductive Biases in Transformers without Message Passing
Ma, Liheng, Lin, Chen, Lim, Derek, Romero-Soriano, Adriana, Dokania, Puneet K., Coates, Mark, Torr, Philip, Lim, Ser-Nam
Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods
Lim, Derek, Hohne, Felix, Li, Xiuyu, Huang, Sijia Linda, Gupta, Vaishnavi, Bhalerao, Omkar, Lim, Ser-Nam
Many widely used datasets for graph machine learning tasks have generally been homophilous, where nodes with similar labels connect to each other. Recently, new Graph Neural Networks (GNNs) have been developed that move beyond the homophily regime; however, their evaluation has often been conducted on small graphs with limited application domains. We collect and introduce diverse non-homophilous datasets from a variety of application areas that have up to 384x more nodes and 1398x more edges than prior datasets. We further show that existing scalable graph learning and graph minibatching techniques lead to performance degradation on these non-homophilous datasets, thus highlighting the need for further work on scalable non-homophilous methods. To address these concerns, we introduce LINKX -- a strong simple method that admits straightforward minibatch training and inference. Extensive experimental results with representative simple methods and GNNs across our proposed datasets show that LINKX achieves state-of-the-art performance for learning on non-homophilous graphs. Our codes and data are available at https://github.com/CUAI/Non-Homophily-Large-Scale.
Counting Substructures with Higher-Order Graph Neural Networks: Possibility and Impossibility Results
Tahmasebi, Behrooz, Lim, Derek, Jegelka, Stefanie
While message passing Graph Neural Networks (GNNs) have become increasingly popular architectures for learning with graphs, recent works have revealed important shortcomings in their expressive power. In response, several higher-order GNNs have been proposed that substantially increase the expressive power, albeit at a large computational cost. Motivated by this gap, we explore alternative strategies and lower bounds. In particular, we analyze a new recursive pooling technique of local neighborhoods that allows different tradeoffs of computational cost and expressive power. First, we prove that this model can count subgraphs of size $k$, and thereby overcomes a known limitation of low-order GNNs. Second, we show how recursive pooling can exploit sparsity to reduce the computational complexity compared to the existing higher-order GNNs. More generally, we provide a (near) matching information-theoretic lower bound for counting subgraphs with graph representations that pool over representations of derived (sub-)graphs. We also discuss lower bounds on time complexity.
Equivariant Subgraph Aggregation Networks
Bevilacqua, Beatrice, Frasca, Fabrizio, Lim, Derek, Srinivasan, Balasubramaniam, Cai, Chen, Balamurugan, Gopinath, Bronstein, Michael M., Maron, Haggai
Message-passing neural networks (MPNNs) are the leading architecture for deep learning on graph-structured data, in large part due to their simplicity and scalability. Unfortunately, it was shown that these architectures are limited in their expressive power. This paper proposes a novel framework called Equivariant Subgraph Aggregation Networks (ESAN) to address this issue. Our main observation is that while two graphs may not be distinguishable by an MPNN, they often contain distinguishable subgraphs. Thus, we propose to represent each graph as a set of subgraphs derived by some predefined policy, and to process it using a suitable equivariant architecture. We develop novel variants of the 1-dimensional Weisfeiler-Leman (1-WL) test for graph isomorphism, and prove lower bounds on the expressiveness of ESAN in terms of these new WL variants. We further prove that our approach increases the expressive power of both MPNNs and more expressive architectures. Moreover, we provide theoretical results that describe how design choices such as the subgraph selection policy and equivariant neural architecture affect our architecture's expressive power. To deal with the increased computational cost, we propose a subgraph sampling scheme, which can be viewed as a stochastic version of our framework. A comprehensive set of experiments on real and synthetic datasets demonstrates that our framework improves the expressive power and overall performance of popular GNN architectures.
Equivariant Manifold Flows
Katsman, Isay, Lou, Aaron, Lim, Derek, Jiang, Qingxuan, Lim, Ser-Nam, De Sa, Christopher
Tractably modelling distributions over manifolds has long been an important goal in the natural sciences. Recent work has focused on developing general machine learning models to learn such distributions. However, for many applications these distributions must respect manifold symmetries -- a trait which most previous models disregard. In this paper, we lay the theoretical foundations for learning symmetry-invariant distributions on arbitrary manifolds via equivariant manifold flows. We demonstrate the utility of our approach by using it to learn gauge invariant densities over $SU(n)$ in the context of quantum field theory.
Doubly Stochastic Subspace Clustering
Lim, Derek, Vidal, René, Haeffele, Benjamin D.
Many state-of-the-art subspace clustering methods follow a two-step process by first constructing an affinity matrix between data points and then applying spectral clustering to this affinity. Most of the research into these methods focuses on the first step of generating the affinity matrix, which often exploits the self-expressive property of linear subspaces, with little consideration typically given to the spectral clustering step that produces the final clustering. Moreover, existing methods obtain the affinity by applying ad-hoc postprocessing steps to the self-expressive representation of the data, and this postprocessing can have a significant impact on the subsequent spectral clustering step. In this work, we propose to unify these two steps by jointly learning both a self-expressive representation of the data and an affinity matrix that is well-normalized for spectral clustering. In the proposed model, we constrain the affinity matrix to be doubly stochastic, which results in a principled method for affinity matrix normalization while also exploiting the known benefits of doubly stochastic normalization in spectral clustering. While our proposed model is non-convex, we give a convex relaxation that is provably equivalent in many regimes; we also develop an efficient approximation to the full model that works well in practice. Experiments show that our method achieves state-of-the-art subspace clustering performance on many common datasets in computer vision.
Neural Manifold Ordinary Differential Equations
Lou, Aaron, Lim, Derek, Katsman, Isay, Huang, Leo, Jiang, Qingxuan, Lim, Ser-Nam, De Sa, Christopher
To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces. In this paper, we study normalizing flows on manifolds. Previous work has developed flow models for specific cases; however, these advancements hand craft layers on a manifold-by-manifold basis, restricting generality and inducing cumbersome design constraints. We overcome these issues by introducing Neural Manifold Ordinary Differential Equations, a manifold generalization of Neural ODEs, which enables the construction of Manifold Continuous Normalizing Flows (MCNFs). MCNFs require only local geometry (therefore generalizing to arbitrary manifolds) and compute probabilities with continuous change of variables (allowing for a simple and expressive flow construction). We find that leveraging continuous manifold dynamics produces a marked improvement for both density estimation and downstream tasks.