Cao, Kaidi
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Wu, Shirley, Zhao, Shiyu, Yasunaga, Michihiro, Huang, Kexin, Cao, Kaidi, Huang, Qian, Ioannidis, Vassilis N., Subbian, Karthik, Zou, James, Leskovec, Jure
Answering complex real-world queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics. To address the gap, we develop STaRK, a large-scale Semi-structured retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains/datasets: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties, together with their ground-truth answers (items). We conduct rigorous human evaluation to validate the quality of our synthesized queries. We further enhance the benchmark with high-quality human-generated queries to provide an authentic reference. STaRK serves as a comprehensive testbed for evaluating the performance of retrieval systems driven by large language models (LLMs). Our experiments suggest that STaRK presents significant challenges to current retrieval and LLM systems, indicating the demand for building more capable retrieval systems. The benchmark data and code are available at https://github.com/snap-stanford/stark.
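The benchmark scores retrieval systems against ground-truth answer sets for each query. As a minimal sketch of how such systems are typically evaluated (this is not STaRK's official evaluation code, and the function names are illustrative assumptions), standard metrics such as Hit@k, Recall@k, and MRR can be computed from a ranked candidate list:

```python
# Minimal sketch of standard retrieval metrics (Hit@k, Recall@k, MRR);
# not STaRK's official evaluation code -- see the repository for that.
from typing import Iterable, Sequence

def hit_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """1.0 if any relevant item appears in the top-k ranked candidates."""
    rel = set(relevant)
    return float(any(item in rel for item in ranked[:k]))

def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of relevant items retrieved within the top-k candidates."""
    rel = set(relevant)
    return len(rel.intersection(ranked[:k])) / max(len(rel), 1)

def mrr(ranked: Sequence[str], relevant: Iterable[str]) -> float:
    """Reciprocal rank of the first relevant item (0.0 if none retrieved)."""
    rel = set(relevant)
    for i, item in enumerate(ranked, start=1):
        if item in rel:
            return 1.0 / i
    return 0.0

# Example: a query whose ground-truth answers are two product IDs.
ranked_candidates = ["p42", "p07", "p13", "p99"]
ground_truth = ["p13", "p55"]
print(hit_at_k(ranked_candidates, ground_truth, k=3))     # 1.0
print(recall_at_k(ranked_candidates, ground_truth, k=3))  # 0.5
print(mrr(ranked_candidates, ground_truth))               # ~0.333
```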
PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning
Hu, Weihua, Yuan, Yiwen, Zhang, Zecheng, Nitta, Akihiro, Cao, Kaidi, Kocijan, Vid, Leskovec, Jure, Fey, Matthias
We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g., LLMs for text columns). We demonstrate the usefulness of PyTorch Frame by implementing diverse tabular models in a modular way, successfully applying these models to complex multi-modal tabular data, and integrating our framework with PyTorch Geometric, a PyTorch library for Graph Neural Networks (GNNs), to perform end-to-end learning over relational databases.
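To make the modular design concrete, the sketch below shows the general pattern such frameworks follow (per-column encoders, column fusion, then a task head) in plain PyTorch. It is not the PyTorch Frame API; the class and column names are illustrative assumptions.

```python
# Plain-PyTorch sketch of the modular idea behind tabular deep learning
# (column-wise encoders -> column fusion -> task head). This is NOT the
# PyTorch Frame API; module and column names here are illustrative only.
import torch
import torch.nn as nn

class ToyTabularModel(nn.Module):
    def __init__(self, num_numerical: int, cat_cardinalities: list[int],
                 channels: int = 32, out_dim: int = 1):
        super().__init__()
        # One encoder per column type: numerical columns and categorical columns.
        self.num_encoder = nn.Linear(1, channels)
        self.cat_encoders = nn.ModuleList(
            nn.Embedding(card, channels) for card in cat_cardinalities)
        num_cols = num_numerical + len(cat_cardinalities)
        # Fusion over the per-column embeddings, then a prediction head.
        self.fuse = nn.Sequential(nn.Linear(num_cols * channels, channels), nn.ReLU())
        self.head = nn.Linear(channels, out_dim)

    def forward(self, x_num: torch.Tensor, x_cat: torch.Tensor) -> torch.Tensor:
        cols = [self.num_encoder(x_num[:, i:i + 1]) for i in range(x_num.size(1))]
        cols += [enc(x_cat[:, i]) for i, enc in enumerate(self.cat_encoders)]
        fused = self.fuse(torch.cat(cols, dim=-1))
        return self.head(fused)

model = ToyTabularModel(num_numerical=3, cat_cardinalities=[10, 5])
pred = model(torch.randn(8, 3), torch.randint(0, 5, (8, 2)))
```

Swapping the column encoders (e.g., replacing the embedding table for a text column with an external LLM encoder) or the fusion module is what the abstraction above is meant to make easy.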
GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts
Wu, Shirley, Cao, Kaidi, Ribeiro, Bruno, Zou, James, Leskovec, Jure
Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distribution shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex, non-synthetic distribution shifts naturally occurring in the real world. Here we develop GraphMETRO, a Graph Neural Network architecture that reliably models natural diversity and captures complex distribution shifts. GraphMETRO employs a Mixture-of-Experts (MoE) architecture with a gating model and multiple expert models, where each expert model targets a specific distribution shift to produce a shift-invariant representation, and the gating model identifies the shift components. Additionally, we design a novel objective that aligns the representations from different expert models to ensure smooth optimization. GraphMETRO achieves state-of-the-art results on four datasets from the GOOD benchmark, which comprise complex and natural real-world distribution shifts, improving by 67% and 4.2% on the WebKB and Twitch datasets, respectively.
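A simplified sketch of the gated mixture-of-experts idea is shown below, using generic MLP experts in place of GNN experts; the MSE alignment penalty is an illustrative stand-in for the paper's alignment objective, not its exact form.

```python
# Simplified sketch of a gated mixture of experts with an alignment term.
# Experts are generic MLPs here (the paper uses GNN experts); the MSE
# alignment penalty below is an illustrative stand-in, not the exact objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, in_dim: int, hid: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(in_dim, num_experts)  # identifies shift components
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(), nn.Linear(hid, hid))
            for _ in range(num_experts))

    def forward(self, x: torch.Tensor):
        weights = F.softmax(self.gate(x), dim=-1)                 # (batch, num_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, E, hid)
        mixed = (weights.unsqueeze(-1) * outs).sum(dim=1)         # gated combination
        # Alignment: keep each expert's representation close to the mixture,
        # so optimization across experts stays smooth.
        align_loss = ((outs - mixed.unsqueeze(1).detach()) ** 2).mean()
        return mixed, align_loss

model = ToyMoE(in_dim=16, hid=32, num_experts=4)
rep, align = model(torch.randn(8, 16))
```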
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs
Phothilimthana, Phitchaya Mangpo, Abu-El-Haija, Sami, Cao, Kaidi, Fatemi, Bahare, Burrows, Mike, Mendis, Charith, Perozzi, Bryan
Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered 10-20% speedups on state-of-the-art models serving substantial production traffic at Google. Although there exist a few datasets for program performance prediction, they target small sub-programs such as basic blocks or kernels. This paper introduces TpuGraphs, a performance prediction dataset on full tensor programs, represented as computational graphs, running on Tensor Processing Units (TPUs). Each graph in the dataset represents the main computation of a machine learning workload, e.g., a training epoch or an inference step. Each data sample contains a computational graph, a compilation configuration, and the execution time of the graph when compiled with that configuration. The graphs in the dataset are collected from open-source machine learning programs, featuring popular model architectures, e.g., ResNet, EfficientNet, Mask R-CNN, and Transformer. TpuGraphs provides 25x more graphs than the largest graph property prediction dataset (with comparable graph sizes), and graphs that are 770x larger on average than those in existing performance prediction datasets on machine learning programs. This graph-level prediction task on large graphs introduces new challenges in learning, ranging from scalability and training efficiency to model quality.
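Each sample pairs a computational graph with a compilation configuration and a measured runtime. A minimal sketch of what such a record might look like is below; the field names and shapes are assumptions, not the dataset's actual schema.

```python
# Illustrative record layout for a graph/config/runtime sample; the field
# names and dimensions are assumptions, not TpuGraphs' actual schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class PerfSample:
    node_features: np.ndarray    # (num_nodes, feat_dim) per-op features
    edge_index: np.ndarray       # (2, num_edges) data-flow edges
    config_features: np.ndarray  # compilation configuration, e.g. tiling/layout choices
    runtime_ms: float            # measured execution time under this configuration

sample = PerfSample(
    node_features=np.zeros((1000, 140), dtype=np.float32),
    edge_index=np.zeros((2, 2500), dtype=np.int64),
    config_features=np.zeros((24,), dtype=np.float32),
    runtime_ms=3.7,
)
```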
Learning Large Graph Property Prediction via Graph Segment Training
Cao, Kaidi, Phothilimthana, Phitchaya Mangpo, Abu-El-Haija, Sami, Zelle, Dustin, Zhou, Yanqi, Mendis, Charith, Leskovec, Jure, Perozzi, Bryan
Learning to predict properties of large graphs is challenging because each prediction requires knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that uses a divide-and-conquer approach to enable large graph property prediction with a constant memory footprint. GST first divides a large graph into segments and then backpropagates through only a few segments sampled per training iteration. We refine the GST paradigm by introducing a historical embedding table to efficiently obtain embeddings for segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques. First, we finetune the prediction head to correct for the input distribution shift. Second, we introduce Stale Embedding Dropout, which drops some stale embeddings during training to reduce bias. We evaluate our complete method, GST-EFD (all techniques combined), on two large graph property prediction benchmarks: MalNet and TpuGraphs. Our experiments show that GST-EFD is both memory-efficient and fast, while offering a slight boost in test accuracy over a typical full-graph training regime.
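The sketch below illustrates the segment-training idea in simplified form: only a few sampled segments get fresh embeddings with gradients, the rest are read from a stored embedding table, and some stale embeddings are dropped during training. The toy encoder and the zeroing-style dropout are assumptions for illustration, not the paper's exact implementation.

```python
# Minimal sketch of segment training: backprop through only a few sampled
# segments and reuse stored ("historical") embeddings for the rest, with a
# dropout applied to stale embeddings. A toy MLP encoder stands in for a GNN.
import random
import torch
import torch.nn as nn

class SegmentTrainer(nn.Module):
    def __init__(self, num_segments: int, feat_dim: int, hid: int,
                 sample_k: int = 2, stale_dropout: float = 0.2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hid), nn.ReLU())
        self.head = nn.Linear(hid, 1)
        # Historical embedding table: one stored embedding per segment.
        self.register_buffer("history", torch.zeros(num_segments, hid))
        self.sample_k, self.stale_dropout = sample_k, stale_dropout

    def forward(self, segment_feats: list[torch.Tensor]) -> torch.Tensor:
        n = len(segment_feats)
        picked = set(random.sample(range(n), self.sample_k))
        embs = []
        for i, feats in enumerate(segment_feats):
            if i in picked:                       # fresh embedding, with gradients
                emb = self.encoder(feats).mean(dim=0)
                self.history[i] = emb.detach()    # update the table
            else:                                 # stale embedding, no gradients
                emb = self.history[i]
                if self.training and random.random() < self.stale_dropout:
                    emb = torch.zeros_like(emb)   # drop some stale embeddings
            embs.append(emb)
        graph_emb = torch.stack(embs).mean(dim=0)
        return self.head(graph_emb)

trainer = SegmentTrainer(num_segments=4, feat_dim=8, hid=16)
pred = trainer([torch.randn(30, 8) for _ in range(4)])
```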
TuneUp: A Simple Improved Training Strategy for Graph Neural Networks
Hu, Weihua, Cao, Kaidi, Huang, Kexin, Huang, Edward W, Subbian, Karthik, Kawaguchi, Kenji, Leskovec, Jure
Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TuneUp, a simple curriculum-based training strategy for improving the predictive performance of GNNs. TuneUp trains a GNN in two stages. In the first stage, TuneUp applies conventional training to obtain a strong base GNN. The base GNN tends to perform well on head nodes (nodes with large degrees) but less so on tail nodes (nodes with small degrees). Therefore, the second stage of TuneUp focuses on improving prediction on the difficult tail nodes by further training the base GNN on synthetically generated tail node data. We theoretically analyze TuneUp and show it provably improves generalization performance on tail nodes. TuneUp is simple to implement and applicable to a broad range of GNN architectures and prediction tasks. Extensive evaluation of TuneUp on five diverse GNN architectures, three types of prediction tasks, and both transductive and inductive settings shows that TuneUp significantly improves the performance of the base GNN on tail nodes, while often even improving the performance on head nodes. Altogether, TuneUp produces up to 57.6% and 92.2% relative predictive performance improvement in the transductive and the challenging inductive settings, respectively.
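The control flow of the two-stage recipe can be sketched as follows: conventional training first, then continued training on graphs whose edges are randomly dropped so the model sees many synthetic tail (low-degree) nodes. The GNN, loss, and optimizer are placeholders here; the sketch shows the training schedule only, not TuneUp's exact procedure.

```python
# Sketch of a two-stage curriculum: standard training, then further training
# on synthetically generated tail nodes created by randomly removing edges.
# `model`, `loss_fn`, and the data tensors are placeholder assumptions.
import torch

def drop_edges(edge_index: torch.Tensor, p: float) -> torch.Tensor:
    """Randomly keep each edge with probability (1 - p), producing sparser
    neighborhoods that mimic tail (low-degree) nodes."""
    keep = torch.rand(edge_index.size(1)) >= p
    return edge_index[:, keep]

def two_stage_train(model, x, edge_index, y, loss_fn, optimizer,
                    base_epochs=100, tail_epochs=50, drop_p=0.5):
    # Stage 1: conventional training to obtain a strong base GNN.
    for _ in range(base_epochs):
        optimizer.zero_grad()
        loss_fn(model(x, edge_index), y).backward()
        optimizer.step()
    # Stage 2: keep training on sparsified graphs, so the model is exposed
    # to many synthetic tail nodes and improves on low-degree nodes.
    for _ in range(tail_epochs):
        optimizer.zero_grad()
        sparse_ei = drop_edges(edge_index, drop_p)
        loss_fn(model(x, sparse_ei), y).backward()
        optimizer.step()
    return model
```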
Communication-Free Distributed GNN Training with Vertex Cut
Cao, Kaidi, Deng, Rui, Wu, Shirley, Huang, Edward W, Subbian, Karthik, Leskovec, Jure
Training Graph Neural Networks (GNNs) on real-world graphs with billions of nodes and edges is challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features, and there is a pressing need to speed up the training process. A common approach to achieving speedup is to divide the graph into many smaller subgraphs, which are then distributed across multiple GPUs in one or more machines and processed in parallel. However, existing distributed methods require frequent and substantial cross-GPU communication, leading to significant time overhead and progressively diminishing scalability. Here, we introduce CoFree-GNN, a novel distributed GNN training framework that significantly speeds up training through communication-free training. The framework uses Vertex Cut partitioning: rather than partitioning the graph by cutting the edges between partitions, the Vertex Cut partitions the edges and duplicates node information to preserve the graph structure. Furthermore, the framework maintains high model accuracy by incorporating a reweighting mechanism to handle the distorted graph distribution that arises from the duplicated nodes. We also propose a modified DropEdge technique to further speed up training. Through an extensive set of experiments on real-world networks, we demonstrate that CoFree-GNN speeds up GNN training by up to 10 times over existing state-of-the-art GNN training approaches.
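The reweighting idea can be illustrated with a short sketch: under a vertex cut, a node may be replicated in several partitions, so its contribution to the loss is down-weighted by its number of copies to keep the effective training distribution unbiased. The function names and the simple 1/copies weighting below are assumptions for illustration, not the paper's exact mechanism.

```python
# Sketch of loss reweighting for nodes duplicated by a vertex cut; the
# 1 / (number of copies) weighting and names are illustrative assumptions.
import torch

def duplication_weights(node_ids: torch.Tensor, num_copies: dict[int, int]) -> torch.Tensor:
    """Per-node weight 1 / (number of partitions holding a copy of the node)."""
    return torch.tensor([1.0 / num_copies[int(i)] for i in node_ids])

def reweighted_loss(per_node_loss: torch.Tensor, node_ids: torch.Tensor,
                    num_copies: dict[int, int]) -> torch.Tensor:
    w = duplication_weights(node_ids, num_copies).to(per_node_loss.device)
    return (w * per_node_loss).sum() / w.sum()

# Example: node 7 is duplicated across 3 partitions, node 2 lives in 1.
loss = reweighted_loss(torch.tensor([0.9, 0.4]), torch.tensor([7, 2]), {7: 3, 2: 1})
```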
AutoTransfer: AutoML with Knowledge Transfer -- An Application to Graph Neural Networks
Cao, Kaidi, You, Jiaxuan, Liu, Jiaju, Leskovec, Jure
AutoML has demonstrated remarkable success in finding an effective neural architecture for a given machine learning task defined by a specific dataset and an evaluation metric. However, most existing AutoML techniques consider each task independently and from scratch, which requires exploring many architectures and leads to high computational cost. Here we propose AutoTransfer, an AutoML solution that improves search efficiency by transferring prior architectural design knowledge to the novel task of interest. Our key innovations include a task-model bank that captures model performance over a diverse set of GNN architectures and tasks, and a computationally efficient task embedding that can accurately measure the similarity between tasks. Based on the task-model bank and the task embeddings, we estimate the design priors of desirable models for the novel task by aggregating a similarity-weighted sum of the top-K design distributions on tasks similar to the task of interest. The computed design priors can be used with any AutoML search algorithm. We evaluate AutoTransfer on six datasets in the graph machine learning domain. Experiments demonstrate that (i) our proposed task embedding can be computed efficiently, and tasks with similar embeddings have similar best-performing architectures; (ii) AutoTransfer significantly improves search efficiency with the transferred design priors, reducing the number of explored architectures by an order of magnitude. Finally, we release GNN-Bank-101, a large-scale dataset with detailed GNN training information for 120,000 task-model combinations, to facilitate and inspire future research.
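The prior-aggregation step described above can be sketched directly: given a task embedding for the new task plus embeddings and design distributions stored in the bank, take the top-K most similar tasks and average their design distributions, weighted by similarity. Variable names, the cosine similarity, and the softmax weighting are illustrative assumptions.

```python
# Sketch of similarity-weighted aggregation of design priors from the top-K
# most similar tasks; the similarity measure and weighting are assumptions.
import torch
import torch.nn.functional as F

def design_prior(new_emb: torch.Tensor, bank_embs: torch.Tensor,
                 bank_priors: torch.Tensor, k: int = 3) -> torch.Tensor:
    """bank_embs: (num_tasks, d); bank_priors: (num_tasks, num_design_choices)."""
    sims = F.cosine_similarity(new_emb.unsqueeze(0), bank_embs, dim=-1)
    top_sim, top_idx = sims.topk(k)
    weights = F.softmax(top_sim, dim=0)                    # similarity-weighted
    prior = (weights.unsqueeze(-1) * bank_priors[top_idx]).sum(dim=0)
    return prior / prior.sum()                             # renormalize to a distribution

prior = design_prior(torch.randn(16), torch.randn(10, 16),
                     torch.rand(10, 5).softmax(-1))
```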
Relational Multi-Task Learning: Modeling Relations between Data and Tasks
Cao, Kaidi, You, Jiaxuan, Leskovec, Jure
A key assumption in multi-task learning is that at inference time the multi-task model only has access to a given data point but not to that data point's labels from other tasks. This presents an opportunity to extend multi-task learning by utilizing a data point's labels from auxiliary tasks, thereby improving performance on the new task. Here we introduce a novel relational multi-task learning setting in which we leverage data point labels from auxiliary tasks to make more accurate predictions on the new task. We develop MetaLink, whose key innovation is to build a knowledge graph that connects data points and tasks, allowing us to leverage labels from auxiliary tasks. The knowledge graph consists of two types of nodes: (1) data nodes, whose features are data embeddings computed by the neural network, and (2) task nodes, whose features are the last layer's weights for each task. The edges in this knowledge graph capture data-task relationships, and each edge label records the label of a data point on a particular task. Under MetaLink, we reformulate the new task as a link label prediction problem between a data node and a task node. The MetaLink framework provides flexibility to model knowledge transfer from auxiliary task labels to the task of interest. We evaluate MetaLink on 6 benchmark datasets in both the biochemical and vision domains. Experiments demonstrate that MetaLink can successfully utilize the relations among different tasks, outperforming state-of-the-art methods under the proposed relational multi-task learning setting, with up to 27% improvement in ROC AUC.
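The link-label-prediction view can be sketched in a few lines: a data node carries the network's data embedding, a task node carries that task's last-layer weights, and the label of the (data, task) edge is predicted from the two node features. The bilinear scorer below is an illustrative assumption, not the paper's exact module.

```python
# Sketch of scoring a (data node, task node) edge from the two node features;
# the bilinear scorer is an illustrative choice, not MetaLink's exact module.
import torch
import torch.nn as nn

class ToyLinkScorer(nn.Module):
    def __init__(self, emb_dim: int):
        super().__init__()
        self.scorer = nn.Bilinear(emb_dim, emb_dim, 1)

    def forward(self, data_emb: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        # Predict the probability that this data point is positive for this task.
        return torch.sigmoid(self.scorer(data_emb, task_emb)).squeeze(-1)

scorer = ToyLinkScorer(emb_dim=64)
data_node = torch.randn(5, 64)   # embeddings of 5 data points
task_node = torch.randn(5, 64)   # e.g. last-layer weight vectors of the target task
edge_label_prob = scorer(data_node, task_node)
```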
Learning Backward Compatible Embeddings
Hu, Weihua, Bansal, Rajas, Cao, Kaidi, Rao, Nikhil, Subbian, Karthik, Leskovec, Jure
Embeddings, low-dimensional vector representations of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However, as the embedding model gets updated and retrained to improve performance on the intended task, the newly generated embeddings are no longer compatible with the existing consumer models. This means that either historical versions of the embeddings can never be retired or all consumer teams have to retrain their models to make them compatible with the latest version of the embeddings, both of which are extremely costly in practice. Here we study the problem of embedding version updates and their backward compatibility. We formalize the problem, where the goal is for the embedding team to keep updating the embedding version while the consumer teams do not have to retrain their models. We develop a solution based on learning backward compatible embeddings, which allows the embedding model version to be updated frequently while the latest version of the embedding can be quickly transformed into any backward compatible historical version, so that consumer teams do not have to retrain their models. Under our framework, we explore six methods and systematically evaluate them on a real-world recommender system application. We show that the best method, which we call BC-Aligner, maintains backward compatibility with existing unintended tasks even after multiple model version updates. Simultaneously, BC-Aligner achieves intended-task performance similar to that of an embedding model solely optimized for the intended task.
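The core idea of quickly transforming the latest embeddings into a historical version can be sketched as follows: learn a lightweight map from the new embedding space to the old one by regressing the old embeddings of the same objects. This is a simplified stand-in, not the exact BC-Aligner training objective.

```python
# Sketch of a backward-compatibility transform: map the latest embeddings back
# into a historical version's space so existing consumer models keep working.
# Simplified illustration only; not the exact BC-Aligner objective.
import torch
import torch.nn as nn

def fit_backward_transform(new_embs: torch.Tensor, old_embs: torch.Tensor,
                           epochs: int = 200, lr: float = 1e-2) -> nn.Linear:
    """Learn W such that W(new_embedding) ~= old_embedding for the same objects."""
    transform = nn.Linear(new_embs.size(1), old_embs.size(1))
    opt = torch.optim.Adam(transform.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((transform(new_embs) - old_embs) ** 2).mean()
        loss.backward()
        opt.step()
    return transform

# Consumers trained on the historical version keep receiving embeddings in that space:
new_v2 = torch.randn(1000, 128)   # latest embedding version
old_v1 = torch.randn(1000, 64)    # historical version consumed downstream
to_v1 = fit_backward_transform(new_v2, old_v1)
compatible = to_v1(new_v2)        # fed to consumer models trained on v1
```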