Goto

Collaborating Authors

 Wu, Qitian


Diffusion-nested Auto-Regressive Synthesis of Heterogeneous Tabular Data

arXiv.org Artificial Intelligence

Autoregressive models are predominant in natural language generation, while their application in tabular data remains underexplored. We posit that this can be attributed to two factors: 1) tabular data contains heterogeneous data type, while the autoregressive model is primarily designed to model discrete-valued data; 2) tabular data is column permutation-invariant, requiring a generation model to generate columns in arbitrary order. DAR) to address these issues. DAR employs a diffusion model to parameterize the conditional distribution of continuous features. DAR resorts to masked transformers with bi-directional attention, which simulate various permutations of column order, hence enabling it to learn the conditional distribution of a target column given an arbitrary combination of other columns. DAR to not only freely handle heterogeneous tabular data but also support convenient and flexible unconditional/conditional sampling. DAR outperforms previous state-of-the-art methods by 18% to 45% on eight metrics across three distinct aspects. The code is available at https://github.com/fangliancheng/TabDAR. Today is a good day! Figure 1: Challenges in Auto-Regressive tabular data generation. Due to the widespread application of synthetic tabular data in real-world scenarios, such as data augmentation, privacy protection, and missing value prediction (Fonseca & Bacao, 2023; Assefa et al., 2021; Hernandez et al., 2022), an increasing number of studies have begun to focus on deep generative models for synthetic tabular data generation. In this domain, various approaches, including Variational Autoencoders (VAEs)(Liu et al., 2023), Generative Adversarial Networks (GANs)(Xu et al., 2019), Diffusion Models (Zhang et al., 2024b), and even Large Language Models (LLMs)(Borisov et al., 2023), have demonstrated significant progress.


GeoMix: Towards Geometry-Aware Data Augmentation

arXiv.org Artificial Intelligence

Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification. By synthesizing samples through the interpolation of features and labels, Mixup effectively addresses the issue of data scarcity. However, it has rarely been explored in graph learning tasks due to the irregularity and connectivity of graph data. Specifically, in node classification tasks, Mixup presents a challenge in creating connections for synthetic data. In this paper, we propose Geometric Mixup (GeoMix), a simple and interpretable Mixup approach leveraging in-place graph editing. It effectively utilizes geometry information to interpolate features and labels with those from the nearby neighborhood, generating synthetic nodes and establishing connections for them. We conduct theoretical analysis to elucidate the rationale behind employing geometry information for node Mixup, emphasizing the significance of locality enhancement-a critical aspect of our method's design. Extensive experiments demonstrate that our lightweight Geometric Mixup achieves state-of-the-art results on a wide variety of standard datasets with limited labeled data. Furthermore, it significantly improves the generalization capability of underlying GNNs across various challenging out-of-distribution generalization tasks. Our code is available at https://github.com/WtaoZhao/geomix.


Learning Divergence Fields for Shift-Robust Graph Representations

arXiv.org Artificial Intelligence

Real-world data generation often involves certain geometries (e.g., graphs) that induce instance-level interdependence. This characteristic makes the generalization of learning models more difficult due to the intricate interdependent patterns that impact data-generative distributions and can vary from training to testing. In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging generalization problem with interdependent data. We generalize the diffusion equation with stochastic diffusivity at each time step, which aims to capture the multi-faceted information flows among interdependent data. Furthermore, we derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains. Regarding practical implementation, we introduce three model instantiations that can be considered as the generalized versions of GCN, GAT, and Transformers, respectively, which possess advanced robustness against distribution shifts. We demonstrate their promising efficacy for out-of-distribution generalization on diverse real-world datasets.


SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations

arXiv.org Artificial Intelligence

Learning representations on large-sized graphs is a long-standing challenge due to the inter-dependence nature involved in massive data points. Transformers, as an emerging class of foundation encoders for graph-structured data, have shown promising performance on small graphs due to its global attention capable of capturing all-pair influence beyond neighboring nodes. Even so, existing approaches tend to inherit the spirit of Transformers in language and vision tasks, and embrace complicated models by stacking deep multi-head attentions. In this paper, we critically demonstrate that even using a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks where node numbers range from thousand-level to billion-level. This encourages us to rethink the design philosophy for Transformers on large graphs, where the global attention is a computation overhead hindering the scalability. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model that can efficiently propagate information among arbitrary nodes in one layer. SGFormer requires none of positional encodings, feature/graph pre-processing or augmented loss. Empirically, SGFormer successfully scales to the web-scale graph ogbn-papers100M and yields up to 141x inference acceleration over SOTA Transformers on medium-sized graphs. Beyond current results, we believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.


Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift

arXiv.org Artificial Intelligence

The issue of distribution shifts is emerging as a critical concern in graph representation learning. From the perspective of invariant learning and stable learning, a recently well-established paradigm for out-of-distribution generalization, stable features of the graph are assumed to causally determine labels, while environmental features tend to be unstable and can lead to the two primary types of distribution shifts. The correlation shift is often caused by the spurious correlation between environmental features and labels that differs between the training and test data; the covariate shift often stems from the presence of new environmental features in test data. However, most strategies, such as invariant learning or graph augmentation, typically struggle with limited training environments or perturbed stable features, thus exposing limitations in handling the problem of covariate shift. To address this challenge, we propose a simple-yet-effective data augmentation strategy, Adversarial Invariant Augmentation (AIA), to handle the covariate shift on graphs. Specifically, given the training data, AIA aims to extrapolate and generate new environments, while concurrently preserving the original stable features during the augmentation process. Such a design equips the graph classification model with an enhanced capability to identify stable features in new environments, thereby effectively tackling the covariate shift in data. Extensive experiments with in-depth empirical analysis demonstrate the superiority of our approach. The implementation codes are publicly available at https://github.com/yongduosui/AIA.


TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement

arXiv.org Artificial Intelligence

The diversity of tables makes table detection a great challenge, leading to existing models becoming more tedious and complex. Despite achieving high performance, they often overfit to the table style in training set, and suffer from significant performance degradation when encountering out-of-distribution tables in other domains. To tackle this problem, we start from the essence of the table, which is a set of text arranged in rows and columns. Based on this, we propose a novel, light-weighted and robust Table Detection method based on Learning Text Arrangement, namely TDeLTA. TDeLTA takes the text blocks as input, and then models the arrangement of them with a sequential encoder and an attention module. To locate the tables precisely, we design a text-classification task, classifying the text blocks into 4 categories according to their semantic roles in the tables. Experiments are conducted on both the text blocks parsed from PDF and extracted by open-source OCR tools, respectively. Compared to several state-of-the-art methods, TDeLTA achieves competitive results with only 3.1M model parameters on the large-scale public datasets. Moreover, when faced with the cross-domain data under the 0-shot setting, TDeLTA outperforms baselines by a large margin of nearly 7%, which shows the strong robustness and transferability of the proposed model.


Advective Diffusion Transformers for Topological Generalization in Graph Learning

arXiv.org Artificial Intelligence

Graph diffusion equations are intimately related to graph neural networks (GNNs) and have recently attracted attention as a principled framework for analyzing GNN dynamics, formalizing their expressive power, and justifying architectural choices. One key open questions in graph learning is the generalization capabilities of GNNs. A major limitation of current approaches hinges on the assumption that the graph topologies in the training and test sets come from the same distribution. In this paper, we make steps towards understanding the generalization of GNNs by exploring how graph diffusion equations extrapolate and generalize in the presence of varying graph topologies. We first show deficiencies in the generalization capability of existing models built upon local diffusion on graphs, stemming from the exponential sensitivity to topology variation. Our subsequent analysis reveals the promise of non-local diffusion, which advocates for feature propagation over fully-connected latent graphs, under the assumption of a specific data-generating condition. In addition to these findings, we propose a novel graph encoder backbone, Advective Diffusion Transformer (ADiT), inspired by advective graph diffusion equations that have a closed-form solution backed up with theoretical guarantees of desired generalization under topological distribution shifts. The new model, functioning as a versatile graph Transformer, demonstrates superior performance across a wide range of graph learning tasks.


How Graph Neural Networks Learn: Lessons from Training Dynamics in Function Space

arXiv.org Artificial Intelligence

A long-standing goal in deep learning has been to characterize the learning behavior of black-box models in a more interpretable manner. For graph neural networks (GNNs), considerable advances have been made in formalizing what functions they can represent, however it remains less clear whether and how GNNs learn desired functions during the optimization process. To fill this critical gap, we study the learning dynamics of GNNs in function space via the analytic framework of overparameterization. In particular, we find that the seemingly complicated training process of GNNs can be re-cast into a more familiar label propagation framework, due to the graph inductive bias implicit in this process. From this vantage point, we provide explanations for why the learned GNN functions successfully generalize and for their pathological behavior on heterophilic graphs, which are consistent with observations. Practically, sparsifying and implementing the learning dynamics lead to a minimalist semi-supervised learning algorithm with the efficiency of classic algorithms and the effectiveness of modern GNNs.


Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs

arXiv.org Artificial Intelligence

Graph neural networks (GNNs), as the de-facto model class for representation learning on graphs, are built upon the multi-layer perceptrons (MLP) architecture with additional message passing layers to allow features to flow across nodes. While conventional wisdom commonly attributes the success of GNNs to their advanced expressivity, we conjecture that this is not the main cause of GNNs' superiority in node-level prediction tasks. This paper pinpoints the major source of GNNs' performance gain to their intrinsic generalization capability, by introducing an intermediate model class dubbed as P(ropagational)MLP, which is identical to standard MLP in training, but then adopts GNN's architecture in testing. Intriguingly, we observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training. This finding sheds new insights into understanding the learning behavior of GNNs, and can be used as an analytic tool for dissecting various GNN-related research problems. As an initial step to analyze the inherent generalizability of GNNs, we show the essential difference between MLP and PMLP at infinite-width limit lies in the NTK feature map in the post-training stage. Moreover, by examining their extrapolation behavior, we find that though many GNNs and their PMLP counterparts cannot extrapolate non-linear functions for extremely out-of-distribution samples, they have greater potential to generalize to testing samples near the training data range as natural advantages of GNN architectures.


GraphGLOW: Universal and Generalizable Structure Learning for Graph Neural Networks

arXiv.org Artificial Intelligence

Graph structure learning is a well-established problem that aims at optimizing graph structures adaptive to specific graph datasets to help message passing neural networks (i.e., GNNs) to yield effective and robust node embeddings. However, the common limitation of existing models lies in the underlying \textit{closed-world assumption}: the testing graph is the same as the training graph. This premise requires independently training the structure learning model from scratch for each graph dataset, which leads to prohibitive computation costs and potential risks for serious over-fitting. To mitigate these issues, this paper explores a new direction that moves forward to learn a universal structure learning model that can generalize across graph datasets in an open world. We first introduce the mathematical definition of this novel problem setting, and describe the model formulation from a probabilistic data-generative aspect. Then we devise a general framework that coordinates a single graph-shared structure learner and multiple graph-specific GNNs to capture the generalizable patterns of optimal message-passing topology across datasets. The well-trained structure learner can directly produce adaptive structures for unseen target graphs without any fine-tuning. Across diverse datasets and various challenging cross-graph generalization protocols, our experiments show that even without training on target graphs, the proposed model i) significantly outperforms expressive GNNs trained on input (non-optimized) topology, and ii) surprisingly performs on par with state-of-the-art models that independently optimize adaptive structures for specific target graphs, with notably orders-of-magnitude acceleration for training on the target graph.