Dong, Xiaowen
GNNs as Predictors of Agentic Workflow Performances
Zhang, Yuanshuo, Hou, Yuchen, Tang, Bohan, Chen, Shuo, Zhang, Muhan, Dong, Xiaowen, Chen, Siheng
Agentic workflows invoked by Large Language Models (LLMs) have achieved remarkable success in handling complex tasks. However, optimizing such workflows is costly and inefficient in real-world applications because it requires extensive invocations of LLMs. To address this problem, this position paper formulates agentic workflows as computational graphs and advocates Graph Neural Networks (GNNs) as efficient predictors of agentic workflow performance, avoiding repeated LLM invocations for evaluation. To empirically ground this position, we construct FLORA-Bench, a unified platform for benchmarking GNNs at predicting agentic workflow performance. Extensive experiments lead us to the following conclusion: GNNs are simple yet effective predictors. This conclusion supports new applications of GNNs and a novel direction towards automating agentic workflow optimization. All code, models, and data are available at https://github.com/youngsoul0731/Flora-Bench.
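The core claim is that a GNN can score a workflow graph without any LLM calls at evaluation time. Below is a minimal sketch of such a predictor in PyTorch Geometric; this is our illustration, not the FLORA-Bench architecture, and the class name and dimensions are assumptions. Nodes stand for LLM-invocation steps embedded as feature vectors, and edges encode data-flow dependencies.

```python
# Minimal sketch (not the FLORA-Bench model): a GCN that maps a workflow
# graph to a predicted success score, avoiding any LLM invocation.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class WorkflowScorer(torch.nn.Module):  # hypothetical name
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))   # message passing over steps
        h = F.relu(self.conv2(h, edge_index))
        g = global_mean_pool(h, batch)          # one embedding per workflow
        return torch.sigmoid(self.head(g))      # predicted performance in [0, 1]
```

Training such a scorer on logged workflow outcomes replaces repeated LLM evaluation with a single cheap forward pass.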
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
Liang, Huidong, Borde, Haitz Sáez de Ocáriz, Sripathmanathan, Baskaran, Bronstein, Michael, Dong, Xiaowen
Long-range dependencies are critical for effective graph representation learning, yet most existing datasets focus on small graphs tailored to inductive tasks, offering limited insight into long-range interactions. Current evaluations primarily compare models employing global attention (e.g., graph transformers) with those using local neighborhood aggregation (e.g., message-passing neural networks), without directly measuring long-range dependency. In this work, we introduce City-Networks, a novel large-scale transductive learning dataset derived from real-world city roads. This dataset features graphs with over $10^5$ nodes and significantly larger diameters than those in existing benchmarks, naturally embodying long-range information. We annotate the graphs using an eccentricity-based approach, ensuring that the classification task inherently requires information from distant nodes. Furthermore, we propose a model-agnostic measurement based on the Jacobians of neighbors from distant hops, offering a principled quantification of long-range dependencies. Finally, we provide theoretical justifications for both our dataset design and the proposed measurement, focusing in particular on over-smoothing and influence score dilution, which establishes a robust foundation for further exploration of long-range interactions in graph neural networks.
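As a rough illustration of the Jacobian-based measurement (our simplification: one backward pass through the summed output channels serves as a proxy for the full Jacobian norm, and the paper's exact normalization may differ), one can aggregate each node's influence on a target node by hop distance:

```python
# Sketch: average influence score on target node u, grouped by hop distance.
import torch

def hop_influence(model, x, edge_index, u, hops_of):
    """hops_of[v] = shortest-path distance from node v to the target node u."""
    x = x.clone().requires_grad_(True)
    out = model(x, edge_index)[u]               # output embedding of node u
    grad, = torch.autograd.grad(out.sum(), x)   # one pass: summed Jacobian rows
    scores = grad.norm(dim=-1)                  # per-node influence on u
    by_hop = {}
    for v, d in enumerate(hops_of):
        by_hop.setdefault(int(d), []).append(scores[v].item())
    return {d: sum(s) / len(s) for d, s in by_hop.items()}
```

A model that truly exploits long-range information should keep these averages from collapsing to zero at large hop distances.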
Heterogeneous Graph Structure Learning through the Lens of Data-generating Processes
Jiang, Keyue, Tang, Bohan, Dong, Xiaowen, Toni, Laura
Inferring the graph structure from observed data is a key task in graph machine learning to capture the intrinsic relationships between data entities. While significant advancements have been made in learning the structure of homogeneous graphs, many real-world graphs exhibit heterogeneous patterns, where nodes and edges have multiple types. This paper fills this gap by introducing the first approach for heterogeneous graph structure learning (HGSL). To this end, we first propose a novel statistical model for the data-generating process (DGP) of heterogeneous graph data, namely hidden Markov networks for heterogeneous graphs (H2MN). We then formalize HGSL as a maximum a posteriori estimation problem parameterized by this DGP and derive an alternating optimization method to obtain a solution, together with a theoretical justification of the optimization conditions. Finally, we conduct extensive experiments on both synthetic and real-world datasets to demonstrate that our proposed method excels at learning heterogeneous graph structure in terms of edge type identification and edge weight recovery.
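Schematically, and in our own notation rather than the paper's, the MAP formulation can be written as follows, with $X$ the observed data and $A^{(r)}$ the adjacency of edge type $r$ under the H2MN prior:

```latex
% Schematic MAP objective for HGSL (notation ours, not the paper's):
\{\hat{A}^{(r)}\} = \operatorname*{arg\,max}_{\{A^{(r)}\}}
    \; \log p\bigl(X \mid \{A^{(r)}\}\bigr) \; + \; \sum_{r} \log p\bigl(A^{(r)}\bigr)
% Alternating optimization: update each A^{(r)} in turn while holding the
% remaining edge types fixed, and iterate until convergence.
```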
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
Gutteridge, Benjamin, Jackson, Matthew Thomas, Kukurin, Toni, Dong, Xiaowen
Handwritten text recognition (HTR) remains a challenging task, particularly for multi-page documents where pages share common formatting and contextual features. While modern optical character recognition (OCR) engines are proficient with printed text, their performance on handwriting is limited, often requiring costly labeled data for fine-tuning. In this paper, we explore the use of multi-modal large language models (MLLMs) for transcribing multi-page handwritten documents in a zero-shot setting. We investigate various configurations of commercial OCR engines and MLLMs, utilizing the latter both as end-to-end transcribers and as post-processors, with and without image components. We propose a novel method, +FIRST PAGE, which enhances MLLM transcription by providing the OCR output of the entire document along with just the first page image. This approach leverages shared document features without incurring the high cost of processing all images. Experiments on a multi-page version of the IAM Handwriting Database demonstrate that +FIRST PAGE improves transcription accuracy, balances cost with performance, and even enhances results on out-of-sample text by extrapolating formatting and OCR error patterns from a single page.
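Operationally, +FIRST PAGE is a prompt-assembly recipe: full-document OCR text plus a single page image. The sketch below uses the OpenAI Python SDK; the prompt wording, model choice, and function name are our assumptions, not the paper's code.

```python
# Sketch of the +FIRST PAGE idea: OCR text for all pages, image of page 1 only.
import base64
from openai import OpenAI

client = OpenAI()

def transcribe_plus_first_page(ocr_pages: list[str], first_page_png: bytes) -> str:
    image_b64 = base64.b64encode(first_page_png).decode()
    prompt = (
        "Below is OCR output for a multi-page handwritten document. Using the "
        "attached image of page 1 as a reference for the handwriting style and "
        "the OCR engine's error patterns, return a corrected transcription of "
        "ALL pages.\n\n"
        + "\n\n".join(f"--- Page {i + 1} ---\n{t}" for i, t in enumerate(ocr_pages))
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable chat model
        messages=[{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]}],
    )
    return resp.choices[0].message.content
```

Only one image is ever uploaded, which is where the cost saving over sending every page image comes from.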
On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning
Arroyo, Álvaro, Gravina, Alessio, Gutteridge, Benjamin, Barbero, Federico, Gallicchio, Claudio, Dong, Xiaowen, Bronstein, Michael, Vandergheynst, Pierre
Graph Neural Networks (GNNs) are models that leverage the graph structure to transmit information between nodes, typically through the message-passing operation. While widely successful, this approach is well known to suffer from the over-smoothing and over-squashing phenomena, which result in representational collapse as the number of layers increases and insensitivity to the information contained at distant and poorly connected nodes, respectively. In this paper, we present a unified view of these problems through the lens of vanishing gradients, using ideas from linear control theory for our analysis. We propose an interpretation of GNNs as recurrent models and empirically demonstrate that a simple state-space formulation of a GNN effectively alleviates over-smoothing and over-squashing at no extra trainable parameter cost. Further, we show theoretically and empirically that (i) GNNs are by design prone to extreme gradient vanishing even after a few layers; (ii) Over-smoothing is directly related to the mechanism causing vanishing gradients; (iii) Over-squashing is most easily alleviated by a combination of graph rewiring and vanishing gradient mitigation. We believe our work will help bridge the gap between the recurrent and graph neural network literature and will unlock the design of new deep and performant GNNs.
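One plausible instantiation of the state-space view (our sketch; the paper's exact parameterization may differ) is a fixed linear recurrence across layers, h <- a*h + (1-a)*MP(h), where a decay a close to 1 keeps gradients from vanishing without adding trainable parameters:

```python
# Sketch: a state-space-style residual recurrence wrapped around standard
# message passing; the scalar decay `a` is fixed, so no extra trainable
# parameters are introduced relative to a plain GCN stack.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class StateSpaceGNN(torch.nn.Module):  # hypothetical name
    def __init__(self, dim: int, layers: int = 16, a: float = 0.9):
        super().__init__()
        self.convs = torch.nn.ModuleList(GCNConv(dim, dim) for _ in range(layers))
        self.a = a

    def forward(self, x, edge_index):
        h = x
        for conv in self.convs:
            h = self.a * h + (1.0 - self.a) * F.relu(conv(h, edge_index))
        return h
```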
Graph Classification Gaussian Processes via Hodgelet Spectral Features
Alain, Mathieu, Takao, So, Dong, Xiaowen, Rieck, Bastian, Noutahi, Emmanuel
The problem of classifying graphs is ubiquitous in machine learning. While it is standard to apply graph neural networks or graph kernel methods, Gaussian processes can be employed by transforming spatial features from the graph domain into spectral features in the Euclidean domain and using them as the input points of classical kernels. However, this approach currently only takes into account features on vertices, whereas some graph datasets also support features on edges. In this work, we present a Gaussian process-based classification algorithm that can leverage vertex features, edge features, or both. Furthermore, we take advantage of the Hodge decomposition to better capture the intricate richness of vertex and edge features, which can be beneficial across diverse tasks.
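As a concrete, heavily simplified reading of this pipeline (our illustration, not the paper's Hodgelet construction), one can build spectral descriptors from the graph Laplacian $L_0 = B_1 B_1^\top$ for vertex signals and the down Hodge 1-Laplacian $L_1 = B_1^\top B_1$ for edge signals, where $B_1$ is the signed node-edge incidence matrix, and hand the descriptors to a standard GP classifier:

```python
# Sketch: low-frequency spectral energies of vertex and edge signals as a
# fixed-length graph descriptor for a Gaussian process classifier.
# Assumes each graph has at least k nodes and k edges; x_v is
# (num_nodes, channels) and x_e is (num_edges, channels).
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def spectral_features(B1, x_v, x_e, k=8):
    L0, L1 = B1 @ B1.T, B1.T @ B1          # graph / down Hodge-1 Laplacians
    _, U0 = np.linalg.eigh(L0)             # vertex Fourier basis (ascending)
    _, U1 = np.linalg.eigh(L1)             # edge Fourier basis (ascending)
    fv = np.abs(U0[:, :k].T @ x_v).sum(axis=1)   # vertex-signal energies
    fe = np.abs(U1[:, :k].T @ x_e).sum(axis=1)   # edge-signal energies
    return np.concatenate([fv, fe])

# Usage: X = np.stack([spectral_features(B, xv, xe) for B, xv, xe in graphs])
#        GaussianProcessClassifier(kernel=RBF()).fit(X, y)
```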
Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review
Ye, Rui, Pang, Xianghe, Chai, Jingyi, Chen, Jiaao, Yin, Zhenfei, Xiang, Zhen, Dong, Xiaowen, Shao, Jing, Chen, Siheng
Scholarly peer review is a cornerstone of scientific advancement, but the system is under strain due to increasing manuscript submissions and the labor-intensive nature of the process. Recent advancements in large language models (LLMs) have led to their integration into peer review, with promising results such as substantial overlaps between LLM- and human-generated reviews. However, the unchecked adoption of LLMs poses significant risks to the integrity of the peer review system. In this study, we comprehensively analyze the vulnerabilities of LLM-generated reviews by focusing on manipulation and inherent flaws. Our experiments show that injecting covert deliberate content into manuscripts allows authors to explicitly manipulate LLM reviews, leading to inflated ratings and reduced alignment with human reviews. In a simulation, we find that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings. Implicit manipulation, where authors strategically highlight minor limitations in their papers, further demonstrates LLMs' susceptibility compared to human reviewers: LLM reviews exhibit 4.5 times higher consistency with the disclosed limitations. Additionally, LLMs exhibit inherent flaws, such as potentially assigning higher ratings to incomplete papers than to full papers, and favoring well-known authors in a single-blind review process. These findings highlight the risks of over-reliance on LLMs in peer review, underscoring that we are not yet ready for widespread adoption and emphasizing the need for robust safeguards.
Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning
Borde, Haitz Sรกez de Ocรกriz, Lukoianov, Artem, Kratsios, Anastasis, Bronstein, Michael, Dong, Xiaowen
Traditionally, Graph Neural Networks (GNNs) [1] have primarily been applied to model functions over graphs with a relatively modest number of nodes. However, recently there has been a growing interest in exploring the application of GNNs to large-scale graph benchmarks, including datasets with up to a hundred million nodes [2]. This exploration could potentially lead to better models for industrial applications such as large-scale network analysis in social media, where there are typically millions of users, or in biology, where proteins and other macromolecules are composed of a large number of atoms. This presents a significant challenge in designing GNNs that are scalable while retaining their effectiveness. To this end, we take inspiration from the literature on Large Language Models (LLMs) and propose a simple modification to how GNN architectures are typically arranged. Our framework, Scalable Message Passing Neural Networks (SMPNNs), enables the construction of deep and scalable architectures that outperform the current state-of-the-art models for large graph benchmarks in transductive classification. More specifically, we find that following the typical construction of the Pre-Layer Normalization (Pre-LN) Transformer formulation [3] and replacing attention with standard message-passing convolution is enough to outperform the best Graph Transformers in the literature. Moreover, since our formulation does not necessarily require attention, our architecture scales better than Graph Transformers.
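The architectural change is small enough to state in code. Below is a sketch of the block as described in the abstract (our paraphrase, not the authors' implementation): a Pre-LN Transformer block with self-attention swapped for a message-passing convolution.

```python
# Sketch: Pre-LN Transformer block with attention replaced by message passing.
import torch
from torch_geometric.nn import GCNConv

class SMPNNBlock(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.norm1 = torch.nn.LayerNorm(dim)
        self.conv = GCNConv(dim, dim)                 # replaces self-attention
        self.norm2 = torch.nn.LayerNorm(dim)
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim))

    def forward(self, x, edge_index):
        x = x + self.conv(self.norm1(x), edge_index)  # Pre-LN residual branch
        x = x + self.mlp(self.norm2(x))               # feed-forward branch
        return x
```

Because the convolution aggregates only over graph neighbors, the block's cost scales with the number of edges rather than quadratically with the number of nodes, as attention would.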
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Tang, Shuo, Pang, Xianghe, Liu, Zexi, Tang, Bohan, Ye, Rui, Dong, Xiaowen, Wang, Yanfeng, Chen, Siheng
We conducted experiments comparing the effectiveness of using simpler versus more complex datasets in different stages of the post-training process, to better understand the optimal post-training strategy for large language models. Here, we conduct a comparison experiment on two kinds of instructions: simple instructions and specialized instructions, denoted as type 1 and type 2. As shown in Table 10, we observe that performing SFT on simpler instructions helps the model establish a foundational level of instruction-following ability. This is reflected in moderate performance on AlpacaEval 2 (LC 16.25%, WR 17.62%) but lower performance on the more challenging Arena-Hard benchmark (WR 10.7%). When the model is fine-tuned on more specialized and complex data, there is a marginal improvement (LC 14.70%, WR 16.01%, Arena-Hard WR 14.7%), while significant performance gains are achieved when DPO is applied after SFT. For example, SFT followed by DPO with complex, specialized instructions yields substantial improvements (LC 21.64%, WR 30.06%,
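The two-stage recipe described above (SFT on simple instructions, then DPO on preference pairs built from specialized instructions) can be sketched with Hugging Face TRL. File names, the model id, and hyperparameters are placeholders rather than the paper's setup, and we assume a recent TRL version that accepts a model id string directly.

```python
# Hedged sketch of the SFT -> DPO pipeline discussed above (TRL; placeholders).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

BASE = "meta-llama/Llama-3.1-8B"  # assumption: any base LLM

# Stage 1: SFT on simple (type 1) instructions for basic instruction following.
SFTTrainer(
    model=BASE,
    train_dataset=load_dataset("json", data_files="simple_sft.json")["train"],
    args=SFTConfig(output_dir="ckpt-sft"),
).train()

# Stage 2: DPO from the SFT checkpoint on preference pairs (prompt / chosen /
# rejected) built from specialized (type 2) instructions.
DPOTrainer(
    model="ckpt-sft",
    train_dataset=load_dataset("json", data_files="specialized_prefs.json")["train"],
    args=DPOConfig(output_dir="ckpt-dpo"),
).train()
```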
Separation Power of Equivariant Neural Networks
Pacini, Marco, Dong, Xiaowen, Lepri, Bruno, Santin, Gabriele
The separation power of a machine learning model refers to its capacity to distinguish distinct inputs, and it is often employed as a proxy for its expressivity. In this paper, we propose a theoretical framework to investigate the separation power of equivariant neural networks with point-wise activations. Using the proposed framework, we derive an explicit description of the inputs indistinguishable by a family of neural networks with a given architecture, demonstrating that this description remains unaffected by the choice of non-polynomial activation function. This enables us to understand the role played by activation functions in separability. Indeed, we show that all non-polynomial activations, such as ReLU and sigmoid, are equivalent in terms of expressivity, and that they reach maximum discrimination capacity. We further demonstrate how assessing the separation power of an equivariant neural network can be simplified to evaluating the separation power of minimal representations. We conclude by illustrating how these minimal components form a hierarchy in separation power.
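In our own notation (a schematic rendering, not the paper's exact definitions), the central object can be written as the set of input pairs that no network in a family can tell apart:

```latex
% Separation power of a family F of networks (notation ours):
\rho(\mathcal{F}) \;=\; \bigl\{\, (x, y) \;:\; f(x) = f(y) \ \text{for all } f \in \mathcal{F} \,\bigr\}
% The paper's result: this set is the same for every non-polynomial point-wise
% activation, e.g. \rho(\mathcal{F}_{\mathrm{ReLU}}) = \rho(\mathcal{F}_{\sigma}).
```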