Goto

Collaborating Authors

 subgraph


Optimal community detection in dense bipartite graphs

Neural Information Processing Systems

We consider the problem of detecting a community of densely connected vertices in a high-dimensional bipartite graph of size n1 n2. Under the null hypothesis, the observed graph is drawn from a bipartite Erd os-Renyi distribution with connection probability p0. Under the alternative hypothesis, there exists an unknown bipartite subgraph of size k1 k2 in which edges appear with probability p1 = p0 +ฮดfor some ฮด > 0, while all other edges outside the subgraph appear with probability p0. Specifically, we provide non-asymptotic upper and lower bounds on the smallest signal strength ฮด that is both necessary and sufficient to ensure the existence of a test with small enough Type I and Type II errors. We also derive novel minimax-optimal tests achieving these fundamental limits when the underlying graph is sufficiently dense. Our proposed tests involve a combination of hardthresholded nonlinear statistics of the adjacency matrix, the analysis of which may be of independent interest. In contrast with previous work, our non-asymptotic upper and lower bounds match for any configuration of n1,n2,k1,k2.


Functional Matching of Logic Subgraphs: Beyond Structural Isomorphism

Neural Information Processing Systems

Subgraph matching in logic circuits is foundational for numerous Electronic Design Automation (EDA) applications, including datapath optimization, arithmetic verification, and hardware trojan detection. However, existing techniques rely primarily on structural graph isomorphism and thus fail to identify function-related subgraphs when synthesis transformations substantially alter circuit topology. To overcome this critical limitation, we introduce the concept of functional subgraph matching, a novel approach that identifies whether a given logic function is implicitly present within a larger circuit, irrespective of structural variations induced by synthesis or technology mapping. Specifically, we propose a two-stage multi-modal framework: (1) learning robust functional embeddings across AIG and post-mapping netlists for functional subgraph detection, and (2) identifying fuzzy boundaries using a graph segmentation approach. Evaluations on standard benchmarks (ITC99, OpenABCD, ForgeEDA) demonstrate significant performance improvements over existing structural methods, with average 93.8% accuracy in functional subgraph detection and a dice score of 91.3% in fuzzy boundary identification. The source code and implementation details can be found at our repository.


SignFlow Bipartite Subgraph Network For Large-Scale Graph Link Sign Prediction

Neural Information Processing Systems

Link sign prediction in signed bipartite graphs, which are extensively utilized across diverse domains such as social networks and recommendation systems, has recently emerged as a pivotal challenge. However, significant space and time complexities associated with the scalability of bipartite graphs pose substantial challenges, particularly in large-scale environments. To address these issues, this paper introduces the SignFlow Bipartite Subgraph Network (SBSN), balancing sublinear training memory growth through a heuristic subgraph extraction method integrated with a novel message passing module, with optimal inference efficiency achieved via the node feature distillation module. Our subgraph sampling approach reduces the graph size by focusing on neighborhoods around target links and employs an optimized directed message passing mechanism to aggregate critical structural patterns. This mechanism allows SBSN to efficiently learn rich local structural patterns essential for accurate sign prediction. Furthermore, to overcome the inefficiency of subgraph sampling-based models during inference, SBSN incorporates a node feature distillation module after the first training stage.


Rethinking Protein Protein Interaction Prediction from Pairs to Graphs

Neural Information Processing Systems

Deep learning-based computational methods have achieved promising results in predicting protein-protein interactions (PPIs). However, existing benchmarks predominantly focus on isolated pairwise evaluations, overlooking a model's capability to reconstruct biologically meaningful PPI networks, which is crucial for biology research. To address this gap, we introduce PRING, the first comprehensive benchmark that evaluates PRotein-protein INteraction prediction from a Graph-level perspective. PRINGcurates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions, with well-designed strategies to address both data redundancy and leakage. Building on this golden-standard dataset, we establish two complementary evaluation paradigms: (1) topologyoriented tasks, which assess intra and cross-species PPI network construction, and (2) function-oriented tasks, including protein complex pathway prediction, GO module analysis, and essential protein justification. These evaluations not only reflect the model's capability to understand the network topology but also facilitate protein function annotation, biological module detection, and even disease mechanism analysis. Extensive experiments on four representative model categories, consisting of sequence similarity-based, naive sequence-based, protein language model-based, and structure-based approaches, demonstrate that current PPI models have potential limitations in recovering both structural and functional properties of PPI networks, highlighting the gap in supporting real-world biological applications. We believe PRINGprovides a reliable platform to guide the development of more effective PPI prediction models for the community.


85ec26ef94c3acb4c195e905df1ff4f7-Paper-Conference.pdf

Neural Information Processing Systems

Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are calibrated against human evaluations to ensure their trustworthiness and stability. Extensive experiments on our newly constructed benchmark demonstrate that our framework provides a more realistic and rigorous assessment of unlearning performance. Moreover, our findings reveal that current evaluation strategies tend to overestimate unlearning effectiveness.


MIHC: Multi-View Interpretable Hypergraph Neural Networks with Information Bottleneck for Chip Congestion Prediction

Neural Information Processing Systems

With the advancement of artificial intelligence (AI) and increasing integrated circuit (IC) design complexity, efficient chip design through electronic design automation (EDA) has become critical. Fast and accurate congestion prediction in chip layout and routing can significantly enhance automated design performance. Existing congestion modeling methods are limited by (i) ineffective processing and fusion of multi-view circuit data information, and (ii) insufficient reliability and interpretability in the prediction process. To address these challenges, we propose the Multi-view Interpretable Hypergraph for Chip (MIHC), a trustworthy multi-view hypergraph neural network framework that (i) processes both graph and image information in unified hypergraph representations, capturing topological and geometric circuit data; (ii) implements a novel subgraph Information Bottleneck mechanism, identifying critical congestion-correlated regions to guide predictions. This work is the first attempt to incorporate such interpretability into congestion prediction through informative graph reasoning. Experiments show that the MIHC method reduces NMAE by 16.67% and 8.57% in cell-based and grid-based predictions on ISPD2015, and 5.26% and 2.44% on CircuitNet-N28, respectively, compared to state-of-the-art methods. Rigorous cross-design generalization experiments further validate our method's capability to handle entirely unseen circuit designs.


Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

Neural Information Processing Systems

Out-of-distribution generalization under distributional shifts remains a critical challenge for graph neural networks. Existing methods generally adopt the Invariant Risk Minimization (IRM) framework, requiring costly environment annotations or heuristically generated synthetic splits. To circumvent these limitations, in this work, we aim to develop an IRM-free method for capturing causal subgraphs. We first identify that causal subgraphs exhibit substantially smaller distributional variations than non-causal components across diverse environments, which we formalize as the Invariant Distribution Criterion and theoretically prove in this paper. Building on this criterion, we systematically uncover the quantitative relationship between distributional shift and representation norm for identifying the causal subgraph, and investigate its underlying mechanisms in depth. Finally, we propose an IRM-free method by introducing a norm-guided invariant distribution objective for causal subgraph discovery and prediction. Extensive experiments on two widely used benchmarks demonstrate that our method consistently outperforms state-of-the-art methods in graph generalization. Code is available at https: //github.com/anders1123/IDG.


Bayesian Ego-graph Inference for Networked Multi-Agent Reinforcement Learning

Neural Information Processing Systems

In networked multi-agent reinforcement learning (Networked-MARL), decentralized agents must act autonomously under local observability and constrained communication over fixed physical graphs. Existing methods often assume static neighborhoods, limiting adaptability to dynamic or heterogeneous environments. While centralized frameworks can learn dynamic graphs, their reliance on global state access and centralized infrastructure is impractical in real-world decentralized systems. We propose a stochastic graph-based policy for Networked-MARL, where each agent conditions its decision on a sampled subgraph over its local physical neighborhood. Building on this formulation, we introduce BayesG, a decentralized actor-critic framework that learns sparse, context-aware interaction structures via Bayesian variational inference. Each agent operates over an ego-graph and samples a latent communication mask to guide message passing and policy computation. The variational distribution is trained end-to-end alongside the policy using an evidence lower bound (ELBO) objective, enabling agents to jointly learn both interaction topology and decision-making strategies. BayesG outperforms strong MARL baselines on large-scale traffic control tasks with up to 167 agents, demonstrating superior scalability, efficiency, and performance.


Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

Neural Information Processing Systems

We introduce AUTOGRAPH, a scalable autoregressive model for attributed graph generation using decoder-only transformers. By flattening graphs into random sequences of tokens through a reversible process, AUTOGRAPH enables modeling graphs as sequences without relying on additional node features that are expensive to compute, in contrast to diffusion-based approaches.


Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

Neural Information Processing Systems

Knowledge Graph Question Answering (KGQA) systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning. However, despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues, including inaccurate or incomplete ground-truth annotations, poorly constructed questions that are ambiguous, trivial, or unanswerable, and outdated or inconsistent knowledge. Through a manual audit of 16 popular KGQA datasets--including WebQSPand CWQ--we find that the average factual correctness rate is only 57%. To address these issues, we introduce KGQAGen, an LLM-inthe-loop framework that systematically resolves these pitfalls. KGQAGencombines structured knowledge grounding, LLM-guided generation, and symbolic verification to produce challenging and verifiable QA instances. Using KGQAGen, we construct KGQAGen-10k, a 10K-scale benchmark grounded in Wikidata, and evaluate a diverse set of KG-RAG models. Experimental results demonstrate that even state-of-the-art systems struggle on this benchmark, highlighting its ability to expose limitations of existing models. Our findings advocate for more rigorous benchmark construction and position KGQAGen as a scalable framework for advancing KGQA evaluation 1.