 node importance


PINE: Pipeline for Important Node Exploration in Attributed Networks

arXiv.org Artificial Intelligence

Graphs with semantically attributed nodes are a common data structure in a wide range of domains, such as interlinked web data or citation networks of scientific publications. The essential problem for such a data type is to determine nodes that carry greater importance than all the others, a task that markedly enhances system monitoring and management. Traditional methods to identify important nodes in networks rely on centrality measures, such as node degree or the more complex PageRank. However, they consider only the network structure, neglecting the rich node attributes. Recent methods adopt neural networks capable of handling node features, but they require supervision. This work addresses the identified gap (the absence of approaches that are both unsupervised and attribute-aware) by introducing a Pipeline for Important Node Exploration (PINE). At the core of the proposed framework is an attention-based graph model that incorporates node semantic features in the learning process of identifying the structural graph properties. PINE's node importance scores leverage the obtained attention distribution. We demonstrate the superior performance of the proposed PINE method on various homogeneous and heterogeneous attributed networks. As an industry-implemented system, PINE tackles the real-world challenge of unsupervised identification of key entities within large-scale enterprise graphs.
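To make the structure-only baselines mentioned in this abstract concrete, the following is a minimal sketch of node degree and PageRank on a toy graph. It is not the PINE method; the function names, the star-shaped example graph, and the damping factor of 0.85 are illustrative assumptions.

```python
def degree_centrality(adj):
    """Return the number of neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def pagerank(adj, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over an adjacency-list graph."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the uniform teleportation share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                # Split the damped rank of v evenly among its out-neighbors.
                share = damping * rank[v] / len(out)
                for u in out:
                    new_rank[u] += share
            else:
                # A dangling node spreads its damped rank over all nodes.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: hub "a" linked to three leaves (undirected star).
graph = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
degrees = degree_centrality(graph)
ranks = pagerank(graph)
top_node = max(ranks, key=ranks.get)
```

Both measures rank the hub first here, and neither looks at node attributes, which is the limitation the abstract points out.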


Importance Ranking in Complex Networks via Influence-aware Causal Node Embedding

arXiv.org Artificial Intelligence

Understanding and quantifying node importance is a fundamental problem in network science and engineering, underpinning a wide range of applications such as influence maximization, social recommendation, and network dismantling. Prior research often relies on centrality measures or advanced graph embedding techniques using structural information, followed by downstream classification or regression tasks to identify critical nodes. However, these methods typically decouple node representation learning from the ranking objective and rely on the topological structure of target networks, leading to feature-task inconsistency and limited generalization across networks. This paper proposes a novel framework that leverages causal representation learning to obtain robust, invariant node embeddings for cross-network ranking tasks. First, we introduce an influence-aware causal node embedding module within an autoencoder architecture to extract node embeddings that are causally related to node importance. Moreover, we introduce a causal ranking loss and design a unified optimization framework that jointly optimizes the reconstruction and ranking objectives, enabling mutual reinforcement between node representation learning and ranking optimization. This design allows the proposed model to be trained on synthetic networks and to generalize effectively across diverse real-world networks. Extensive experiments on multiple benchmark datasets demonstrate that the proposed model consistently outperforms state-of-the-art baselines in terms of both ranking accuracy and cross-network transferability, offering new insights for network analysis and engineering applications, particularly in scenarios where the target network's structure is inaccessible in advance due to privacy or security constraints.
Complex networks provide a powerful framework for modeling and analyzing a wide range of systems across diverse domains, including social networks, transportation systems, and biological networks [1]. In these networks, nodes represent entities within a real system such as individuals, infrastructure components, or functional units, while edges capture interactions or relationships between them. A key challenge in network science and engineering is identifying important nodes, as they play pivotal roles in maintaining network functionality, performance, stability, and robustness [2].


Emergent Directedness in Social Contagion

arXiv.org Artificial Intelligence

An enduring challenge in contagion theory is that the pathways contagions follow through social networks exhibit emergent complexities that are difficult to predict using network structure. Here, we address this challenge by developing a causal modeling framework that (i) simulates the possible network pathways that emerge as contagions spread and (ii) identifies which edges and nodes are most impactful on diffusion across these possible pathways. This yields a surprising discovery. If people require exposure to multiple peers to adopt a contagion (a.k.a., 'complex contagions'), the pathways that emerge often only work in one direction. In fact, the more complex a contagion is, the more asymmetric its paths become. This emergent directedness problematizes canonical theories of how networks mediate contagion. Weak ties spanning network regions - widely thought to facilitate mutual influence and integration - prove to privilege the spread of contagions from one community to the other. Emergent directedness also disproportionately channels complex contagions from the network periphery to the core, inverting standard centrality models. We demonstrate two practical applications. We show that emergent directedness accounts for unexplained nonlinearity in the effects of tie strength in a recent study of job diffusion over LinkedIn. Lastly, we show that network evolution is biased toward growing directed paths, but that cultural factors (e.g., triadic closure) can curtail this bias, with strategic implications for network building and behavioral interventions.
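The one-directional behavior described in this abstract can be illustrated with a toy deterministic threshold model. This is a simplified sketch, not the authors' causal modeling framework: the two three-node communities, the bridging ties, and the `spread` helper are all hypothetical. A node adopts once at least `threshold` of its neighbors have adopted.

```python
def spread(adj, seeds, threshold):
    """Return the set of adopters once no further node can adopt."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbors in adj.items():
            exposed = sum(1 for nb in neighbors if nb in active)
            if node not in active and exposed >= threshold:
                active.add(node)
                changed = True
    return active


# Two fully connected communities joined by three undirected bridging ties.
# Two bridges converge on b1, so B receives reinforced exposure from A,
# while every node in A has only a single tie into B.
edges = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),   # community A
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),   # community B
    ("a1", "b1"), ("a2", "b1"), ("a3", "b2"),   # bridges
]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

community_a = {"a1", "a2", "a3"}
community_b = {"b1", "b2", "b3"}

a_to_b = spread(adj, community_a, threshold=2)         # complex contagion, A first
b_to_a = spread(adj, community_b, threshold=2)         # complex contagion, B first
simple_b_to_a = spread(adj, community_b, threshold=1)  # simple contagion, B first
```

With a threshold of 2, the contagion seeded in A reaches all of B, but the same undirected ties carry nothing from B back to A. With a threshold of 1, spread is symmetric.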


Information Entropy-Based Scheduling for Communication-Efficient Decentralized Learning

arXiv.org Artificial Intelligence

This paper addresses decentralized stochastic gradient descent (D-SGD) over resource-constrained networks by introducing node-based and link-based scheduling strategies to enhance communication efficiency. In each iteration of the D-SGD algorithm, only a few disjoint subsets of nodes or links are randomly activated, subject to a given communication cost constraint. We propose a novel importance metric based on information entropy to determine node and link scheduling probabilities. We validate the effectiveness of our approach through extensive simulations, comparing it against state-of-the-art methods, including betweenness centrality (BC) for node scheduling and MATCHA for link scheduling. The results show that our method consistently outperforms the BC-based method in the node scheduling case, achieving faster convergence with up to 60% lower communication budgets. At higher communication budgets (above 60%), our method maintains comparable or superior performance. In the link scheduling case, our method delivers results that are superior to or on par with those of MATCHA.


Critical Nodes Identification in Complex Networks: A Survey

arXiv.org Artificial Intelligence

Complex networks have become essential tools for understanding diverse phenomena in social systems, traffic systems, biomolecular systems, and financial systems. Identifying critical nodes is a central theme in contemporary research, serving as a vital bridge between theoretical foundations and practical applications. Nevertheless, the intrinsic complexity and structural heterogeneity characterizing real-world networks, with particular emphasis on dynamic and higher-order networks, present substantial obstacles to the development of universal frameworks for critical node identification. This paper provides a comprehensive review of critical node identification techniques, categorizing them into seven main classes: centrality, critical nodes deletion problem, influence maximization, network control, artificial intelligence, higher-order and dynamic methods. Our review bridges the gaps in existing surveys by systematically classifying methods based on their methodological foundations and practical implications, and by highlighting their strengths, limitations, and applicability across different network types. Our work enhances the understanding of critical node research by identifying key challenges, such as algorithmic universality, real-time evaluation in dynamic networks, analysis of higher-order structures, and computational efficiency in large-scale networks. The structured synthesis consolidates current progress and highlights open questions, particularly in modeling temporal dynamics, advancing efficient algorithms, integrating machine learning approaches, and developing scalable and interpretable metrics for complex systems.


Graph Representations for Reading Comprehension Analysis using Large Language Model and Eye-Tracking Biomarker

arXiv.org Artificial Intelligence

Reading comprehension is a fundamental skill in human cognitive development. With the advancement of Large Language Models (LLMs), there is a growing need to compare how humans and LLMs understand language across different contexts and apply this understanding to functional tasks such as inference, emotion interpretation, and information retrieval. Our previous work used LLMs and human biomarkers to study the reading comprehension process. The results showed that the biomarkers corresponding to words with high and low relevance to the inference target, as labeled by the LLMs, exhibited distinct patterns, particularly when validated using eye-tracking data. However, focusing solely on individual words limited the depth of understanding, which made the conclusions somewhat simplistic despite their potential significance. This study used an LLM-based AI agent to group words from a reading passage into nodes and edges, forming a graph-based text representation based on semantic meaning and question-oriented prompts. We then compare the distribution of eye fixations on important nodes and edges. Our findings indicate that LLMs exhibit high consistency in language understanding at the level of graph topological structure. These results build on our previous findings and offer insights into effective human-AI co-learning strategies.


Biologically Plausible Brain Graph Transformer

arXiv.org Artificial Intelligence

State-of-the-art brain graph analysis methods fail to fully encode the small-world architecture of brain graphs (accompanied by the presence of hubs and functional modules), and therefore lack biological plausibility to some extent. This limitation hinders their ability to accurately represent the brain's structural and functional properties, thereby restricting the effectiveness of machine learning models in tasks such as brain disorder detection. In this work, we propose a novel Biologically Plausible Brain Graph Transformer (BioBGT) that encodes the small-world architecture inherent in brain graphs. Specifically, we present a network entanglement-based node importance encoding technique that captures the structural importance of nodes in global information propagation during brain graph communication, highlighting the biological properties of the brain structure. Furthermore, we introduce a functional module-aware self-attention to preserve the functional segregation and integration characteristics of brain graphs in the learned representations.
One of the most important characteristics of brain graphs is their small-world architecture, with scientific evidence supporting the presence of hubs and functional modules in brain graphs (Liao et al., 2017; Swanson et al., 2024). First, it is demonstrated that nodes in brain graphs exhibit a high degree of difference in their importance, with certain nodes having more central roles in information propagation (Lynn & Bassett, 2019; Betzel et al., 2024). These nodes are perceived as hubs, as shown in Figure 1 (a) (the visualization is based on findings by Seguin et al. (2023)), which are usually highly connected so as to support efficient communication within the brain.
Second, the human brain consists of various functional modules (e.g., visual cortex), where ROIs within the same module exhibit high functional coherence, termed functional integration, while ROIs from different modules show lower functional coherence, termed functional segregation (Rubinov & Sporns, 2010; Seguin et al., 2022). Therefore, brain graphs are characterized by community structure, reflecting functional modules. With the significant ability of graph transformers in capturing interactions between nodes (Ma et al., 2023a; Shehzad et al., 2024; Yi et al., 2024), Transformer-based brain graph learning methods have gained prominence (Kan et al., 2022; Bannadabhavi et al., 2023).
Figure 1: Small-world architecture of brain graphs. (a) Hubs play essential roles. (b) Functional modules in the brain. ROIs in the same module have strong connections (high temporal correlations), while those from different modules show weaker connections.
Our code is available at https://github.com/pcyyyy/BioBGT.


Node Importance Estimation Leveraging LLMs for Semantic Augmentation in Knowledge Graphs

arXiv.org Artificial Intelligence

Node Importance Estimation (NIE) is a task that quantifies the importance of nodes in a graph. Recent research has explored exploiting various information from Knowledge Graphs (KGs) to estimate node importance scores. However, the semantic information in KGs could be insufficient, missing, or inaccurate, which would limit the performance of existing NIE models. To address these issues, we leverage Large Language Models (LLMs) for semantic augmentation, thanks to the LLMs' extra knowledge and their ability to integrate knowledge from both LLMs and KGs. To this end, we propose the LLMs Empowered Node Importance Estimation (LENIE) method to enhance the semantic information in KGs and better support NIE tasks. To the best of our knowledge, this is the first work incorporating LLMs into NIE. Specifically, LENIE employs a novel clustering-based triplet sampling strategy to extract diverse knowledge of a node sampled from the given KG. After that, LENIE adopts node-specific adaptive prompts to integrate the sampled triplets and the original node descriptions, which are then fed into LLMs to generate richer and more precise augmented node descriptions. These augmented descriptions finally initialize node embeddings to boost the downstream NIE model performance. Extensive experiments demonstrate LENIE's effectiveness in addressing semantic deficiencies in KGs, enabling more informative semantic augmentation and enhancing existing NIE models to achieve state-of-the-art performance. The source code of LENIE is freely available at https://github.com/XinyuLin-FZ/LENIE.


Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information Networks

arXiv.org Artificial Intelligence

The node importance estimation problem has conventionally been studied through homogeneous network topology analysis. To deal with network heterogeneity, a few recent methods employ graph neural models to automatically learn diverse sources of information. However, the major concern is that their fully adaptive learning process may lead to insufficient information exploration, thereby formulating the problem as isolated node value prediction with underperformance and less interpretability. In this work, we propose a novel learning framework: SKES. Different from previous automatic learning designs, SKES exploits heterogeneous structural knowledge to enrich the informativeness of node representations. Based on a sufficiently uninformative reference, SKES estimates the importance value for any input node by quantifying its disparity against the reference. This establishes an interpretable node importance computation paradigm. Furthermore, SKES dives deep into the understanding that "nodes with similar characteristics are prone to have similar importance values" whilst guaranteeing that such informativeness disparity between any different nodes is orderly reflected by the embedding distance of their associated latent features. Extensive experiments on three widely-evaluated benchmarks demonstrate the performance superiority of SKES over several recent competing methods.


MGL2Rank: Learning to Rank the Importance of Nodes in Road Networks Based on Multi-Graph Fusion

arXiv.org Artificial Intelligence

Identifying important nodes with strong propagation capabilities in road networks is a significant topic in the field of urban planning. However, existing methods for evaluating the importance of nodes in traffic networks consider only topological information and traffic volumes, ignoring the diversity of characteristics in road networks, such as the number of lanes and average speed of road segments, which limits their performance. To solve this problem, we propose a graph learning-based framework (MGL2Rank) that integrates the rich characteristics of the road network for ranking the importance of nodes. In this framework, we first develop an embedding module that contains a sampling algorithm (MGWalk) and an encoder network to learn a latent representation for each road segment. MGWalk utilizes multi-graph fusion to capture the topology of the road network and establish associations among road segments based on their attributes. Then, we use the obtained node representations to learn the importance ranking of road segments. Finally, we construct a synthetic dataset for ranking tasks based on the regional road network of Shenyang city, and our ranking results on this dataset demonstrate the effectiveness of our proposed method. The data and source code of MGL2Rank are available at https://github.com/ZJ726.