FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation
Qiao, Xin, Sun, Shijie, Dong, Anqi, Hua, Cong, Zhao, Xia, Zhang, Longfei, Zhu, Guangming, Zhang, Liang
Imputing missing node features in graphs is challenging, particularly under high missing rates. Existing methods based on latent representations or global diffusion often fail to produce reliable estimates and may propagate errors across the graph. We propose FSD-CAP, a two-stage framework designed to improve imputation quality under extreme sparsity. In the first stage, a fractional diffusion operator adjusts propagation sharpness based on local structure. In the second stage, imputed features are refined through class-aware propagation, which incorporates pseudo-labels and neighborhood entropy to promote consistency. We evaluate FSD-CAP on multiple datasets. With 99.5% of features missing across five benchmark datasets, FSD-CAP achieves an average accuracy of 80.06%; for link prediction under the same setting, it reaches AUC scores of 91. Furthermore, FSD-CAP outperforms competing models on both large-scale and heterophilous datasets.

Graph Neural Networks (GNNs) are widely used for learning from graph-structured data, with successful applications in social networks (Bian et al., 2020), biology (Li et al., 2022), and recommendation systems (He et al., 2020). GNN architectures (Chen et al., 2023; Chien et al., 2020) typically assume that node features are fully observed, allowing information to be aggregated effectively from neighboring nodes. In practice, this assumption often fails: node attributes are frequently missing due to privacy constraints, sensor failures, or incomplete data collection, and high missing rates disrupt the message-passing process and significantly degrade model performance. A variety of methods have been proposed for imputing missing features, including statistical estimators (Srebro et al., 2004), machine learning models (Chen & Guestrin, 2016), and generative approaches (Vincent et al., 2008). Recent work has shifted toward deep learning techniques that model the distribution of node attributes.
These include latent space models that align observed features with learned embeddings (Chen et al., 2020; Yoo et al., 2022), and GNN-based architectures designed to operate on incomplete inputs (Taguchi et al., 2021). These approaches, which exploit correlations in both the features and the graph structure, are effective under moderate missing rates but degrade sharply as sparsity increases, ultimately falling below simple baselines such as zero-filling or mean imputation in highly incomplete settings (You et al., 2020).
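To make the diffusion-based imputation idea concrete, the sketch below shows a generic iterative feature-propagation scheme in the spirit of the first stage described above. It is not the paper's actual FSD-CAP operator: the fractional exponent `alpha` on the degree normalization, the function name, and the clamping schedule are all illustrative assumptions; only the general pattern (diffuse estimates along edges while re-clamping observed entries) is taken from the text.

```python
import numpy as np

def fractional_diffusion_impute(A, X, mask, alpha=0.5, n_iters=40):
    """Hypothetical sketch of diffusion-based feature imputation.

    A      : (n, n) dense adjacency matrix
    X      : (n, d) feature matrix (missing entries arbitrary)
    mask   : (n, d) boolean, True where a feature is observed
    alpha  : assumed fractional exponent controlling how sharply the
             left/right degree normalization D^{-alpha} A D^{-(1-alpha)}
             weights high- vs. low-degree neighbors
    """
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0  # guard isolated nodes against division by zero
    D_left = np.diag(deg ** -alpha)
    D_right = np.diag(deg ** -(1.0 - alpha))
    P = D_left @ A @ D_right  # fractionally normalized propagation matrix

    X_hat = np.where(mask, X, 0.0)  # zero-fill missing entries initially
    for _ in range(n_iters):
        X_hat = P @ X_hat               # diffuse estimates along edges
        X_hat = np.where(mask, X, X_hat)  # re-clamp observed features
    return X_hat
```

Re-clamping the observed entries after every step ensures information only flows into the missing slots, which is what keeps such schemes usable even at very high missing rates.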
Jan-28-2026