DRIK: Distribution-Robust Inductive Kriging without Information Leakage

Yang, Chen; Zhao, Changhao; Wang, Chen; Fan, Jiansheng

Sep-30-2025–arXiv.org Artificial Intelligence

Inductive kriging supports high-resolution spatio-temporal estimation with sparse sensor networks, but conventional training and evaluation setups often suffer from information leakage and poor out-of-distribution (OOD) generalization. We find that the common 2×2 spatio-temporal split allows test data to influence model selection through early stopping, which obscures the true OOD characteristics of inductive kriging. To address this issue, we propose a 3×3 partition that cleanly separates training, validation, and test sets, eliminating leakage and better reflecting real-world applications. Building on this redefined setting, we introduce DRIK, a Distribution-Robust Inductive Kriging approach that is designed around the intrinsic properties of inductive kriging and explicitly enhances OOD generalization through a three-tier strategy at the node, edge, and subgraph levels. DRIK perturbs node coordinates to capture continuous spatial relationships, drops edges to reduce ambiguity in information flow and increase topological diversity, and adds pseudo-labeled subgraphs to strengthen domain generalization. Experiments on six diverse spatio-temporal datasets show that DRIK consistently outperforms existing methods, achieving up to 12.48% lower MAE while maintaining strong scalability.
Sensors are widely used to monitor traffic flow (Kong et al., 2024), air quality (Yu et al., 2025), and solar energy production (Jebli et al., 2021), among other applications. However, high deployment costs often limit sensor density and prevent comprehensive coverage of large areas (Liang et al., 2019; Seo et al., 2017). Inductive kriging offers a promising solution by estimating values at unsensed locations from the data of existing sensors (Wu et al., 2021a; Zheng et al., 2023; Xu et al., 2025). Kriging models can thus produce high-resolution spatio-temporal estimates, improving accuracy while reducing the deployment and maintenance demands of large-scale sensor networks.
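The abstract gives no formulas for DRIK's node- and edge-level steps, so the following is only a minimal sketch of the general ideas under stated assumptions: Gaussian jitter on 2-D node coordinates, a Gaussian-kernel adjacency rebuilt from the jittered coordinates, and independent random dropping of undirected edges. The function names and the parameters `sigma`, `bandwidth`, and `drop_prob` are hypothetical, not taken from the paper. The subgraph-level step (pseudo-labeled subgraphs) is not sketched, because the abstract does not say how the pseudo-labels are produced.

```python
import numpy as np


def perturb_coordinates(coords, sigma, rng):
    """Node level (assumed form): jitter each node's 2-D coordinates with Gaussian noise."""
    return coords + rng.normal(0.0, sigma, size=coords.shape)


def gaussian_adjacency(coords, bandwidth):
    """Dense distance-based adjacency rebuilt from (possibly perturbed) coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    adj = np.exp(-dist2 / bandwidth ** 2)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def drop_edges(adj, drop_prob, rng):
    """Edge level (assumed form): drop each undirected edge independently with prob. drop_prob."""
    keep = rng.random(adj.shape) >= drop_prob
    keep = np.triu(keep, k=1)  # decide once per unordered node pair
    keep = keep | keep.T       # mirror the decision so the graph stays symmetric
    return adj * keep


rng = np.random.default_rng(0)
coords = rng.random((6, 2))                                  # 6 hypothetical sensor locations
noisy = perturb_coordinates(coords, 0.01, rng)               # node-level view
adj = drop_edges(gaussian_adjacency(noisy, 0.5), 0.2, rng)   # edge-level view
print(adj.shape)  # (6, 6)
```

Because each call draws fresh coordinate noise and a fresh edge mask, calling these helpers once per training step would expose a model to many slightly different graphs, which is the kind of spatial and topological diversity the abstract attributes to these two tiers.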
The standard training and evaluation protocol for inductive kriging (Wu et al., 2021a) generally involves three steps, as shown in Figure 1 (a): (1) The complete spatio-temporal dataset X ∈ ℝ … This produces a 2×2 partition, with the final training and test sets drawn from diagonally opposite blocks. A key limitation of this approach stems from the widespread use of early stopping during model training (Zheng et al., 2023): without a separate validation block, the stopping point is chosen using test data, so test information leaks into model selection.
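The contrast between the two protocols can be illustrated with index bookkeeping alone. This is a minimal sketch, assuming that nodes are assigned to groups at random, that time is split chronologically, and that the training, validation, and test sets are the diagonal blocks of the grid; the split fractions and all function names are hypothetical, and which blocks serve as model inputs at evaluation time is deliberately left out.

```python
import numpy as np


def split_indices(n, fractions, rng=None):
    """Split range(n) into chunks sized by `fractions`.

    Without an rng the chunks stay in order (chronological for time steps);
    with an rng the indices are shuffled first (random node assignment).
    """
    idx = np.arange(n) if rng is None else rng.permutation(n)
    cuts = np.cumsum([int(round(f * n)) for f in fractions[:-1]])
    return np.split(idx, cuts)


def partition_2x2(num_nodes, num_steps, rng):
    """Conventional setup: observed/unobserved nodes x train/test periods."""
    obs, unobs = split_indices(num_nodes, [0.7, 0.3], rng)
    t_train, t_test = split_indices(num_steps, [0.7, 0.3])
    # Only two diagonal blocks exist, so any early-stopping signal must come from "test".
    return {"train": (obs, t_train), "test": (unobs, t_test)}


def partition_3x3(num_nodes, num_steps, rng):
    """Three node groups x three periods; the diagonal blocks give train/val/test."""
    n_tr, n_va, n_te = split_indices(num_nodes, [0.6, 0.2, 0.2], rng)
    t_tr, t_va, t_te = split_indices(num_steps, [0.6, 0.2, 0.2])
    return {"train": (n_tr, t_tr), "val": (n_va, t_va), "test": (n_te, t_te)}


def blocks_disjoint(partition):
    """True if no node and no time step is shared by any two blocks."""
    blocks = list(partition.values())
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            for axis in (0, 1):  # 0 = node indices, 1 = time indices
                if set(blocks[i][axis].tolist()) & set(blocks[j][axis].tolist()):
                    return False
    return True


rng = np.random.default_rng(0)
split = partition_3x3(num_nodes=100, num_steps=1000, rng=rng)
print({k: (len(n), len(t)) for k, (n, t) in split.items()})
# {'train': (60, 600), 'val': (20, 200), 'test': (20, 200)}
print(blocks_disjoint(split))  # True
```

Under the 3×3 layout, an early-stopping criterion would be computed on `split["val"]`, and the `"test"` block would be evaluated only once after training; the 2×2 layout offers no such block, which is the source of the leakage described above.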

data mining, machine learning, node
