GRIT-LP: Graph Transformer with Long-Range Skip Connection and Partitioned Spatial Graphs for Accurate Ice Layer Thickness Prediction

Liu, Zesheng, Rahnemoonfar, Maryam

arXiv.org Artificial Intelligence 

Graph transformers have demonstrated remarkable capability on complex spatio-temporal tasks, yet their depth is often limited by oversmoothing and weak long-range dependency modeling. To address these challenges, we introduce GRIT-LP, a graph transformer explicitly designed for ice layer thickness estimation from polar radar imagery. Accurately estimating ice layer thickness is critical for understanding snow accumulation, reconstructing past climate patterns, and reducing uncertainties in projections of future ice sheet evolution and sea level rise. GRIT-LP combines an inductive geometric graph learning framework with a self-attention mechanism, and introduces two major innovations that jointly address challenges in modeling the spatio-temporal patterns of ice layers: a partitioned spatial graph construction strategy that forms overlapping, fully connected local neighborhoods to preserve spatial coherence and suppress noise from irrelevant long-range links, and a long-range skip connection mechanism within the transformer that improves information flow and mitigates oversmoothing in deeper attention layers. We conducted extensive experiments demonstrating that GRIT-LP outperforms current state-of-the-art methods, with a 24.92% improvement in root mean squared error. These results highlight the effectiveness of graph transformers in modeling spatio-temporal patterns by capturing both localized structural features and long-range dependencies across internal ice layers, and demonstrate their potential to advance data-driven understanding of cryospheric processes.

Introduction

Graph transformers have proven to be highly effective for modeling complex graph-structured data, with a wide range of applications in real-world scenarios, particularly those involving spatio-temporal patterns. Their ability to capture intricate relationships and dependencies makes them highly valuable in domains such as pedestrian trajectory prediction [1] and traffic prediction [2].
Despite their success, current graph transformer architectures face notable limitations, including overfitting and oversmoothing, a phenomenon where node features become indistinguishable as layers deepen [3]. Additionally, many existing graph transformers are relatively shallow, limiting their ability to effectively capture the complex, long-range dependencies that often emerge in real-world datasets.
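The oversmoothing effect and the role of a long-range skip connection can be illustrated with a toy numerical sketch (this is not the paper's implementation; the graph, depth, and skip interval are illustrative assumptions). Repeated neighbor averaging stands in for deep attention layers, and a skip periodically adds the early representation back into a deep layer's output:

```python
import numpy as np

def row_normalized_adj(n):
    # Path graph with self-loops, row-normalized (illustrative toy graph).
    a = np.eye(n)
    for i in range(n - 1):
        a[i, i + 1] = a[i + 1, i] = 1.0
    return a / a.sum(axis=1, keepdims=True)

def deep_stack(h0, adj, depth, skip_every=None):
    # Each layer averages neighbor features, a simple proxy for the
    # smoothing behavior of a deep attention stack.
    h = h0
    for layer in range(1, depth + 1):
        h = adj @ h
        if skip_every and layer % skip_every == 0:
            h = h + h0  # long-range skip from the input representation
    return h

rng = np.random.default_rng(0)
h0 = rng.normal(size=(16, 4))       # 16 nodes, 4-dim features
adj = row_normalized_adj(16)
plain = deep_stack(h0, adj, depth=32)
skipped = deep_stack(h0, adj, depth=32, skip_every=8)
# Without the skip, node features collapse toward a common value;
# with it, cross-node variance stays large and nodes remain distinguishable.
```

The per-feature standard deviation across nodes shrinks sharply in `plain` but not in `skipped`, which is the intuition behind routing information from shallow layers directly to deeper ones.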