Learning Regional Monsoon Patterns with a Multimodal Attention U-Net

Mazumder, Swaib Ilias, Kumar, Manish, Khan, Aparajita

arXiv.org Artificial Intelligence 

Accurate long-range monsoon rainfall prediction is critical for India's rain-fed agricultural economy and climate resilience planning, yet it remains hindered by sparse ground data and complex regional variability. This work proposes a multimodal deep learning framework for gridded precipitation classification using satellite-derived geospatial inputs. Unlike previous rainfall prediction methods, which rely on coarse-resolution datasets with 5-50 km grids, we curate a high-resolution dataset projected to a 1 km grid for five Indian states. The dataset integrates seven heterogeneous Earth observation modalities (land surface temperature, vegetation, soil moisture, humidity, wind speed, elevation, and land use) and spans the June-September 2024 period. We adopt an attention-guided U-Net architecture that captures spatial patterns and temporal dependencies across modalities, and propose a combined focal and Dice loss to address class imbalance and promote spatial coherence in rainfall categories defined by the India Meteorological Department. Extensive experiments show that the multimodal framework significantly outperforms unimodal baselines and existing deep learning approaches, especially in underrepresented extreme rainfall zones. The framework demonstrates potential for scalable, region-adaptive monsoon forecasting and Earth-observation-driven climate risk assessment.
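
To make the loss design concrete, the following is a minimal NumPy sketch of one common way to combine a focal term (which down-weights easy, well-classified pixels and so helps with class imbalance) with a soft Dice term (which rewards region-level overlap). It is an illustration under stated assumptions, not the authors' implementation: the function name `focal_dice_loss`, the focusing parameter `gamma`, the mixing weight `lam`, and the equal weighting of classes are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def focal_dice_loss(logits, labels, gamma=2.0, lam=0.5, eps=1e-6):
    """Combined focal + soft Dice loss for one gridded prediction.

    logits: float array of shape (H, W, C), raw per-class scores per grid cell.
    labels: int array of shape (H, W), rainfall category index per grid cell.
    gamma, lam: assumed hyperparameters (not taken from the paper).
    """
    num_classes = logits.shape[-1]
    probs = softmax(logits)               # (H, W, C)
    onehot = np.eye(num_classes)[labels]  # (H, W, C)

    # Focal term: cross-entropy scaled by (1 - p_true)^gamma, so pixels that
    # are already classified confidently contribute little to the loss.
    p_true = (probs * onehot).sum(axis=-1)
    log_p = np.log(np.clip(p_true, eps, 1.0))
    focal = np.mean(-((1.0 - p_true) ** gamma) * log_p)

    # Soft Dice term: one minus the mean per-class overlap between the
    # predicted probability map and the one-hot target map.
    intersection = (probs * onehot).sum(axis=(0, 1))
    denom = probs.sum(axis=(0, 1)) + onehot.sum(axis=(0, 1))
    dice = 1.0 - np.mean((2.0 * intersection + eps) / (denom + eps))

    # Convex combination of the two terms.
    return lam * focal + (1.0 - lam) * dice
```

In this sketch a larger `gamma` focuses training more strongly on hard pixels, while a smaller `lam` shifts emphasis toward region overlap; in practice both would be tuned on validation data.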