reanalysis data
DAMBench: A Multi-Modal Benchmark for Deep Learning-based Atmospheric Data Assimilation
Wang, Hao, Weng, Zixuan, Han, Jindong, Fan, Wei, Liu, Hao
Data assimilation (DA) is a cornerstone of atmospheric system modeling, tasked with reconstructing system states by integrating sparse, noisy observations with prior estimates. While traditional approaches like variational methods and ensemble Kalman filtering have proven effective, recent advances in deep learning offer more scalable, efficient, and flexible alternatives better suited for complex, real-world data assimilation involving large-scale and multi-modal observations. However, existing deep learning-based DA research suffers from two critical limitations: (1) reliance on oversimplified scenarios with synthetically perturbed observations, and (2) the absence of standardized benchmarks for fair model comparison. To address these gaps, we introduce DAMBench, the first large-scale multi-modal benchmark designed to evaluate data-driven DA models under realistic atmospheric conditions. DAMBench integrates high-quality background states from state-of-the-art forecasting systems and real-world multi-modal observations (i.e., weather station measurements and satellite imagery). All data are resampled to a common grid and temporally aligned to support systematic training, validation, and testing. We provide unified evaluation protocols and benchmark representative data assimilation approaches, including latent generative models and neural process frameworks. Additionally, we propose a lightweight multi-modal plugin to demonstrate how integrating realistic observations can enhance even simple baselines. Through comprehensive experiments, DAMBench establishes a rigorous foundation for future research, promoting reproducibility, fair comparison, and extensibility to real-world multi-modal scenarios. Our dataset and code are publicly available at https://github.com/figerhaowang/DAMBench.
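The core idea of combining a prior (background) estimate with a noisy observation can be illustrated with the textbook scalar analysis update. This is a minimal sketch of that classical formula, not DAMBench's method or API; the function name `analysis_update` and all numbers are hypothetical.

```python
def analysis_update(background, obs, var_b, var_o):
    """Blend one background value with one observation (scalar Kalman-style update).

    var_b and var_o are the assumed error variances of the background
    and the observation, respectively.
    """
    gain = var_b / (var_b + var_o)            # weight given to the observation
    analysis = background + gain * (obs - background)
    var_a = (1.0 - gain) * var_b              # error variance of the analysis
    return analysis, var_a


# Hypothetical values: a 280.0 K background and a 282.0 K station reading,
# both assumed to have an error variance of 1.0 K^2.
analysis, var_a = analysis_update(280.0, 282.0, var_b=1.0, var_o=1.0)
print(analysis, var_a)
```

With equal error variances the analysis lands halfway between background and observation, and its variance is half the background variance. A noisier observation (larger `var_o`) pulls the analysis back toward the background.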
Assessing the risk of future Dunkelflaute events for Germany using generative deep learning
Strnad, Felix, Schmidt, Jonathan, Mockert, Fabian, Hennig, Philipp, Ludwig, Nicole
The European electricity power grid is transitioning towards renewable energy sources, characterized by an increasing share of off- and onshore wind and solar power. However, the weather dependency of these energy sources poses a challenge to grid stability, with so-called Dunkelflaute events -- periods of low wind and solar power generation -- being of particular concern due to their potential to cause electricity supply shortages. In this study, we investigate the impact of these events on the German electricity production in the years and decades to come. For this purpose, we adapt a recently developed generative deep learning framework to downscale climate simulations from the CMIP6 ensemble. We first compare their statistics to the historical record taken from ERA5 data. Next, we use these downscaled simulations to assess plausible future occurrences of Dunkelflaute events in Germany under the optimistic low (SSP2-4.5) and high (SSP5-8.5) emission scenarios. Our analysis indicates that both the frequency and duration of Dunkelflaute events in Germany in the ensemble mean are projected to remain largely unchanged compared to the historical period. This suggests that, under the considered climate scenarios, the associated risk is expected to remain stable throughout the century.
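Counting the frequency and duration of low-generation periods amounts to finding runs of consecutive time steps below a threshold. The sketch below shows one way to do that. The threshold, the minimum duration, the function name, and the sample series are illustrative assumptions and are not taken from the study.

```python
def find_low_generation_events(capacity_factor, threshold=0.06, min_length=48):
    """Return (start_index, length) for each run of consecutive values
    below `threshold` that lasts at least `min_length` time steps."""
    events = []
    run_start = None
    for i, value in enumerate(capacity_factor):
        if value < threshold:
            if run_start is None:
                run_start = i                  # a new low-generation run begins
        elif run_start is not None:
            if i - run_start >= min_length:
                events.append((run_start, i - run_start))
            run_start = None                   # the run ended at step i
    # Handle a run that extends to the end of the series.
    if run_start is not None and len(capacity_factor) - run_start >= min_length:
        events.append((run_start, len(capacity_factor) - run_start))
    return events


# Hypothetical combined wind and solar capacity factors, one value per time step.
series = [0.30, 0.04, 0.03, 0.05, 0.20, 0.02, 0.01, 0.25, 0.03, 0.02, 0.04]
events = find_low_generation_events(series, threshold=0.06, min_length=3)
print(events)
```

Event frequency is then the number of returned tuples per period, and event duration is the second element of each tuple multiplied by the time step.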
- Europe > Poland (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.05)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
SolarSeer: Ultrafast and accurate 24-hour solar irradiance forecasts outperforming numerical weather prediction across the USA
Bai, Mingliang, Fang, Zuliang, Tao, Shengyu, Xiang, Siqi, Bian, Jiang, Xiang, Yanfei, Zhao, Pengcheng, Jin, Weixin, Weyn, Jonathan A., Dong, Haiyu, Zhang, Bin, Sun, Hongyu, Thambiratnam, Kit, Zhang, Qi, Sun, Hongbin, Zhang, Xuan, Wu, Qiuwei
Accurate 24-hour solar irradiance forecasting is essential for the safe and economic operation of solar photovoltaic systems. Traditional numerical weather prediction (NWP) models represent the state-of-the-art in forecasting performance but rely on computationally costly data assimilation and on solving complicated partial differential equations (PDEs) that simulate atmospheric physics. Here, we introduce SolarSeer, an end-to-end large artificial intelligence (AI) model for solar irradiance forecasting across the Contiguous United States (CONUS). SolarSeer is designed to directly map historical satellite observations to future forecasts, eliminating the computational overhead of data assimilation and PDE solving. This efficiency allows SolarSeer to operate over 1,500 times faster than traditional NWP, generating 24-hour cloud cover and solar irradiance forecasts for the CONUS at 5-kilometer resolution in under 3 seconds. Compared with the state-of-the-art NWP in the CONUS, i.e., High-Resolution Rapid Refresh (HRRR), SolarSeer significantly reduces the root mean squared error of solar irradiance forecasting by 27.28% in reanalysis data and 15.35% across 1,800 stations. SolarSeer also effectively captures solar irradiance fluctuations and significantly enhances the first-order irradiance difference forecasting accuracy. SolarSeer's ultrafast, accurate 24-hour solar irradiance forecasts provide strong support for the transition to sustainable, net-zero energy systems.
- Europe (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Spatiotemporally Coherent Probabilistic Generation of Weather from Climate
Schmidt, Jonathan, Schmidt, Luca, Strnad, Felix, Ludwig, Nicole, Hennig, Philipp
Local climate information is crucial for impact assessment and decision-making, yet coarse global climate simulations cannot capture small-scale phenomena. However, to preserve physical properties, estimating spatio-temporally coherent high-resolution weather dynamics for multiple variables across long time horizons is crucial. We present a novel generative approach that uses a score-based diffusion model trained on high-resolution reanalysis data to capture the statistical properties of local weather dynamics. After training, we condition on coarse climate model data to generate weather patterns consistent with the aggregate information. As this inference task is inherently uncertain, we leverage the probabilistic nature of diffusion models and sample multiple trajectories. We evaluate our approach with high-resolution reanalysis information before applying it to the climate model downscaling task. We then demonstrate that the model generates spatially and temporally coherent weather dynamics that align with global climate output. Numerical simulations based on the Navier-Stokes equations, discretized over time and space, are fundamental to understanding weather patterns, climate variability, and climate change. State-of-the-art numerical weather prediction (NWP) models, which primarily focus on atmospheric processes, can accurately resolve small-scale dynamics within the Earth system, providing fine-scale spatial and temporal weather patterns at resolutions on the order of kilometers [1]. However, the substantial computational resources required for these models render them impractical for simulating the extended time scales associated with climatic changes. In contrast, Earth System Models (ESMs), such as those included in the CMIP6 project [2], incorporate a broader range of processes--including atmospheric, oceanic, and biogeochemical interactions--while operating on coarser spatial scales.
This coarse resolution limits the ability of ESMs to fully capture small-scale processes, requiring parameterizations to represent unresolved dynamics as functions of resolved variables. This work introduces a probabilistic downscaling pipeline that jointly estimates spatio-temporally consistent weather dynamics from ESM simulations on multiple variables. The framework is built around a score-based diffusion model and can be understood as a combination of four modules, which can each be adjusted independently of the others. This schematic outlines the framework.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > Austria (0.04)
- North America > United States (0.04)
Generative Diffusion Model-based Downscaling of Observed Sea Surface Height over Kuroshio Extension since 2000
Han, Qiuchang, Jiang, Xingliang, Zhao, Yang, Wang, Xudong, Li, Zhijin, Zhang, Renhe
Satellite altimetry has been widely utilized to monitor global sea surface dynamics, enabling investigation of upper ocean variability from basin-scale to localized eddy ranges. However, the coarse spatial resolution of observational altimetry limits our understanding of oceanic submesoscale variability, prevalent at horizontal scales below 0.25° resolution. Here, we introduce a state-of-the-art generative diffusion model trained on high-resolution sea surface height (SSH) reanalysis data and demonstrate its advantage in observational SSH downscaling over the eddy-rich Kuroshio Extension region. The diffusion-based model effectively downscales raw satellite-interpolated data from 0.25° resolution to 1/16°, corresponding to approximately 12-km wavelength. This model outperforms other high-resolution reanalysis datasets and neural network-based methods. It also successfully reproduces the spatial patterns and power spectra of satellite along-track observations. Our diffusion-based results indicate that eddy kinetic energy at horizontal scales less than 250 km has intensified significantly since 2004 in the Kuroshio Extension region. These findings underscore the great potential of deep learning in reconstructing satellite altimetry and enhancing our understanding of ocean dynamics at eddy scales.
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Southern Ocean (0.04)
Long-term foehn reconstruction combining unsupervised and supervised learning
Stauffer, Reto, Zeileis, Achim, Mayr, Georg J.
Foehn winds, characterized by abrupt temperature increases and wind speed changes, significantly impact regions on the leeward side of mountain ranges, e.g., by spreading wildfires. Understanding how foehn occurrences change under climate change is crucial. Unfortunately, foehn cannot be measured directly but has to be inferred from meteorological measurements employing suitable classification schemes. Hence, this approach is typically limited to specific periods for which the necessary data are available. We present a novel approach for reconstructing historical foehn occurrences using a combination of unsupervised and supervised probabilistic statistical learning methods. We utilize in-situ measurements (available for recent decades) to train an unsupervised learner (finite mixture model) for automatic foehn classification. These labeled data are then linked to reanalysis data (covering longer periods) using a supervised learner (lasso or boosting). This allows past foehn probabilities to be reconstructed based solely on reanalysis data. Applying this method to ERA5 reanalysis data for six stations across Switzerland and Austria achieves accurate hourly reconstructions of north and south foehn occurrence, respectively, dating back to 1940. This paves the way for investigating how seasonal foehn patterns have evolved over the past 83 years, providing valuable insights into climate change impacts on these critical wind events.
- North America > United States > California (0.14)
- Europe > Austria > Tyrol > Innsbruck (0.07)
- North America > United States > Montana (0.05)
Feasibility of machine learning-based rice yield prediction in India at the district level using climate reanalysis data
De Clercq, Djavan, Mahdi, Adam
Yield forecasting, the science of predicting agricultural productivity before the crop harvest occurs, helps a wide range of stakeholders make better decisions around agricultural planning. This study aims to investigate whether machine learning-based yield prediction models can capably predict Kharif season rice yields at the district level in India several months before the rice harvest takes place. The methodology involved training 19 machine learning models such as CatBoost, LightGBM, Orthogonal Matching Pursuit, and Extremely Randomized Trees on 20 years of climate, satellite, and rice yield data across 247 of India's rice-producing districts. In addition to model-building, a dynamic dashboard was built to understand how the reliability of rice yield predictions varies across districts. The results of the proof-of-concept machine learning pipeline demonstrated that rice yields can be predicted with a reasonable degree of accuracy, with out-of-sample R², MAE, and MAPE performance of up to 0.82, 0.29, and 0.16 respectively. These results outperformed test set performance reported in related literature on rice yield modeling in other contexts and countries. In addition, SHAP value analysis was conducted to infer both the importance and directional impact of the climate and remote sensing variables included in the model. Important features driving rice yields included temperature, soil water volume, and leaf area index. In particular, higher temperatures in August correlate with increased rice yields, especially when the leaf area index in August is also high. Building on the results, a proof-of-concept dashboard was developed to allow users to easily explore which districts may experience a rise or fall in yield relative to the previous year.
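The three reported scores follow standard definitions, which the sketch below spells out in plain Python. The function name and the sample yields are hypothetical and are not data from the study. MAPE is returned as a fraction, matching how the abstract reports it.

```python
def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, MAPE) for paired observed and predicted values.

    MAPE is a fraction, not a percentage, and assumes no observed value is zero.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mape = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    return r2, mae, mape


# Hypothetical district yields and predictions (units are arbitrary here).
y_true = [2.0, 3.0, 4.0]
y_pred = [2.5, 3.0, 3.5]
r2, mae, mape = regression_metrics(y_true, y_pred)
print(r2, mae, mape)
```

For an out-of-sample evaluation, `y_true` and `y_pred` would come from districts or years held out of training.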
- Asia > Vietnam (0.14)
- North America > Haiti (0.14)
- Asia > Bangladesh (0.04)
- Health & Medicine (1.00)
- Government (1.00)
- Food & Agriculture > Agriculture (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
A non-intrusive machine learning framework for debiasing long-time coarse resolution climate simulations and quantifying rare events statistics
Sorensen, Benedikt Barthel, Charalampopoulos, Alexis, Zhang, Shixuan, Harrop, Bryce, Leung, Ruby, Sapsis, Themistoklis
Due to the rapidly changing climate, the frequency and severity of extreme weather are expected to increase over the coming decades. As fully-resolved climate simulations remain computationally intractable, policy makers must rely on coarse models to quantify risk for extremes. However, coarse models suffer from inherent bias due to the ignored "sub-grid" scales. We propose a framework to non-intrusively debias coarse-resolution climate predictions using neural-network (NN) correction operators. Previous efforts have attempted to train such operators using loss functions that match statistics. However, this approach falls short for events whose return period is longer than the training data, since the reference statistics have not converged. Here, we aim to formulate a learning method that allows for correction of dynamics and quantification of extreme events with return periods longer than the training data. The key obstacle is the chaotic nature of the underlying dynamics. To overcome this challenge, we introduce a dynamical systems approach where the correction operator is trained using reference data and a coarse model simulation nudged towards that reference. The method is demonstrated on debiasing an under-resolved quasi-geostrophic model and the Energy Exascale Earth System Model (E3SM). For the former, our method enables the quantification of events that have return periods two orders of magnitude longer than the training data. For the latter, when trained on 8 years of ERA5 data, our approach is able to correct the coarse E3SM output to closely reflect the 36-year ERA5 statistics for all prognostic variables and significantly reduce their spatial biases.
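The limitation of statistics-matching on short records can be seen from a purely empirical return-period estimate. The sketch below uses the Weibull plotting position, T = (n + 1) / rank, on annual maxima. It is a generic illustration, not the paper's method, and the function name and sample values are hypothetical.

```python
def empirical_return_periods(annual_maxima):
    """Pair each annual maximum with its empirical return period in years,
    using the Weibull plotting position T = (n + 1) / rank (rank 1 = largest)."""
    n = len(annual_maxima)
    ranked = sorted(annual_maxima, reverse=True)
    return [(value, (n + 1) / rank) for rank, value in enumerate(ranked, start=1)]


# Hypothetical annual maxima from an 8-year record.
maxima = [31.0, 35.5, 29.0, 40.2, 33.1, 30.4, 36.8, 32.0]
periods = empirical_return_periods(maxima)
print(periods[0])   # the largest value and its return period
```

With n years of data, the longest return period this estimate can assign is n + 1 years, so rarer events cannot be resolved from the record alone.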
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- Europe > Northern Europe (0.04)
Sim2Real for Environmental Neural Processes
Scholz, Jonas, Andersson, Tom R., Vaughan, Anna, Requeima, James, Turner, Richard E.
Machine learning (ML)-based weather models have recently undergone rapid improvements. These models are typically trained on gridded reanalysis data from numerical data assimilation systems. However, reanalysis data comes with limitations, such as assumptions about physical laws and low spatiotemporal resolution. The gap between reanalysis and reality has sparked growing interest in training ML models directly on observations such as weather stations. Modelling scattered and sparse environmental observations requires scalable and flexible ML architectures, one of which is the convolutional conditional neural process (ConvCNP). ConvCNPs can learn to condition on both gridded and off-the-grid context data to make uncertainty-aware predictions at target locations. However, the sparsity of real observations presents a challenge for data-hungry deep learning models like the ConvCNP. One potential solution is 'Sim2Real': pre-training on reanalysis and fine-tuning on observational data. We analyse Sim2Real with a ConvCNP trained to interpolate surface air temperature over Germany, using varying numbers of weather stations for fine-tuning. On held-out weather stations, Sim2Real training substantially outperforms the same model architecture trained only with reanalysis data or only with station data, showing that reanalysis data can serve as a stepping stone for learning from real observations. Sim2Real could thus enable more accurate models for weather prediction and climate monitoring.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)
- Antarctica (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Forecasting Tropical Cyclones with Cascaded Diffusion Models
Nath, Pritthijit, Shukla, Pancham, Quilodrán-Casas, César
As cyclones become more intense due to climate change, the rise of AI-based modelling provides a more affordable and accessible approach compared to traditional methods based on mathematical models. This work leverages diffusion models to forecast cyclone trajectories and precipitation patterns by integrating satellite imaging, remote sensing, and atmospheric data, employing a cascaded approach that incorporates forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins. Experiments demonstrate that the final forecasts from the cascaded models show accurate predictions up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks. This work also highlights the promising efficiency of AI methods such as diffusion models for high-performance needs, such as cyclone forecasting, while remaining computationally affordable, making them ideal for highly vulnerable regions with critical forecasting needs and financial limitations.
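For readers unfamiliar with the 20 dB figure, PSNR is defined as 10 log10(R² / MSE), where R is the data range. The sketch below computes it for two small grids. The function name, the assumption that values are scaled to [0, 1], and the sample grids are illustrative and not taken from the paper.

```python
import math


def psnr(reference, forecast, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D grids.

    data_range is the assumed span of valid values (1.0 for data scaled to [0, 1]).
    """
    flat_ref = [v for row in reference for v in row]
    flat_fc = [v for row in forecast for v in row]
    mse = sum((r - f) ** 2 for r, f in zip(flat_ref, flat_fc)) / len(flat_ref)
    if mse == 0:
        return math.inf                        # identical grids
    return 10.0 * math.log10(data_range ** 2 / mse)


# Hypothetical 2x2 grids in which every forecast pixel is off by 0.1.
reference = [[0.0, 0.5], [1.0, 0.5]]
forecast = [[0.1, 0.6], [0.9, 0.4]]
score = psnr(reference, forecast)
print(round(score, 2))
```

Under these assumptions, a PSNR of 20 dB corresponds to a root mean squared error of one tenth of the data range.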
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Indian Ocean (0.06)
- Pacific Ocean (0.06)