

 pangu-weather


FlowCast-ODE: Continuous Hourly Weather Forecasting with Dynamic Flow Matching and ODE Solver

He, Shuangshuang, Zhang, Yuanting, Liang, Hongli, Meng, Qingye, Yuan, Xingyuan, Wang, Shuo

arXiv.org Artificial Intelligence

Data-driven hourly weather forecasting models often face the challenge of error accumulation in long-term predictions. The problem is exacerbated by non-physical temporal discontinuities present in widely-used training datasets such as ECMWF Reanalysis v5 (ERA5), which stem from its 12-hour assimilation cycle. Such artifacts lead hourly autoregressive models to learn spurious dynamics and rapidly accumulate errors. To address this, we introduce FlowCast-ODE, a novel framework that treats atmospheric evolution as a continuous flow to ensure temporal coherence. Our method employs dynamic flow matching to learn the instantaneous velocity field from data and an ordinary differential equation (ODE) solver to generate smooth and temporally continuous hourly predictions. By pre-training on 6-hour intervals to sidestep data discontinuities and fine-tuning on hourly data, FlowCast-ODE produces seamless forecasts for up to 120 hours with a single lightweight model. It achieves competitive or superior skill on key meteorological variables compared to baseline models, preserves fine-grained spatial details, and demonstrates strong performance in forecasting extreme events, such as tropical cyclone tracks.
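The abstract names the technique but not its exact objective or solver. The sketch below therefore shows only the general idea, under stated assumptions. It builds a conditional flow-matching target on a straight-line path between two states. It then integrates a velocity field with a fixed-step RK4 solver that emits every intermediate state. The helpers `flow_matching_target` and `rk4_rollout`, the straight-line path, and the choice of RK4 are illustrative assumptions, not FlowCast-ODE's published method. A trained network would replace the toy velocity field.

```python
import numpy as np

def flow_matching_target(x0, x1, tau):
    """Assumed straight-line path x_tau = (1 - tau) * x0 + tau * x1.

    Its time derivative, x1 - x0, is the regression target a velocity
    network would be trained to predict from (x_tau, tau).
    """
    x_tau = (1.0 - tau) * x0 + tau * x1
    return x_tau, x1 - x0

def rk4_rollout(velocity, x0, t_end, n_steps):
    """Integrate dx/dt = velocity(x, t) from 0 to t_end with classic RK4.

    Returns the initial state plus the state after every step, so
    n_steps = 6 over one 6-hour interval yields hourly outputs.
    """
    dt = t_end / n_steps
    x, t = x0, 0.0
    states = [x0]
    for _ in range(n_steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(x + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(x + dt * k3, t + dt)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
        states.append(x)
    return np.stack(states)

# Toy example: the ideal straight-line velocity between two hypothetical states
x0 = np.zeros(3)
x1 = np.array([6.0, 12.0, -6.0])
hourly = rk4_rollout(lambda x, t: x1 - x0, x0, t_end=1.0, n_steps=6)
```

Here τ ∈ [0, 1] is mapped onto one 6-hour interval. Six solver steps then give one state per hour. This is one way a solver can produce hourly output from a model trained on 6-hour pairs.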


Numerical models outperform AI weather forecasts of record-breaking extremes

Zhang, Zhongwei, Fischer, Erich, Zscheischler, Jakob, Engelke, Sebastian

arXiv.org Artificial Intelligence

Artificial intelligence (AI)-based models are revolutionizing weather forecasting and have surpassed leading numerical weather prediction systems on various benchmark tasks. However, their ability to extrapolate and reliably forecast unprecedented extreme events remains unclear. Here, we show that for record-breaking weather extremes, the numerical model High RESolution forecast (HRES) from the European Centre for Medium-Range Weather Forecasts still consistently outperforms state-of-the-art AI models GraphCast, GraphCast operational, Pangu-Weather, Pangu-Weather operational, and FuXi. We demonstrate that forecast errors in AI models are consistently larger for record-breaking heat, cold, and wind than in HRES across nearly all lead times. We further find that the examined AI models tend to underestimate both the frequency and intensity of record-breaking events, and they underpredict hot records and overestimate cold records with growing errors for larger record exceedance. Our findings underscore the current limitations of AI weather models in extrapolating beyond their training domain and in forecasting the potentially most impactful record-breaking weather events that are particularly frequent in a rapidly warming climate. Further rigorous verification and model development are needed before these models can be solely relied upon for high-stakes applications such as early warning systems and disaster management.
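The sketch below makes "record-breaking" and "record exceedance" concrete for a single location. It rests on hypothetical conventions. A value counts as a record if it exceeds every earlier value after an initial spin-up window. The exceedance is the margin above the previous record. The helper names, the spin-up convention, and the toy data are assumptions, since the abstract does not state how the study defines records or aggregates errors. Cold records can be handled by negating both series, which also flips the sign of the bias.

```python
import numpy as np

def record_breaking_mask(obs, n_spinup):
    """Flag time steps whose observation exceeds every earlier value (hot records).

    The first n_spinup values only establish the initial record and are never flagged.
    Returns (mask, exceedance), where exceedance is the margin above the previous record.
    """
    obs = np.asarray(obs, dtype=float)
    mask = np.zeros(obs.shape, dtype=bool)
    exceedance = np.zeros(obs.shape)
    record = obs[:n_spinup].max()
    for i in range(n_spinup, obs.size):
        if obs[i] > record:
            mask[i] = True
            exceedance[i] = obs[i] - record
            record = obs[i]
    return mask, exceedance

def errors_on_records(forecast, obs, n_spinup):
    """Bias (forecast minus observation) and mean absolute error on record-breaking steps."""
    mask, _ = record_breaking_mask(obs, n_spinup)
    if not mask.any():
        return float("nan"), float("nan")
    err = np.asarray(forecast, dtype=float)[mask] - np.asarray(obs, dtype=float)[mask]
    return float(err.mean()), float(np.abs(err).mean())

# Hypothetical station series, e.g. daily maximum temperature in deg C
obs = [30.0, 31.0, 29.0, 33.0, 32.0, 35.0]
forecast = [30.0, 31.0, 29.0, 32.0, 32.0, 33.0]
bias, mae = errors_on_records(forecast, obs, n_spinup=2)
# A negative bias means the forecast underpredicts the hot records.
```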


Probabilistic measures afford fair comparisons of AIWP and NWP model output

Gneiting, Tilmann, Biegert, Tobias, Kraus, Kristof, Walz, Eva-Maria, Jordan, Alexander I., Lerch, Sebastian

arXiv.org Machine Learning

We introduce a new measure for fair and meaningful comparisons of single-valued output from artificial intelligence based weather prediction (AIWP) and numerical weather prediction (NWP) models, called potential continuous ranked probability score (PC). In a nutshell, we subject the deterministic backbone of physics-based and data-driven models post hoc to the same statistical postprocessing technique, namely, isotonic distributional regression (IDR). Then we find PC as the mean continuous ranked probability score (CRPS) of the postprocessed probabilistic forecasts. The nonnegative PC measure quantifies potential predictive performance and is invariant under strictly increasing transformations of the model output. PC attains its most desirable value of zero if, and only if, the weather outcome Y is a fixed, non-decreasing function of the model output X. The PC measure is recorded in the unit of the outcome, has an upper bound of one half times the mean absolute difference between outcomes, and serves as a proxy for the mean CRPS of real-time, operational probabilistic products. When applied to WeatherBench 2 data, our approach demonstrates that the data-driven GraphCast model outperforms the leading, physics-based European Centre for Medium-Range Weather Forecasts (ECMWF) high-resolution (HRES) model. Furthermore, the PC measure for the HRES model aligns exceptionally well with the mean CRPS of the operational ECMWF ensemble. Across application domains, our approach affords comparisons of single-valued forecasts in settings where the pre-specification of a loss function (the usual, and in principle superior, procedure in forecast contests and in administrative and benchmark settings) places competitors on unequal footing.
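The IDR step can be illustrated with a small in-sample sketch. For a single real-valued model output, the IDR predictive CDF at each threshold z is an isotonic regression of the indicators 1{y ≤ z} on x. It is constrained to be non-increasing in x, so that larger outputs imply stochastically larger outcomes. It can be computed threshold by threshold with the pool-adjacent-violators algorithm (PAVA). PC is then the mean CRPS of these CDFs. The function names are hypothetical. The fit is in-sample only, with no cross-validation or train/test split. This is not the authors' software.

```python
import numpy as np

def pava_nondecreasing(values, weights):
    """Weighted pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block holds [mean, total weight, number of points]
    for v, wt in zip(values, weights):
        blocks.append([float(v), float(wt), 1])
        # Merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

def potential_crps(x, y):
    """In-sample PC: mean CRPS of an IDR fit of outcomes y on model output x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ux, inv = np.unique(x, return_inverse=True)  # sorted unique outputs, tied x grouped
    counts = np.bincount(inv)
    thresholds = np.unique(y)
    # F[j, k]: IDR predictive CDF at thresholds[k] for the j-th unique x
    F = np.empty((ux.size, thresholds.size))
    for k, z in enumerate(thresholds):
        exceed = np.bincount(inv, weights=(y > z).astype(float)) / counts
        # P(Y > z | x) must be non-decreasing in x
        F[:, k] = 1.0 - pava_nondecreasing(exceed, counts)
    # CRPS = integral of (F(z) - 1{y <= z})^2 dz. Both step functions are constant on
    # [thresholds[k], thresholds[k + 1]), and the integrand vanishes outside the data range.
    widths = np.diff(thresholds)
    Fi = F[inv]  # predictive CDF for each observation
    ind = (y[:, None] <= thresholds[None, :]).astype(float)
    crps = ((Fi[:, :-1] - ind[:, :-1]) ** 2 * widths).sum(axis=1)
    return float(crps.mean())
```

If y is a non-decreasing function of x, the sketch returns zero. With a constant x, IDR collapses to the empirical distribution of y. The result is then half the average of |y_i - y_j| over all pairs (i, j), including i = j. Because only the ordering of x enters the fit, any strictly increasing transform of x leaves the result unchanged.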


Utilizing Strategic Pre-training to Reduce Overfitting: Baguan -- A Pre-trained Weather Forecasting Model

Niu, Peisong, Ma, Ziqing, Zhou, Tian, Chen, Weiqi, Shen, Lefei, Jin, Rong, Sun, Liang

arXiv.org Artificial Intelligence

Weather forecasting has long posed a significant challenge for humanity. While recent AI-based models have surpassed traditional numerical weather prediction (NWP) methods in global forecasting tasks, overfitting remains a critical issue because the available real-world weather data span only a few decades. Unlike fields such as computer vision or natural language processing, where data abundance can mitigate overfitting, weather forecasting demands innovative strategies to address this challenge with existing data. In this paper, we explore pre-training methods for weather forecasting, finding that selecting an appropriately challenging pre-training task introduces locality bias, effectively mitigating overfitting and enhancing performance. We introduce Baguan, a novel data-driven model for medium-range weather forecasting, built on a Siamese Autoencoder pre-trained in a self-supervised manner and fine-tuned for different lead times. Experimental results show that Baguan outperforms traditional methods, delivering more accurate forecasts. Additionally, the pre-trained Baguan demonstrates robust overfitting control and excels in downstream tasks, such as subseasonal-to-seasonal (S2S) modeling and regional forecasting, after fine-tuning.


Testing the Limit of Atmospheric Predictability with a Machine Learning Weather Model

Vonich, P. Trent, Hakim, Gregory J.

arXiv.org Artificial Intelligence

Atmospheric predictability research has long held that the limit of skillful deterministic weather forecasts is about 14 days. We challenge this limit using GraphCast, a machine-learning weather model, by optimizing forecast initial conditions using gradient-based techniques for twice-daily forecasts spanning 2020. This approach yields an average error reduction of 86% at 10 days, with skill lasting beyond 30 days. Mean optimal initial-condition perturbations reveal large-scale, spatially coherent corrections to ERA5, primarily reflecting an intensification of the Hadley circulation. Forecasts using GraphCast-optimal initial conditions in the Pangu-Weather model achieve a 21% error reduction, peaking at 4 days, indicating that analysis corrections reflect a combination of both model bias and a reduction in analysis error. These results demonstrate that, given accurate initial conditions, skillful deterministic forecasts are consistently achievable far beyond two weeks, challenging long-standing assumptions about the limits of atmospheric predictability.
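The abstract does not give the loss, optimizer, or verification target used with GraphCast. The sketch below shows only the bare mechanism of gradient-based initial-condition optimization. It uses a hypothetical linear toy model x_{k+1} = A x_k. A hand-written adjoint (backward) pass stands in for the automatic differentiation one would use through a neural network. `A`, the targets, the learning rate, and the helper names are illustrative assumptions.

```python
import numpy as np

def rollout(A, x0, n_steps):
    """Autoregressive forecast with a linear toy model: x_{k+1} = A @ x_k."""
    states, x = [], x0
    for _ in range(n_steps):
        x = A @ x
        states.append(x)
    return np.stack(states)

def optimize_initial_condition(A, x0, targets, lr=0.1, n_iter=500):
    """Gradient descent on the initial state. It minimizes the mean squared error
    between the forecast at each lead time and `targets` (one row per lead time)."""
    x = np.array(x0, dtype=float)
    n_steps = targets.shape[0]
    for _ in range(n_iter):
        states = rollout(A, x, n_steps)
        # Adjoint pass: propagate d loss / d state backwards through each linear step
        lam = np.zeros_like(x)
        for k in reversed(range(n_steps)):
            lam = A.T @ (lam + 2.0 * (states[k] - targets[k]) / n_steps)
        x = x - lr * lam  # lam now holds d loss / d x0
    return x

# Hypothetical example: a damped rotation and a "true" state that the first guess misses
A = np.array([[0.9, 0.2], [-0.2, 0.9]])
x_true = np.array([1.0, -0.5])
targets = rollout(A, x_true, 5)
x_first_guess = np.array([0.8, -0.2])
x_opt = optimize_initial_condition(A, x_first_guess, targets)
```

The toy model has no model error, so the optimization recovers `x_true`. In the study, the optimized corrections to ERA5 are reported to reflect both model bias and reduced analysis error.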


Improving Predictions of Convective Storm Wind Gusts through Statistical Post-Processing of Neural Weather Models

Leclerc, Antoine, Koch, Erwan, Feldmann, Monika, Nerini, Daniele, Beucler, Tom

arXiv.org Artificial Intelligence

Issuing timely severe weather warnings helps mitigate potentially disastrous consequences. Recent advancements in Neural Weather Models (NWMs) offer a computationally inexpensive and fast approach for forecasting atmospheric environments on a 0.25° global grid. For thunderstorms, these environments can be empirically post-processed to predict wind gust distributions at specific locations. With the Pangu-Weather NWM, we apply a hierarchy of statistical and deep learning post-processing methods to forecast hourly wind gusts up to three days ahead. To ensure statistical robustness, we constrain our probabilistic forecasts using generalised extreme-value distributions across five regions in Switzerland. Using a convolutional neural network to post-process the predicted atmospheric environment's spatial patterns yields the best results, outperforming direct forecasting approaches across lead times and wind gust speeds. Our results confirm the added value of NWMs for extreme wind forecasting, especially for designing more responsive early-warning systems.
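A generalised extreme-value (GEV) predictive distribution is fully described by three parameters: a location μ, a scale σ > 0, and a shape ξ. A post-processing model therefore only has to output these three numbers per location and lead time. The functions below give the GEV CDF F(z) = exp(-[1 + ξ (z - μ)/σ]^(-1/ξ)), its quantile function, and an exceedance probability for a warning threshold. The parameter values are made up for illustration, and the abstract does not describe how the paper's networks predict them. SciPy's `scipy.stats.genextreme` uses a shape parameter `c` with the opposite sign (c = -ξ).

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV CDF with location mu, scale sigma > 0, and shape xi (xi > 0: heavy upper tail)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) or above the upper one (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_quantile(p, mu, sigma, xi):
    """Inverse of gev_cdf for 0 < p < 1."""
    y = -math.log(p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi

# Hypothetical gust forecast parameters (m/s) and a hypothetical 25 m/s warning threshold
p_exceed = 1.0 - gev_cdf(25.0, mu=15.0, sigma=4.0, xi=0.1)
```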


DeepMedcast: A Deep Learning Method for Generating Intermediate Weather Forecasts among Multiple NWP Models

Kudo, Atsushi

arXiv.org Artificial Intelligence

In recent decades, numerical weather predictions (NWPs) and their post-processing have played a central role in issuing weather forecasts, warnings, and advisories [WMO, 2013, Vannitsem et al., 2021]. NWP centers around the world have developed and are operating a variety of NWP models for accurate weather predictions. For example, the European Centre for Medium-Range Weather Forecasts (ECMWF) operates the Integrated Forecasting System (IFS) and its ensemble prediction system [ECMWF, 2024]; the UK Met Office operates the Unified Model and the Met Office Global and Regional Ensemble Prediction System [Brown et al., 2012, Hagelin et al., 2017, Inverarity et al., 2023]. The National Centers for Environmental Prediction (NCEP) at the National Oceanic and Atmospheric Administration (NOAA) operates the Global Forecast System [NCEP, 2016], the High-Resolution Rapid Refresh [Dowell et al., 2022], and the Hurricane Weather Research and Forecasting model [Gopalakrishnan et al., 2011]. The Japan Meteorological Agency (JMA) operates three deterministic NWP models and two ensemble prediction systems for short-range to weekly forecasts: the Global Spectral Model (GSM), the Meso-Scale Model (MSM), the Local Forecast Model, the Global Ensemble Prediction System, and the Mesoscale Ensemble Prediction System [JMA, 2024]. These models cover different areas with varying resolutions and processes. In addition to traditional physics-based NWP models, recent advancements in artificial intelligence (AI) have introduced new methods for producing weather predictions.


Robustness of AI-based weather forecasts in a changing climate

Rackow, Thomas, Koldunov, Nikolay, Lessig, Christian, Sandu, Irina, Alexe, Mihai, Chantry, Matthew, Clare, Mariana, Dramsch, Jesper, Pappenberger, Florian, Pedruzo-Bagazgoitia, Xabier, Tietsche, Steffen, Jung, Thomas

arXiv.org Artificial Intelligence

Data-driven machine learning models for weather forecasting have made transformational progress in the last 1-2 years, with state-of-the-art ones now outperforming the best physics-based models for a wide range of skill scores. Given the strong links between weather and climate modelling, this raises the question of whether machine learning models could also revolutionize climate science, for example by informing mitigation of and adaptation to climate change or by generating larger ensembles for more robust uncertainty estimates. Here, we show that current state-of-the-art machine learning models trained for weather forecasting in present-day climate produce skillful forecasts across different climate states corresponding to pre-industrial, present-day, and future 2.9 K warmer climates. This indicates that the dynamics shaping the weather on short timescales may not differ fundamentally in a changing climate. It also demonstrates out-of-distribution generalization capabilities of the machine learning models that are a critical prerequisite for climate applications. Nonetheless, two of the models show a global-mean cold bias in the forecasts for the future warmer climate state, i.e. they drift towards the colder present-day climate on which they were trained. A similar result is obtained for the pre-industrial case, where two out of three models show a warming drift. We discuss possible remedies for these biases and analyze their spatial distribution, revealing complex warming and cooling patterns that are partly related to missing ocean, sea-ice, and land surface information in the training data. Despite these current limitations, our results suggest that data-driven machine learning models will provide powerful tools for climate science and transform established approaches by complementing conventional physics-based models.
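A "global-mean cold bias" is a negative area-weighted average of forecast minus reference. On a regular latitude-longitude grid, cell area is proportional to cos(latitude), so plain averaging over grid points would overweight the poles. The sketch below assumes a (lat, lon) array layout and uses hypothetical helper names. It does not reproduce the paper's exact diagnostic, which the abstract does not specify.

```python
import numpy as np

def global_mean(field, lat_deg):
    """Area-weighted mean of a (lat, lon) field on a regular grid, weights proportional to cos(latitude)."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))[:, None] * np.ones_like(field)
    return float((field * w).sum() / w.sum())

def global_mean_bias(forecast, reference, lat_deg):
    """Negative values indicate a global-mean cold bias for temperature fields."""
    return global_mean(np.asarray(forecast) - np.asarray(reference), lat_deg)
```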


Comparing and Contrasting Deep Learning Weather Prediction Backbones on Navier-Stokes and Atmospheric Dynamics

Karlbauer, Matthias, Maddix, Danielle C., Ansari, Abdul Fatir, Han, Boran, Gupta, Gaurav, Wang, Yuyang, Stuart, Andrew, Mahoney, Michael W.

arXiv.org Artificial Intelligence

Remarkable progress in the development of Deep Learning Weather Prediction (DLWP) models positions them to become competitive with traditional numerical weather prediction (NWP) models. Indeed, a wide range of DLWP architectures, based on various backbones including U-Net, Transformer, Graph Neural Network (GNN), and Fourier Neural Operator (FNO), have demonstrated their potential for forecasting atmospheric states. However, due to differences in training protocols, forecast horizons, and data choices, it remains unclear which (if any) of these methods and architectures are most suitable for weather forecasting and for future model development. Here, we step back and provide a detailed empirical analysis, under controlled conditions, comparing and contrasting the most prominent DLWP models, along with their backbones. We accomplish this by predicting synthetic two-dimensional incompressible Navier-Stokes and real-world global weather dynamics. In terms of accuracy, memory consumption, and runtime, our results illustrate various tradeoffs. For example, on synthetic data, we observe favorable performance of FNO; and on the real-world WeatherBench dataset, our results demonstrate the suitability of ConvLSTM and SwinTransformer for short- to medium-range forecasts. For long-range weather rollouts of up to 365 days, we observe superior stability and physical soundness in architectures that formulate a spherical data representation, i.e., GraphCast and Spherical FNO. In addition, we observe that all of these model backbones "saturate," i.e., none of them exhibits so-called neural scaling, which highlights an important direction for future work on these and related models.
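One common way to check for neural scaling is to fit a power law, error ≈ a · N^b, to test error as a function of model size N. The fit is a linear regression in log-log space. An exponent b close to zero means the error has stopped improving with size, that is, it has saturated. The arrays below are synthetic, and `scaling_exponent` is a hypothetical helper. The abstract does not say which quantity the authors scaled (parameters, data, or compute) or how they fit it.

```python
import numpy as np

def scaling_exponent(sizes, errors):
    """Least-squares fit of log(error) = log(a) + b * log(size); returns (a, b)."""
    b, log_a = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(log_a)), float(b)

sizes = np.array([1e5, 1e6, 1e7, 1e8])
scaling_errors = 2.0 * sizes ** -0.3                  # synthetic: error keeps falling with size
saturated_errors = np.array([0.50, 0.42, 0.41, 0.41])  # synthetic: error plateaus
a_fit, b_fit = scaling_exponent(sizes, scaling_errors)
_, b_sat = scaling_exponent(sizes, saturated_errors)
```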


EWMoE: An effective model for global weather forecasting with mixture-of-experts

Gan, Lihao, Man, Xin, Zhang, Chenghong, Shao, Jie

arXiv.org Artificial Intelligence

Weather forecasting is the analysis of past and present weather observations, together with the use of modern science and technology, to predict the future state of the Earth's atmosphere. It is one of the most important applications of scientific computing and plays a crucial role in key sectors such as transportation, logistics, agriculture, and energy production [1]. Traditionally, atmospheric scientists have relied on Numerical Weather Prediction (NWP) methods [2, 3], which use mathematical models of the atmosphere and oceans to forecast future weather states from current conditions. While modern meteorological forecasting systems have achieved satisfactory results with NWP methods, these methods largely rely on parameterized numerical models, which can introduce errors in the parameterization [4] of complex, unresolved processes. Additionally, NWP methods struggle to meet the diverse needs of weather forecasting because of their high computational cost, the difficulty of solving nonlinear physical processes, and model deviations [5, 6]. To address these issues with NWP models, researchers have turned their attention to data-driven weather forecasting based on deep learning. These methods run very quickly and can readily balance model complexity, prediction resolution, and prediction accuracy [7-9]. Denby [10] first employed a Convolutional Neural Network (CNN) to classify weather satellite images.