FourCastNet
Forecasting Fails: Unveiling Evasion Attacks in Weather Prediction Models
Arif, Huzaifa, Chen, Pin-Yu, Gittens, Alex, Diffenderfer, James, Kailkhura, Bhavya
With the increasing reliance on AI models for weather forecasting, it is imperative to evaluate their vulnerability to adversarial perturbations. This work introduces Weather Adaptive Adversarial Perturbation Optimization (WAAPO), a novel framework for generating targeted adversarial perturbations that are both effective in manipulating forecasts and stealthy to avoid detection. WAAPO achieves this by incorporating constraints for channel sparsity, spatial localization, and smoothness, ensuring that perturbations remain physically realistic and imperceptible. Using the ERA5 dataset and FourCastNet (Pathak et al. 2022), we demonstrate WAAPO's ability to generate adversarial trajectories that align closely with predefined targets, even under constrained conditions. Our experiments highlight critical vulnerabilities in AI-driven forecasting models, where small perturbations to initial conditions can result in significant deviations in predicted weather patterns. These findings underscore the need for robust safeguards to protect against adversarial exploitation in operational forecasting systems. The code for WAAPO is available at: https://github.com/Huzaifa-Arif/W
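The constrained-perturbation idea the abstract describes can be sketched with a toy gradient-based attack. This is not the WAAPO implementation: the linear `forecast` map stands in for FourCastNet, the target is random, and the channel mask and smoothness penalty only illustrate the kinds of constraints named in the abstract (channel sparsity, smoothness).

```python
import numpy as np

# Toy stand-in for a forecast model: a fixed linear map (WAAPO attacks
# FourCastNet; this sketch only illustrates the optimization structure).
rng = np.random.default_rng(0)
C, H, W = 3, 8, 8                              # channels x lat x lon (toy grid)
A = rng.normal(size=(C * H * W, C * H * W)) / np.sqrt(C * H * W)

def forecast(x):
    return A @ x.ravel()

target = rng.normal(size=C * H * W)            # hypothetical adversarial target
x0 = rng.normal(size=(C, H, W))                # toy initial atmospheric state

mask = np.zeros((C, 1, 1))
mask[0] = 1.0                                  # channel sparsity: perturb channel 0 only

delta = np.zeros((C, H, W))
lr, lam = 0.05, 0.1                            # step size, smoothness weight
for _ in range(200):
    # gradient of ||forecast(x0 + delta) - target||^2 for the linear toy model
    resid = forecast(x0 + delta) - target
    grad = (A.T @ resid).reshape(C, H, W)
    # gradient of the smoothness penalty sum of squared neighbor differences
    g_s = np.zeros_like(delta)
    d_lat, d_lon = np.diff(delta, axis=1), np.diff(delta, axis=2)
    g_s[:, 1:, :] += 2 * d_lat
    g_s[:, :-1, :] -= 2 * d_lat
    g_s[:, :, 1:] += 2 * d_lon
    g_s[:, :, :-1] -= 2 * d_lon
    # masked (channel-sparse) projected gradient step
    delta -= lr * mask * (grad + lam * g_s)

print(np.linalg.norm(forecast(x0 + delta) - target),
      np.linalg.norm(forecast(x0) - target))
```

The masked update keeps the perturbation confined to one channel while the penalty keeps it spatially smooth; the perturbed forecast moves measurably closer to the target than the clean one.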
Energy Consumption in Parallel Neural Network Training
Huber, Philipp, Li, David, Muriedas, Juan Pedro Gutiérrez Hermosillo, Kieckhefen, Deifilia, Götz, Markus, Streit, Achim, Debus, Charlotte
The increasing computational demand of training neural networks leads to a concerning growth in energy consumption. While parallelization has enabled upscaling model and dataset sizes and accelerated training, its impact on energy consumption is often overlooked. To close this research gap, we conducted scaling experiments for data-parallel training of two models, ResNet50 and FourCastNet, and evaluated the impact of parallelization parameters, i.e., GPU count, global batch size, and local batch size, on predictive performance, training time, and energy consumption. We show that energy consumption scales approximately linearly with the consumed resources, i.e., GPU hours; however, the respective scaling factor differs substantially between distinct model trainings and hardware, and is systematically influenced by the number of samples and gradient updates per GPU hour. Our results shed light on the complex interplay of scaling up neural network training and can inform future developments towards more sustainable AI research.
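The headline relation reduces to a one-line model: energy is roughly a scaling factor times GPU hours, with the factor depending on model and hardware. The kWh-per-GPU-hour values below are invented for illustration; the paper reports that the real factor varies with the training setup and with samples and gradient updates per GPU hour.

```python
def estimate_energy_kwh(gpu_hours, kwh_per_gpu_hour):
    """Linear energy model: total energy grows with consumed GPU hours."""
    return gpu_hours * kwh_per_gpu_hour

# hypothetical scaling factors for two different trainings (illustrative only)
resnet_factor = 0.30       # kWh per GPU hour
fourcastnet_factor = 0.45  # kWh per GPU hour

print(estimate_energy_kwh(1000, resnet_factor))       # 300.0
print(estimate_energy_kwh(1000, fourcastnet_factor))  # 450.0
```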
Weakly-Constrained 4D Var for Downscaling with Uncertainty using Data-Driven Surrogate Models
Dinenis, Philip, Rao, Vishwas, Anitescu, Mihai
Dynamic downscaling typically involves using numerical weather prediction (NWP) solvers to refine coarse data to higher spatial resolutions. Data-driven models such as FourCastNet have emerged as a promising alternative to traditional NWP models for forecasting. Once trained, these models can deliver forecasts in a few seconds, thousands of times faster than classical NWP models. However, as lead times, and therefore the forecast window, increase, these models show instability in that they tend to diverge from reality. In this paper, we propose to use data assimilation approaches to stabilize them when used for downscaling tasks. Data assimilation combines information from three sources: an imperfect computational model based on partial differential equations (PDEs), noisy observations, and an uncertainty-reflecting prior. In this work, when carrying out dynamic downscaling, we replace the computationally expensive PDE-based NWP models with FourCastNet in a weakly-constrained 4DVar framework that accounts for the implied model errors. We demonstrate the efficacy of this approach on a hurricane-tracking problem; moreover, the 4DVar framework naturally allows the expression and quantification of uncertainty. We demonstrate, using ERA5 data, that our approach outperforms both the ensemble Kalman filter (EnKF) and the unstabilized FourCastNet model in terms of forecast accuracy and forecast uncertainty.
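The structure of a weakly-constrained 4DVar objective can be sketched in a few lines: a background term, an observation misfit term, and a model-error term that relaxes the "perfect model" assumption, which is the piece that absorbs the surrogate's errors. The sketch below uses toy linear dynamics, an identity observation operator, and identity covariances; it is not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 4, 3                    # toy state dimension, window length
M_op = 0.9 * np.eye(n)         # toy surrogate dynamics (stand-in for FourCastNet)
H = np.eye(n)                  # toy observation operator
x_b = rng.normal(size=n)       # background (prior) state
obs = [rng.normal(size=n) for _ in range(K)]

def wc4dvar_cost(states):
    """states: list of K+1 states x_0 .. x_K along the assimilation window."""
    J = np.sum((states[0] - x_b) ** 2)                        # background term
    for k in range(K):
        J += np.sum((H @ states[k + 1] - obs[k]) ** 2)        # observation misfit
        J += np.sum((states[k + 1] - M_op @ states[k]) ** 2)  # model error (weak constraint)
    return J

def wc4dvar_grad(states):
    g = [2 * (states[0] - x_b)] + [np.zeros(n) for _ in range(K)]
    for k in range(K):
        r_obs = H @ states[k + 1] - obs[k]
        r_mod = states[k + 1] - M_op @ states[k]
        g[k + 1] = g[k + 1] + 2 * H.T @ r_obs + 2 * r_mod
        g[k] = g[k] - 2 * M_op.T @ r_mod
    return g

# minimize the cost over the whole trajectory by plain gradient descent
states = [x_b.copy() for _ in range(K + 1)]
for _ in range(500):
    states = [x - 0.05 * gi for x, gi in zip(states, wc4dvar_grad(states))]
print(wc4dvar_cost(states))
```

In an operational setting each covariance would weight its term and the minimization would use a second-order or adjoint-based solver; the gradient loop here only shows that the objective is jointly minimized over the entire trajectory.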
Modulated Adaptive Fourier Neural Operators for Temporal Interpolation of Weather Forecasts
Leinonen, Jussi, Bonev, Boris, Kurth, Thorsten, Cohen, Yair
Weather and climate data are often available at limited temporal resolution, either due to storage limitations, or in the case of weather forecast models based on deep learning, their inherently long time steps. The coarse temporal resolution makes it difficult to capture rapidly evolving weather events. To address this limitation, we introduce an interpolation model that reconstructs the atmospheric state between two points in time for which the state is known. The model makes use of a novel network layer that modifies the adaptive Fourier neural operator (AFNO), which has been previously used in weather prediction and other applications of machine learning to physics problems. The modulated AFNO (ModAFNO) layer takes an embedding, here computed from the interpolation target time, as an additional input and applies a learned shift-scale operation inside the AFNO layers to adapt them to the target time. Thus, one model can be used to produce all intermediate time steps. Trained to interpolate between two time steps 6 h apart, the ModAFNO-based interpolation model produces 1 h resolution intermediate time steps that are visually nearly indistinguishable from the actual corresponding 1 h resolution data. The model reduces the RMSE loss of reconstructing the intermediate steps by approximately 50% compared to linear interpolation. We also demonstrate its ability to reproduce the statistics of extreme weather events such as hurricanes and heat waves better than 6 h resolution data. The ModAFNO layer is generic and is expected to be applicable to other problems, including weather forecasting with tunable lead time.
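The shift-scale conditioning the abstract describes can be sketched as follows. This is a minimal stand-in, not the ModAFNO layer itself: a plain 1D FFT replaces the full AFNO block, all weights are random, and the embedding-to-(scale, shift) map is a single hypothetical linear layer.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16                                       # feature channels
W1 = rng.normal(size=(d, d)) * 0.1           # toy spectral mixing weights
emb_W = rng.normal(size=(d * 2, 8)) * 0.1    # maps time embedding -> (scale, shift)

def mod_afno_layer(x, t_embed):
    """x: (grid, d) features; t_embed: (8,) embedding of the target time."""
    scale_shift = emb_W @ t_embed
    scale, shift = scale_shift[:d], scale_shift[d:]
    # mix features in Fourier space along the grid dimension
    xf = np.fft.rfft(x, axis=0)
    h = np.fft.irfft(xf @ W1, n=x.shape[0], axis=0)
    # learned shift-scale conditioned on the interpolation target time
    return h * (1.0 + scale) + shift

x = rng.normal(size=(32, d))
t0, t1 = np.zeros(8), np.ones(8)
y0, y1 = mod_afno_layer(x, t0), mod_afno_layer(x, t1)
print(np.allclose(y0, y1))   # different target times give different outputs
```

Because the modulation is the only time-dependent part, a single set of spectral weights can serve every intermediate time step, which is the property that lets one model produce the whole 1 h sequence.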
Prithvi WxC: Foundation Model for Weather and Climate
Schmude, Johannes, Roy, Sujit, Trojak, Will, Jakubik, Johannes, Civitarese, Daniel Salles, Singh, Shraddha, Kuehnert, Julian, Ankur, Kumar, Gupta, Aman, Phillips, Christopher E, Kienzler, Romeo, Szwarcman, Daniela, Gaur, Vishal, Shinde, Rajat, Lal, Rohit, Da Silva, Arlindo, Diaz, Jorge Luis Guevara, Jones, Anne, Pfreundschuh, Simon, Lin, Amy, Sheshadri, Aditi, Nair, Udaysankar, Anantharaj, Valentine, Hamann, Hendrik, Watson, Campbell, Maskey, Manil, Lee, Tsengdar J, Moreno, Juan Bernabe, Ramachandran, Rahul
Triggered by the realization that AI emulators can rival the performance of traditional numerical weather prediction models running on HPC systems, there is now an increasing number of large AI models that address use cases such as forecasting, downscaling, or nowcasting. While the parallel developments in the AI literature focus on foundation models -- models that can be effectively tuned to address multiple, different use cases -- the developments on the weather and climate side largely focus on single use cases with particular emphasis on mid-range forecasting. We close this gap by introducing Prithvi WxC, a 2.3-billion-parameter foundation model developed using 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). Prithvi WxC employs an encoder-decoder-based architecture, incorporating concepts from various recent transformer models to effectively capture both regional and global dependencies in the input data. The model has been designed to accommodate large token counts to model weather phenomena in different topologies at fine resolutions. Furthermore, it is trained with a mixed objective that combines the paradigms of masked reconstruction with forecasting. We test the model on a set of challenging downstream tasks, namely: Autoregressive rollout forecasting, Downscaling, Gravity wave flux parameterization, and Extreme events estimation. The pretrained model with 2.3 billion parameters, along with the associated fine-tuning workflows, has been publicly released as an open-source contribution via Hugging Face.
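A mixed objective of the kind the abstract mentions can be sketched as a weighted sum of a masked-reconstruction loss and a forecasting loss. The weighting, mask ratio, and identity "model" below are illustrative assumptions, not details from the Prithvi WxC training recipe.

```python
import numpy as np

rng = np.random.default_rng(3)

def mixed_loss(model, x_t, x_next, mask_ratio=0.5, alpha=0.5):
    """Combine masked reconstruction of x_t with forecasting of x_next."""
    mask = rng.random(x_t.shape) < mask_ratio
    x_masked = np.where(mask, 0.0, x_t)       # zero out masked tokens
    recon = model(x_masked)
    forecast = model(x_t)
    loss_recon = np.mean((recon[mask] - x_t[mask]) ** 2)  # rebuild masked tokens
    loss_fcst = np.mean((forecast - x_next) ** 2)         # predict the next state
    return alpha * loss_recon + (1 - alpha) * loss_fcst

# usage with an identity stand-in for the network
x_t, x_next = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
loss = mixed_loss(lambda a: a, x_t, x_next)
print(loss)
```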
Interview with AAAI Fellow Anima Anandkumar: Neural Operators for science and engineering problems
Each year the Association for the Advancement of Artificial Intelligence (AAAI) recognizes a group of individuals who have made significant, sustained contributions to the field of artificial intelligence by appointing them as Fellows. We've been talking to some of the 2024 AAAI Fellows to find out more about their research. In this interview, we meet Anima Anandkumar and find out about her work on Neural Operators, of which she is the inventor. Neural Operators are able to learn complex physical phenomena that occur at multiple resolutions while standard neural networks are unable to do so. Standard neural networks use a fixed number of pixels or resolution to learn a phenomenon, while neural operators represent data as continuous functions.
MetMamba: Regional Weather Forecasting with Spatial-Temporal Mamba Model
Qin, Haoyu, Chen, Yungang, Jiang, Qianchuan, Sun, Pengchao, Ye, Xiancai, Lin, Chao
Weather forecasting has attracted increasing attention as climate change has brought more heat waves, tropical cyclones, heavy rainfalls, and other extreme weather events, affecting millions of people worldwide [1][2]. Researchers have devised sophisticated numerical schemes and equations to capture complex weather dynamics and improve forecast accuracy [3]. However, their immense computational complexity often necessitates large-scale compute clusters, incurring notable energy costs and long processing times, and making ensemble forecasts, critical for predicting such events [4], expensive or infeasible. Trained on the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis product ERA5 (ECMWF Reanalysis v5), deep learning based weather prediction (DLWP) models have shown promising performance, with FourCastNet [5] being the first data-driven model to compete directly with ECMWF's Integrated Forecasting System (IFS), followed by a series of other models [6][7][8][9] that outperform the IFS on various metrics [10]. These models demonstrate an excellent ability to capture global weather trends at a fraction of the computational cost.
Comparing and Contrasting Deep Learning Weather Prediction Backbones on Navier-Stokes and Atmospheric Dynamics
Karlbauer, Matthias, Maddix, Danielle C., Ansari, Abdul Fatir, Han, Boran, Gupta, Gaurav, Wang, Yuyang, Stuart, Andrew, Mahoney, Michael W.
Remarkable progress in the development of Deep Learning Weather Prediction (DLWP) models positions them to become competitive with traditional numerical weather prediction (NWP) models. Indeed, a wide range of DLWP architectures -- based on various backbones, including U-Net, Transformer, Graph Neural Network (GNN), and Fourier Neural Operator (FNO) -- have demonstrated their potential at forecasting atmospheric states. However, due to differences in training protocols, forecast horizons, and data choices, it remains unclear which (if any) of these methods and architectures are most suitable for weather forecasting and for future model development. Here, we step back and provide a detailed empirical analysis, under controlled conditions, comparing and contrasting the most prominent DLWP models, along with their backbones. We accomplish this by predicting synthetic two-dimensional incompressible Navier-Stokes and real-world global weather dynamics. In terms of accuracy, memory consumption, and runtime, our results illustrate various tradeoffs. For example, on synthetic data, we observe favorable performance of FNO; and on the real-world WeatherBench dataset, our results demonstrate the suitability of ConvLSTM and SwinTransformer for short-to-mid-range forecasts. For long-range weather rollouts of up to 365 days, we observe superior stability and physical soundness in architectures that formulate a spherical data representation, i.e., GraphCast and Spherical FNO. In addition, we observe that all of these model backbones ``saturate,'' i.e., none of them exhibit so-called neural scaling, which highlights an important direction for future work on these and related models.
Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range Global Weather Forecast
Hu, Yifan, Yin, Fukang, Zhang, Weimin, Ren, Kaijun, Song, Junqiang, Deng, Kefeng, Zhang, Di
Long-term stability stands as a crucial requirement in data-driven medium-range global weather forecasting. Spectral bias is recognized as the primary contributor to instabilities, as data-driven methods struggle to learn small-scale dynamics. In this paper, we reveal that the universal mechanism behind these instabilities is related not only to spectral bias but also to distortions introduced by processing spherical data with conventional convolutions. These distortions lead to a rapid amplification of errors over successive long-term iterations, resulting in a significant decline in forecast accuracy. To address this issue, a universal neural operator called the Spherical Harmonic Neural Operator (SHNO) is introduced to improve long-term iterative forecasts. SHNO uses the spherical harmonic basis to mitigate distortions in spherical data and gated residual spectral attention (GRSA) to correct the spectral bias caused by spurious correlations across different scales. The effectiveness and merit of the proposed method have been validated through its application to the spherical Shallow Water Equations (SWEs) and medium-range global weather forecasting. Our findings highlight the benefits and potential of SHNO for improving the accuracy of long-term predictions.
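The gated residual spectral idea can be sketched in one dimension: transform to a spectral basis, apply a learned per-mode gate, and add the gated correction back through a residual connection. This is only an analogy: a plain FFT replaces the spherical harmonic transform SHNO actually uses, and the sigmoid gate stands in for the learned GRSA attention weights.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 64
# sigmoid gate per spectral mode (random stand-in for learned attention)
gate = 1.0 / (1.0 + np.exp(-rng.normal(size=n // 2 + 1)))

def gated_residual_spectral(x):
    xf = np.fft.rfft(x)                          # to spectral space
    correction = np.fft.irfft(gate * xf, n=n)    # gated per-mode correction
    return x + correction                        # residual connection

x = rng.normal(size=n)
y = gated_residual_spectral(x)
print(y.shape)   # (64,)
```

The gate lets the layer damp or amplify individual frequency bands, which is the mechanism by which such a block can counteract spectral bias without discarding the input signal.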
Data Assimilation with Machine Learning Surrogate Models: A Case Study with FourCastNet
Adrian, Melissa, Sanz-Alonso, Daniel, Willett, Rebecca
Modern data-driven surrogate models for weather forecasting provide accurate short-term predictions but inaccurate and nonphysical long-term forecasts. This paper investigates online weather prediction using machine learning surrogates supplemented with partial and noisy observations. We empirically demonstrate and theoretically justify that, despite the long-time instability of the surrogates and the sparsity of the observations, filtering estimates can remain accurate in the long-time horizon. As a case study, we integrate FourCastNet, a state-of-the-art weather surrogate model, within a variational data assimilation framework using partial, noisy ERA5 data. Our results show that filtering estimates remain accurate over a year-long assimilation window and provide effective initial conditions for forecasting tasks, including extreme event prediction.
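The qualitative finding, that a drifting surrogate stays accurate once partial, noisy observations are periodically blended in, can be reproduced with a toy filter. The dynamics, observation pattern, and fixed analysis gain below are all illustrative assumptions, not the paper's variational scheme.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))   # random rotation (mixes components)
truth_op = 0.99 * Q                            # stable "true" dynamics
surrogate_op = 1.02 * Q                        # slightly unstable surrogate
obs_idx = np.arange(0, n, 2)                   # observe only half the state (partial)

x_true = rng.normal(size=n)
x_free = x_true.copy()     # surrogate rollout without assimilation
x_filt = x_true.copy()     # surrogate rollout with filtering
gain = 0.5                 # fixed analysis gain (illustrative)

for _ in range(200):
    x_true = truth_op @ x_true
    y = x_true[obs_idx] + 0.01 * rng.normal(size=obs_idx.size)  # noisy observations
    x_free = surrogate_op @ x_free                    # forecast only: drifts away
    x_filt = surrogate_op @ x_filt                    # forecast step ...
    x_filt[obs_idx] += gain * (y - x_filt[obs_idx])   # ... then analysis update

print(np.linalg.norm(x_free - x_true), np.linalg.norm(x_filt - x_true))
```

Because the rotation mixes observed and unobserved components, the sparse updates propagate through the whole state, and the filtered error stays far below the free-running surrogate's error over the long window.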