WeatherBench 2: A benchmark for the next generation of data-driven global weather models

Rasp, Stephan; Hoyer, Stephan; Merose, Alexander; Langmore, Ian; Battaglia, Peter; Russel, Tyler; Sanchez-Gonzalez, Alvaro; Yang, Vivian; Carver, Rob; Agrawal, Shreya; Chantry, Matthew; Bouallegue, Zied Ben; Dueben, Peter; Bromberg, Carla; Sisk, Jared; Barrington, Luke; Bell, Aaron; Sha, Fei

arXiv.org Artificial Intelligence 

WeatherBench 2 is an update to the global, medium-range (1-14 day) weather forecasting benchmark proposed by Rasp et al. (2020), designed to accelerate progress in data-driven weather modeling. WeatherBench 2 consists of an open-source evaluation framework; publicly available training, ground-truth, and baseline data; and a continuously updated website with the latest metrics and state-of-the-art models: https://sites.research.google/weatherbench. This paper describes the design principles of the evaluation framework and presents results for current state-of-the-art physical and data-driven weather models. The metrics are based on established practices for evaluating weather forecasts at leading operational weather centers. We define a set of headline scores to provide an overview of model performance. We also discuss caveats in the current evaluation setup and challenges for the future of data-driven weather forecasting.