ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning

Neural Information Processing Systems

Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. Many of those tasks have been addressed on datasets created with single climate models. However, both the climate science and ML communities have suggested that to address those tasks at scale, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. In addition, we provide a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios.



Do machine learning climate models work in changing climate dynamics?

Navarro, Maria Conchita Agana, Li, Geng, Wolf, Theo, Pérez-Ortiz, María

arXiv.org Artificial Intelligence

Our baseline runs followed the ClimateSet single-emulator specifications (Kaltenborn et al., 2023):

Training process: Each emulator is trained on data from a single climate model, predicting outputs for an entire sequence of monthly data for each year.

Pre-processing: The data has been pre-processed by ClimateSet to a spatial resolution of approximately 250 km (144 x 96 longitude-latitude cells) and a monthly temporal resolution. The time series is divided into 1-year chunks, resulting in data of shape (scenarios, years * months, variables, longitude, latitude).

Input and output shapes: The input data has shape (batch, sequence length, num vars, lon, lat), where the sequence length is 12 (monthly data). The output has shape (batch, sequence length, 2, lon, lat), where the '2' corresponds to temperature (TAS) and precipitation (PR).

Training parameters: The models are trained for 50 epochs with an initial learning rate of 2e-4, using an exponential decay scheduler. For the non-frozen ClimaX models, training begins with a 5-epoch warm-up phase at 1e-8, followed by training at 5e-4.

Loss: The latitude-longitude weighted mean squared error (LLMSE), as implemented by Nguyen et al. (2023), is used.
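To make the loss concrete, here is a minimal sketch of a latitude-weighted MSE over fields with the shapes described above. The function name, the example latitude grid, and the normalization choice are illustrative assumptions; the actual LLMSE implementation of Nguyen et al. (2023) may differ in details. The core idea is that grid cells near the poles cover less surface area, so errors there are down-weighted by the cosine of latitude.

```python
import numpy as np

def lat_weighted_mse(pred, target, lats_deg):
    """Illustrative latitude-weighted MSE (not the reference implementation).

    pred, target: arrays of shape (batch, time, vars, lon, lat)
    lats_deg: 1-D array of cell-center latitudes in degrees, length = lat
    """
    w = np.cos(np.deg2rad(lats_deg))   # area-proportional weight per latitude band
    w = w / w.mean()                   # normalize so the weights average to 1
    # Weights broadcast over the trailing (lat) axis of the squared error.
    return float(np.mean((pred - target) ** 2 * w))

# Hypothetical usage with the shapes from the baseline setup above:
# 96 latitude cell centers on a regular grid (an assumed grid, for illustration)
lats = np.linspace(-88.125, 88.125, 96)
target = np.zeros((2, 12, 2, 144, 96))   # (batch, months, {TAS, PR}, lon, lat)
pred = target + 2.0                      # uniform error of 2 everywhere
loss = lat_weighted_mse(pred, target, lats)
```

Because the weights are normalized to mean 1, a spatially uniform error of 2 yields a loss of exactly 4, matching an unweighted MSE; the weighting only matters when errors vary with latitude.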


ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning

Kaltenborn, Julia, Lange, Charlotte E. E., Ramesh, Venkatesh, Brouillard, Philippe, Gurwicz, Yaniv, Nagda, Chandni, Runge, Jakob, Nowack, Peer, Rolnick, David

arXiv.org Artificial Intelligence

Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. Many of those tasks have been addressed on datasets created with single climate models. However, both the climate science and ML communities have suggested that to address those tasks at scale, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. In addition, we provide a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios. We showcase the potential of our dataset by using it as a benchmark for ML-based climate model emulation. We gain new insights about the performance and generalization capabilities of the different ML models by analyzing their performance across different climate models. Furthermore, the dataset can be used to train an ML emulator on several climate models instead of just one. Such a "super emulator" can quickly project new climate change scenarios, complementing existing scenarios already provided to policymakers. We believe ClimateSet will create the basis needed for the ML community to tackle climate-related tasks at scale.