Many aspects of geosciences pose novel problems for intelligent systems research. Geoscience data is challenging because it tends to be uncertain, intermittent, sparse, multiresolution, and multi-scale. Geosciences processes and objects often have amorphous spatiotemporal boundaries. The lack of ground truth makes model evaluation, testing, and comparison difficult. Overcoming these challenges requires breakthroughs that would significantly transform intelligent systems, while greatly benefitting the geosciences in turn.
The vast and rapidly increasing supply of new data in the Earth sciences creates many opportunities to gain scientific insights and to answer important questions. Data analysis has always been an integral component of research and education in the Earth sciences, but mainstream Earth scientists may not yet be fully aware of many recently developed methods in computer science, statistics, and math. The fastest way to put these new methods of data analysis to use in the Earth sciences is for Earth scientists and data scientists to collaborate. However, those collaborations can be difficult to initiate and even more difficult to maintain and to guide to successful outcomes. Here we break down the collaboration process into steps and provide some guidelines that we have found useful for efficient collaboration between Earth scientists and data scientists.
Data-driven approaches, most prominently deep learning, have become powerful tools for prediction in many domains. A natural question to ask is whether data-driven methods could also be used for numerical weather prediction. First studies show promise but the lack of a common dataset and evaluation metrics make inter-comparison between studies difficult. Here we present a benchmark dataset for data-driven medium-range weather forecasting, a topic of high scientific interest for atmospheric and computer scientists alike. We provide data derived from the ERA5 archive that has been processed to facilitate the use in machine learning models. We propose a simple and clear evaluation metric which will enable a direct comparison between different methods. Further, we provide baseline scores from simple linear regression techniques, deep learning models as well as purely physical forecasting models. All data is publicly available and the companion code is reproducible with tutorials for getting started. We hope that this dataset will accelerate research in data-driven weather forecasting.
Neural networks have become increasingly prevalent within the geosciences for applications ranging from numerical model parameterizations to the prediction of extreme weather. A common limitation of neural networks has been the lack of methods to interpret what the networks learn and how they make decisions. As such, neural networks have typically been used within the geosciences to accurately identify a desired output given a set of inputs, with the interpretation of what the network learns being used - if used at all - as a secondary metric to ensure the network is making the right decision for the right reason. Network interpretation techniques have become more advanced in recent years, however, and we therefore propose that the ultimate objective of using a neural network can also be the interpretation of what the network has learned rather than the output itself. We show that the interpretation of a neural network can enable the discovery of scientifically meaningful connections within geoscientific data. By training neural networks to use one or more components of the earth system to identify another, interpretation methods can be used to gain scientific insights into how and why the two components are related. In particular, we use two methods for neural network interpretation. These methods project the decision pathways of a network back onto the original input dimensions, and are called "optimal input" and layerwise relevance propagation (LRP). We then show how these interpretation techniques can be used to reliably infer scientifically meaningful information from neural networks by applying them to common climate patterns. These results suggest that combining interpretable neural networks with novel scientific hypotheses will open the door to many new avenues in neural network-related geoscience research.
In this manuscript, we provide a structured and comprehensive overview of techniques to integrate machine learning with physics-based modeling. First, we provide a summary of application areas for which these approaches have been applied. Then, we describe classes of methodologies used to construct physics-guided machine learning models and hybrid physics-machine learning frameworks from a machine learning standpoint. With this foundation, we then provide a systematic organization of these existing techniques and discuss ideas for future research.