Knowledge-guided Machine Learning: Current Trends and Future Prospects

Anuj Karpatne, Xiaowei Jia, Vipin Kumar

arXiv.org Artificial Intelligence 

This is especially true in the environmental sciences, which are rapidly transitioning from being data-poor to data-rich, e.g., with the ever-increasing volumes of environmental data collected by Earth-observing satellites and in-situ sensors, and generated by model simulations (e.g., climate model runs [113]). Just as recent developments in ML have transformed how we interact with information on the Internet, it is fitting to ask how ML advances can help Earth system scientists pursue a fundamental goal of science: building better models of physical, biological, and environmental systems. The conventional approach to modeling relationships between input drivers and response variables is to use process-based models rooted in scientific equations. Although they leverage a mechanistic understanding of scientific phenomena, process-based models suffer from several shortcomings that limit their adoption in complex real-world settings, e.g., imperfections in model formulations (modeling bias), incorrect choices of parameter values in equations, and the high computational cost of running high-fidelity simulations. In response to these challenges, ML methods offer a promising alternative that captures statistical relationships between inputs and outputs directly from data. However, "black-box" ML models that rely solely on the supervision contained in data show limited generalizability in scientific problems, especially when applied to out-of-distribution data. One reason for this lack of generalizability is the limited scale of data in scientific disciplines, in contrast to mainstream applications of AI and ML, where large-scale datasets in computer vision and natural language modeling have been instrumental in the success of state-of-the-art AI/ML models.
Another fundamental deficiency of black-box ML models is their tendency to produce results that are inconsistent with existing scientific theories, together with their inability to provide a mechanistic understanding of the patterns and relationships discovered from data, which limits their usefulness in science.
