Newman, Dava
GEO-Bench: Toward Foundation Models for Earth Monitoring
Lacoste, Alexandre, Lehmann, Nils, Rodriguez, Pau, Sherwin, Evan David, Kerner, Hannah, Lütjens, Björn, Irvin, Jeremy Andrew, Dao, David, Alemohammad, Hamed, Drouin, Alexandre, Gunturkun, Mehmet, Huang, Gabriel, Vazquez, David, Newman, Dava, Bengio, Yoshua, Ermon, Stefano, Zhu, Xiao Xiang
Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unlabeled data can lead to substantial improvements in generalization to downstream tasks. Such models, recently coined foundation models, have been transformational to the field of natural language processing. Variants have also been proposed for image data, but their applicability to remote sensing tasks is limited. To stimulate the development of foundation models for Earth monitoring, we propose a benchmark comprising six classification and six segmentation tasks, carefully curated and adapted to be both relevant to the field and well suited for model evaluation. We accompany this benchmark with a robust methodology for evaluating models and reporting aggregated results to enable a reliable assessment of progress. Finally, we report results for 20 baselines to characterize the performance of existing models. We believe that this benchmark will be a driver of progress across a variety of Earth monitoring tasks.
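The abstract's "robust methodology for reporting aggregated results" suggests a simple pattern: normalize per-task scores, average over random seeds, and aggregate across tasks with a robust statistic plus a bootstrap confidence interval. A minimal sketch of that idea, using an interquartile mean as the aggregate (the task names and exact procedure here are hypothetical, not the paper's published code):

    import numpy as np

    def iqm(scores):
        """Interquartile mean: average of the middle 50% of scores."""
        s = np.sort(np.asarray(scores, dtype=float))
        n = len(s)
        return s[n // 4 : n - n // 4].mean()

    def aggregate(per_task_seeds, n_boot=1000, seed=0):
        """Aggregate per-task scores (each a list over random seeds) into a
        single benchmark score with a 95% bootstrap confidence interval."""
        rng = np.random.default_rng(seed)
        point = iqm([np.mean(v) for v in per_task_seeds.values()])
        boots = []
        for _ in range(n_boot):
            # Resample seeds within each task, then re-aggregate across tasks.
            resampled = [np.mean(rng.choice(v, size=len(v), replace=True))
                         for v in per_task_seeds.values()]
            boots.append(iqm(resampled))
        lo, hi = np.percentile(boots, [2.5, 97.5])
        return point, (lo, hi)

    # Hypothetical normalized scores for three of the twelve tasks:
    scores = {"task_a": [0.71, 0.69, 0.73],
              "task_b": [0.55, 0.58, 0.54],
              "task_c": [0.80, 0.79, 0.82]}
    print(aggregate(scores))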
Physically-Consistent Generative Adversarial Networks for Coastal Flood Visualization
Lütjens, Björn, Leshchinskiy, Brandon, Requena-Mesa, Christian, Chishtie, Farrukh, Díaz-Rodríguez, Natalia, Boulais, Océane, Sankaranarayanan, Aruna, Masson-Forsythe, Margaux, Piña, Aaron, Gal, Yarin, Raïssi, Chedy, Lavin, Alexander, Newman, Dava
As climate change increases the intensity of natural disasters, society needs better tools for adaptation. Floods, for example, are the most frequent natural disaster, and better tools for flood-risk communication could increase support for flood-resilient infrastructure development. Our work aims to enable more visual communication of large-scale climate impacts by rendering the output of coastal flood models as satellite imagery. We propose the first deep learning pipeline to ensure physical consistency in synthetic satellite imagery. We extend a state-of-the-art GAN, pix2pixHD, so that it produces imagery that is physically consistent with the output of an expert-validated storm surge model (NOAA SLOSH). Evaluating the imagery against physics-based flood maps, we find that our proposed framework outperforms baseline models in both physical consistency and photorealism. We envision this work as a first step towards a global visualization of how climate change will shape our landscape. Continuing on this path, we show that the proposed pipeline generalizes to visualizing reforestation. We also publish a dataset of over 25,000 labeled image triplets to study image-to-image translation in Earth observation.
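One natural way to score physical consistency is to compare the flood extent visible in a generated image against the physics-based extent from the storm surge model. A minimal sketch of such a metric, assuming a hypothetical flood_mask() segmenter that maps an image to a binary flood mask (an illustration of the idea, not the paper's evaluation code):

    import numpy as np

    def iou(pred_mask, target_mask):
        """Intersection-over-union between two binary flood masks."""
        pred = np.asarray(pred_mask).astype(bool)
        target = np.asarray(target_mask).astype(bool)
        inter = np.logical_and(pred, target).sum()
        union = np.logical_or(pred, target).sum()
        return inter / union if union > 0 else 1.0

    def physical_consistency(generated_images, slosh_masks, flood_mask):
        """Score generated post-flood images against the physics-based flood
        extents of a storm surge model (e.g. NOAA SLOSH). `flood_mask` is a
        hypothetical segmenter: image -> binary flood mask."""
        scores = [iou(flood_mask(img), m)
                  for img, m in zip(generated_images, slosh_masks)]
        return float(np.mean(scores))

A higher mean IoU indicates that the generator places water where the physics says it should be, independently of how photorealistic the imagery looks.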
Technology Readiness Levels for Machine Learning Systems
Lavin, Alexander, Gilligan-Lee, Ciarán M., Visnjic, Alessya, Ganju, Siddha, Newman, Dava, Ganguly, Sujoy, Lange, Danny, Baydin, Atılım Güneş, Sharma, Amit, Gibson, Adam, Gal, Yarin, Xing, Eric P., Mattmann, Chris, Parr, James
The development and deployment of machine learning (ML) systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end. This lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, by contrast, follow well-defined processes and testing standards that streamline development toward high-quality, reliable results. Spacecraft systems are the extreme case, where mission-critical measures and robustness are ingrained in the development process. Drawing on experience in both spacecraft engineering and ML (from research through product, across domain areas), we have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" (MLTRL) framework defines a principled process to ensure robust, reliable, and responsible systems while remaining streamlined for ML workflows, including key distinctions from traditional software engineering. Moreover, MLTRL defines a lingua franca for people across teams and organizations to work collaboratively on artificial intelligence and machine learning technologies. Here we describe the framework and illustrate it with several real-world use cases of developing ML methods from basic research through productization and deployment, in areas such as medical diagnostics, consumer computer vision, satellite imagery, and particle physics.
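The readiness-level structure is naturally expressed as a sequence of gated stages, where a project advances only after a review confirms the required deliverables. A minimal sketch of that structure, with hypothetical gate criteria (MLTRL defines levels 0 through 9; the deliverables below are illustrative, not the framework's actual checklist):

    from dataclasses import dataclass, field

    @dataclass
    class MLTRLStage:
        level: int      # 0 (first principles) through 9 (deployed, monitored)
        name: str
        gate_criteria: list = field(default_factory=list)  # deliverables for review

    @dataclass
    class Project:
        stage: MLTRLStage
        evidence: set = field(default_factory=set)

        def try_advance(self, next_stage: MLTRLStage) -> bool:
            """Advance only if every gate criterion of the current stage is met."""
            missing = [c for c in self.stage.gate_criteria
                       if c not in self.evidence]
            if missing:
                print(f"Blocked at level {self.stage.level}; missing: {missing}")
                return False
            self.stage = next_stage
            return True

    # Hypothetical gate between a research prototype and productization work:
    poc = MLTRLStage(4, "proof of concept",
                     ["held-out evaluation", "documented failure modes"])
    integration = MLTRLStage(5, "component integration")
    project = Project(stage=poc, evidence={"held-out evaluation"})
    project.try_advance(integration)  # blocked: failure modes not yet documented

Encoding the gates as data rather than convention is what gives teams the shared vocabulary the abstract describes: everyone can see which level a model is at and exactly what evidence is missing.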