ecoregion
How Does the Spatial Distribution of Pre-training Data Affect Geospatial Foundation Models?
Purohit, Mirali, Muhawenayo, Gedeon, Rolf, Esther, Kerner, Hannah
Foundation models have made rapid advances in many domains including Earth observation, where Geospatial Foundation Models (GFMs) can help address global challenges such as climate change, agriculture, and disaster response. Previous work on GFMs focused on tailoring model architectures and pretext tasks, and did not investigate the impact of pre-training data selection on model performance. However, recent work in other domains shows that the pre-training data distribution is an important factor influencing the performance of foundation models. With this motivation, our research explores how the geographic distribution of pre-training data affects the performance of GFMs. We evaluated several pre-training data distributions by sampling different compositions from a global data pool. Our experiments with two GFMs on downstream tasks indicate that balanced and globally representative data compositions often outperform region-specific sampling, highlighting the importance of diversity and global coverage in pre-training data. Our results also suggest that the most appropriate data sampling technique may depend on the specific GFM architecture. These findings will support the development of robust GFMs by incorporating quality pre-training data distributions, ultimately improving machine learning solutions for Earth observation.
- South America (0.04)
- Oceania (0.04)
- Europe (0.04)
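The contrast between balanced and region-specific sampling from a shared pool can be illustrated with a minimal sketch. It is not the authors' pipeline: the pool, the `region` field, the region labels, the sample sizes, and both function names are hypothetical, chosen only to show how the two compositions differ.

```python
import random
from collections import Counter, defaultdict


def sample_balanced(pool, n, seed=0):
    """Draw about n items, splitting the budget evenly across regions."""
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for item in pool:
        by_region[item["region"]].append(item)
    per_region = n // len(by_region)  # equal share for every region
    picked = []
    for region in sorted(by_region):
        items = by_region[region]
        # Take the whole region if it holds fewer items than its share.
        picked.extend(rng.sample(items, min(per_region, len(items))))
    return picked


def sample_region_specific(pool, n, region, seed=0):
    """Draw up to n items from a single region only."""
    rng = random.Random(seed)
    items = [item for item in pool if item["region"] == region]
    return rng.sample(items, min(n, len(items)))


# Hypothetical pool that is skewed toward one region.
pool = (
    [{"id": f"eu-{i}", "region": "Europe"} for i in range(600)]
    + [{"id": f"sa-{i}", "region": "South America"} for i in range(250)]
    + [{"id": f"oc-{i}", "region": "Oceania"} for i in range(150)]
)

balanced = sample_balanced(pool, 300)
regional = sample_region_specific(pool, 300, "Europe")

balanced_counts = Counter(item["region"] for item in balanced)
regional_counts = Counter(item["region"] for item in regional)
```

With this toy pool, the balanced draw takes the same number of items from each region even though the pool is skewed, while the region-specific draw contains a single region.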
MiTREE: Multi-input Transformer Ecoregion Encoder for Species Distribution Modelling
Climate change poses an extreme threat to biodiversity, making it imperative to efficiently model the geographical range of different species. The availability of large-scale remote sensing images and environmental data has facilitated the use of machine learning in Species Distribution Models (SDMs), which aim to predict the presence of a species at any given location. Traditional SDMs, reliant on expert observation, are labor-intensive, but advancements in remote sensing and citizen science data have facilitated machine learning approaches to SDM development. However, these models often struggle with leveraging spatial relationships between different inputs -- for instance, learning how climate data should inform the data present in satellite imagery -- without upsampling or distorting the original inputs. Additionally, location information and ecological characteristics at a location play a crucial role in predicting species distributions, but these aspects have not yet been incorporated into state-of-the-art approaches. In this work, we introduce MiTREE: a multi-input Vision-Transformer-based model with an ecoregion encoder. MiTREE computes spatial cross-modal relationships without upsampling and integrates location and ecological context. We evaluate our model on the SatBird Summer and Winter datasets, the goal of which is to predict bird species encounter rates, and we find that our approach improves upon state-of-the-art baselines.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- Oceania > Australia > New South Wales (0.04)
- North America > United States > Colorado (0.04)
- Asia > Middle East > Jordan (0.04)
A critical appraisal of water table depth estimation: Challenges and opportunities within machine learning
Janssen, Joseph, Tootchi, Ardalan, Ameli, Ali A.
Fine-resolution spatial patterns of water table depth (WTD) play a crucial role in shaping ecological resilience, hydrological connectivity, and anthropocentric objectives. Generally, a large-scale (e.g., continental or global) spatial map of static WTD can be simulated using either physically-based (PB) or machine learning-based (ML) models. We construct three fine-resolution (500 m) ML simulations of WTD, using the XGBoost algorithm and more than 20 million real and proxy observations of WTD, across the United States and Canada. The three ML models were constrained using known physical relations between WTD's drivers and WTD and were trained by sequentially adding real and proxy observations of WTD. We interpret the black box of our physically constrained ML models and compare it against available literature in groundwater hydrology. Through an extensive (pixel-by-pixel) evaluation, we demonstrate that our models can more accurately predict unseen real and proxy observations of WTD across most of North America's ecoregions compared to three available PB simulations of WTD. However, we still argue that large-scale WTD estimation is far from being a solved problem. We reason that due to biased observational data mainly collected from low-elevation floodplains, the misspecification of equations within physically-based models, and the over-flexibility of machine learning models, verifiably accurate simulations of WTD do not yet exist. Ultimately, we thoroughly discuss future directions that may help hydrogeologists decide how to proceed with WTD estimations, with a particular focus on the application of machine learning and the use of proxy satellite data.
- North America > United States (1.00)
- North America > Canada (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy > Oil & Gas > Upstream (1.00)
Machine learning predicts biodiversity and resilience in the 'coral triangle'
Coral reef conservation is a stepping stone to protecting marine biodiversity and life in the ocean as we know it. The health of coral also has huge societal implications: reef ecosystems provide sustenance and livelihoods for millions of people around the world. Conserving biodiversity in reef areas is both a social issue and a marine biodiversity priority. In the face of climate change, Annalisa Bracco, professor in the School of Earth and Atmospheric Sciences at Georgia Institute of Technology, and Lyuba Novi, a postdoctoral researcher, offer a new methodology that could revolutionize how conservationists monitor coral. The researchers applied machine learning tools to study how climate impacts connectivity and biodiversity in the Pacific Ocean's Coral Triangle--the most diverse and biologically complex marine ecosystem on the planet.
The data synergy effects of time-series deep learning models in hydrology
Fang, Kuai, Kifer, Daniel, Lawson, Kathryn, Feng, Dapeng, Shen, Chaopeng
When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is customary practice to regionalize (to divide a large spatial domain into multiple regions and study each region separately) instead of fitting a single model on the entire dataset (also known as unification). Traditional wisdom in these fields suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, by partitioning the training data, each model has access to fewer data points and cannot learn from commonalities between regions. Here, through two hydrologic examples (soil moisture and streamflow), we argue that unification can often significantly outperform regionalization in the era of big data and deep learning (DL). Common DL architectures, even without bespoke customization, can automatically build models that benefit from regional commonality while accurately learning region-specific differences. We highlight an effect we call data synergy, where the results of the DL models improved when data were pooled together from characteristically different regions. In fact, the performance of the DL models benefited from more diverse rather than more homogeneous training data. We hypothesize that DL models automatically adjust their internal representations to identify commonalities while also providing sufficient discriminatory information to the model. The results here advocate for pooling together larger datasets, and suggest the academic community should place greater emphasis on data sharing and compilation.
- North America > United States > Virginia > Fairfax County > Reston (0.04)
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
- North America > United States > Iowa (0.04)
- Research Report > Experimental Study (0.94)
- Research Report > New Finding (0.68)
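The difference between regionalization and unification described in the abstract above comes down to what training data each model sees. The following is a minimal sketch of that data setup only; it does not train any model or reproduce the paper's results. The record layout, region labels, values, and function names are all hypothetical.

```python
from collections import defaultdict


def regionalize(records):
    """One training set per region: each model sees only its own region."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["region"]].append((rec["features"], rec["target"]))
    return dict(groups)


def unify(records):
    """One pooled training set: a single model sees every region."""
    return [(rec["features"], rec["target"]) for rec in records]


# Hypothetical records; in practice these would be site-level time series.
records = [
    {"region": "A", "features": [0.1, 0.7], "target": 0.30},
    {"region": "A", "features": [0.2, 0.6], "target": 0.35},
    {"region": "B", "features": [0.9, 0.1], "target": 0.80},
    {"region": "C", "features": [0.5, 0.5], "target": 0.55},
]

regional_sets = regionalize(records)  # input to one model per region
pooled_set = unify(records)           # input to a single shared model

# Each regional model trains on a fraction of what the unified model sees.
sizes = {region: len(rows) for region, rows in regional_sets.items()}
```

In this toy setup every regional training set is smaller than the pooled one, which is the trade-off the abstract points to: partitioning gains homogeneity but gives each model fewer data points.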