satmae
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- North America > United States > Colorado (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Energy (1.00)
- Transportation > Infrastructure & Services (0.92)
- Transportation > Ground (0.67)
SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities, as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding along with independently masking image patches across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in terms of supervised learning performance on benchmark datasets (up to $\uparrow$ 7%), and transfer learning performance on downstream remote sensing tasks, including land cover classification (up to $\uparrow$ 14%) and semantic segmentation.
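The idea of masking image patches independently across time can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `independent_temporal_mask`, its parameters, and the 75% mask ratio used in the example are assumptions made for illustration only. Each timestamp draws its own random subset of visible patches, so the masked positions generally differ from one frame to the next.

```python
import random


def independent_temporal_mask(num_frames, num_patches, mask_ratio, seed=None):
    """Return, for each frame, the sorted indices of patches left visible.

    Hypothetical helper: every frame samples its own subset, so the
    mask is drawn independently per timestamp rather than shared.
    """
    rng = random.Random(seed)
    # Number of patches that stay visible in each frame.
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    return [
        sorted(rng.sample(range(num_patches), num_keep))
        for _ in range(num_frames)
    ]


# Example: 3 timestamps, 16 patches per image, 75% masked (assumed ratio).
visible = independent_temporal_mask(num_frames=3, num_patches=16,
                                    mask_ratio=0.75, seed=0)
for t, kept in enumerate(visible):
    print(f"frame {t}: visible patches {kept}")
```

A shared mask would instead sample once and reuse the same indices for every frame; the sketch only shows the independent variant that the abstract mentions.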
A Appendix
Co-located images of different timestamps, or sequences, are provided in fMoW. We crop surface reflectance images from the Sentinel-2 (ESA) satellite (courtesy of the U.S. Geological Survey), consisting of 90-day composites of images at the same locations as fMoW images (to reduce the impacts of cloud coverage). For locations where all fMoW images are before the Sentinel-2 time range, we discard the location.
Figure 5: Distribution of images and locations across the categories over the fMoW Sentinel training set.
A.2 fMoW Sentinel
We provide information about the fMoW-Sentinel dataset, collected using Sentinel-2
A.2.2 fMoW Sentinel Bands
Channel | Resolution | Central wavelength | Mean | Standard deviation
B1: Aerosols | 60m | 443nm | 1370.192 |
Further details can be found here.
- North America > United States (0.88)
- Europe (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- South America > Brazil (0.04)
- Oceania > Australia (0.04)
- Energy (0.53)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (0.68)
KidSat: satellite imagery to map childhood poverty dataset and benchmark
Sharma, Makkunda, Yang, Fan, Vo, Duy-Nhat, Suel, Esra, Mishra, Swapnil, Bhatt, Samir, Fiala, Oliver, Rudgard, William, Flaxman, Seth
Satellite imagery has emerged as an important tool to analyse demographic, health, and development indicators. While various deep learning models have been built for these tasks, each is specific to a particular problem, with few standard benchmarks available. We propose a new dataset pairing satellite imagery and high-quality survey data on child poverty to benchmark satellite feature representations. Our dataset consists of 33,608 images, each 10 km $\times$ 10 km, from 19 countries in Eastern and Southern Africa in the time period 1997-2022. As defined by UNICEF, multidimensional child poverty covers six dimensions and it can be calculated from the face-to-face Demographic and Health Surveys (DHS) Program. As part of the benchmark, we test spatial as well as temporal generalization, by testing on unseen locations, and on data after the training years. Using our dataset we benchmark multiple models, from low-level satellite imagery models such as MOSAIKS, to deep learning foundation models, which include both generic vision models such as Self-Distillation with no Labels (DINOv2) models and specific satellite imagery models such as SatMAE. We provide open source code for building the satellite dataset, obtaining ground truth data from DHS and running various models assessed in our work.
- North America > United States (1.00)
- Africa > Southern Africa (0.25)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
A Appendix
A.1 Datasets
fMoW RGB
Functional Map of the World (fMoW) [17] is a dataset of high-resolution satellite image time series across the world, with a task of classification among 62 architecture categories such as airport, shipyard, and zoo. Co-located images of different timestamps, or sequences, are provided in fMoW. They are of different lengths, and around 60% of the samples have length larger than 2. Readers can refer to the fMoW paper [17] for statistics on the distribution of sequence lengths. We construct a temporal version of fMoW by randomly associating every single image with two images of the same location but of different timestamps if possible.
We crop surface reflectance images from the Sentinel-2 (ESA) satellite (courtesy of the U.S. Geological Survey), consisting of 90-day composites of images at the same locations as fMoW images (to reduce the impacts of cloud coverage). At each fMoW datapoint location, we collect a time series of Sentinel-2 images, using the provided geo-coordinate bounding boxes. For locations where all fMoW images are before the Sentinel-2 time range, we discard the location.
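The construction of the temporal fMoW samples described above (each image associated with two other images of the same location, if possible) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the function name `build_temporal_sample` is hypothetical, and the fallback of repeating the anchor image when fewer than two other timestamps exist is an assumption, since the text only says "if possible".

```python
import random


def build_temporal_sample(anchor, same_location_images, rng):
    """Associate an anchor image with two other images of the same location.

    Hypothetical helper. `same_location_images` lists identifiers of all
    images at the anchor's location, each with a different timestamp.
    """
    candidates = [img for img in same_location_images if img != anchor]
    k = min(2, len(candidates))
    companions = rng.sample(candidates, k)
    # Assumed fallback: pad with the anchor when fewer than two others exist.
    companions += [anchor] * (2 - k)
    return [anchor] + companions


rng = random.Random(0)
long_sequence = ["loc7_t0", "loc7_t1", "loc7_t2", "loc7_t3"]
sample = build_temporal_sample("loc7_t1", long_sequence, rng)
single = build_temporal_sample("loc9_t0", ["loc9_t0"], rng)
print(sample)
print(single)
```

The image identifiers such as `loc7_t1` are placeholders; a real pipeline would carry file paths and timestamps from the dataset metadata.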