Supplementary Material and Datasheet for the WorldStrat Dataset
–Neural Information Processing Systems
LCCS comprises 23 classes and 14 sub-classes.

The dataset, along with its machine-readable metadata, is hosted on the CERN-backed Zenodo data repository (https://zenodo.org/record/6810792). Its long-term maintenance is discussed in the Datasheet. This includes reproducible code for the Benchmarks of Section 4 of [Cornebise et al., 2022a], following the ML Reproducibility Checklist [Pineau et al., 2021a,b]. The project also has its own website, available at https://worldstrat.github.io/

The authors hereby state that they bear all responsibility in case of violation of rights, etc., and confirm that the data license is as follows: the low-resolution imagery, labels, metadata, and pretrained models are released under Creative Commons Attribution 4.0 International (CC BY 4.0).

The mean cloud coverage over the Sentinel 2 product areas is 7.98%, with a standard deviation of 14.22. The quantiles are:
0.025: 0.00%
0.25: 0.00%
0.5: 0.66%
0.75: 10.05%
0.975: 49.95%

It is important to note that this cloud cover percentage, as mentioned in the article and datasheet, is calculated over the provider's entire product, which varies in size but is much larger than the 2.5 km we target. This means that even an image with a large cloud cover percentage can be cloud free and, in extreme (though unlikely) cases, vice versa. There are also considerable differences across sampled regions and land cover types; rainforests and non-desert equatorial regions are a simple example. A strict no-cloud policy would either make it impossible to sample enough low-resolution images or make the temporal difference extremely large (up to 7 years for some AOIs).
With that in mind, we strove to keep the cloud coverage as low as possible, ideally under 5%, while keeping the temporal difference as small as possible.
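The summary statistics and the trade-off described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the tuple layout `(days_from_high_res, cloud_pct)`, the linear-interpolation quantile, and the fallback rule (use cloudier products only when too few fall under the threshold) are all assumptions made for illustration. Only the 5% target and the reported quantile levels come from the text.

```python
from statistics import mean, stdev


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize_cloud_cover(percentages):
    """Mean, sample standard deviation and quantiles of product-level cloud cover."""
    values = sorted(percentages)
    levels = (0.025, 0.25, 0.5, 0.75, 0.975)  # levels reported in the text
    return {
        "mean": mean(values),
        "std": stdev(values),
        "quantiles": {q: quantile(values, q) for q in levels},
    }


def pick_revisits(candidates, n, max_cloud=5.0):
    """Pick n low-resolution products for one area of interest.

    candidates: list of (days_from_high_res, cloud_pct) tuples (hypothetical layout).
    Products at or under max_cloud are preferred, closest in time first.
    Assumed fallback: if fewer than n qualify, add the least cloudy remaining ones.
    """
    clear = sorted(
        (c for c in candidates if c[1] <= max_cloud),
        key=lambda c: abs(c[0]),
    )
    cloudy = sorted(
        (c for c in candidates if c[1] > max_cloud),
        key=lambda c: (c[1], abs(c[0])),
    )
    return (clear + cloudy)[:n]
```

For example, `pick_revisits([(-30, 2.0), (3, 40.0), (10, 0.5), (-5, 4.9), (1, 12.0)], n=3)` keeps the three products under 5% cloud cover, ordered by temporal distance. Because the percentage is computed over the whole product rather than the 2.5 km target area, a filter of this kind is only a proxy for how cloudy the cropped image actually is.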