
Collaborating Authors

 Oehmcke, Stefan


Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment

arXiv.org Artificial Intelligence

As low-quality housing and, in particular, certain roof characteristics are associated with an increased risk of malaria, classification of roof types based on remote sensing imagery can support the assessment of malaria risk and thereby help prevent the disease. To support research in this area, we release the Nacala-Roof-Material dataset, which contains high-resolution drone images from Mozambique with corresponding labels delineating houses and specifying their roof types. The dataset defines a multi-task computer vision problem, comprising object detection, classification, and segmentation. In addition, we benchmark various state-of-the-art approaches on the dataset. Canonical U-Nets, YOLOv8, and a custom decoder on pretrained DINOv2 serve as baselines. We show that each of the methods has its advantages but none is superior on all tasks, which highlights the potential of our dataset for future research in multi-task learning. While the tasks are closely related, accurate segmentation of objects does not necessarily imply accurate instance separation, and vice versa. We address this general issue by introducing a variant of the deep ordinal watershed (DOW) approach that additionally separates the interior of objects, allowing for improved object delineation and separation. We show that our DOW variant is a generic approach that improves the performance of both U-Net and DINOv2 backbones, leading to a better trade-off between semantic segmentation and instance segmentation.
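The DOW variant is only described at a high level in the abstract. As a rough, hypothetical illustration of the underlying idea (the function name, thresholds, and random inputs below are assumptions, not the authors' implementation), a network that predicts both a full object mask and a nested, eroded interior mask allows touching buildings to be separated by growing a watershed from the interior components:

```python
# Hypothetical post-processing sketch of the nested-mask idea behind DOW-style
# instance separation: interiors act as seeds, the full mask bounds the growth.
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def separate_instances(object_prob, interior_prob, threshold=0.5):
    """Combine two nested probability maps (H, W) into an instance label map."""
    object_mask = object_prob > threshold        # full building footprints
    interior_mask = interior_prob > threshold    # eroded interiors, ideally one blob per building
    seeds, _ = ndimage.label(interior_mask)      # connected interior components become markers
    # Flood from the markers over the footprint mask; -object_prob lets
    # high-confidence pixels be claimed first.
    return watershed(-object_prob, markers=seeds, mask=object_mask)

# Toy usage with random stand-ins for the two network output channels.
rng = np.random.default_rng(0)
object_prob = ndimage.gaussian_filter(rng.random((64, 64)), sigma=3)
interior_prob = np.clip(object_prob - 0.1, 0.0, 1.0)
instances = separate_instances(object_prob, interior_prob, threshold=object_prob.mean())
print("instances found:", instances.max())
```

In the paper's setting, the nested output channels would come from a U-Net or DINOv2-based decoder trained with ordinal targets; here they are random stand-ins used only to make the sketch runnable.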


MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

arXiv.org Artificial Intelligence

The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks, including image classification and semantic segmentation. We find that multi-modal pretraining notably improves the linear probing performance, e.g., by 4 percentage points on BigEarthNet and 16 percentage points on So2Sat, compared to pretraining on optical satellite images only. We show that this also leads to better label and parameter efficiency, which are crucial aspects in global-scale applications.
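As a very rough illustration of the multi-pretext setup (a minimal sketch with assumed module names, a toy convolutional encoder, and invented modalities; the actual MP-MAE builds on a ConvNeXt V2 masked autoencoder), a shared encoder sees the masked optical input and one lightweight head per pretext task reconstructs the corresponding paired modality, with the per-task losses summed:

```python
# Minimal, hypothetical sketch of a multi-pretext masked autoencoder: one shared
# encoder, one reconstruction head per paired modality (names and sizes invented).
import torch
import torch.nn as nn

class MultiPretextMAE(nn.Module):
    def __init__(self, in_channels=4, dim=128,
                 pretext_channels=(("sar", 2), ("elevation", 1))):
        super().__init__()
        self.encoder = nn.Sequential(                      # stand-in for a ConvNeXt V2 encoder
            nn.Conv2d(in_channels, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1), nn.GELU(),
        )
        self.heads = nn.ModuleDict(                        # one head per pretext task
            {name: nn.Conv2d(dim, c, kernel_size=1) for name, c in pretext_channels}
        )

    def forward(self, optical, mask):
        # mask: (B, 1, H, W) with 1 = visible pixel, 0 = masked-out pixel
        features = self.encoder(optical * mask)
        return {name: head(features) for name, head in self.heads.items()}

model = MultiPretextMAE()
optical = torch.randn(2, 4, 64, 64)                        # toy multispectral batch
mask = (torch.rand(2, 1, 64, 64) > 0.75).float()           # keep roughly 25% of pixels
targets = {"sar": torch.randn(2, 2, 16, 16), "elevation": torch.randn(2, 1, 16, 16)}
predictions = model(optical, mask)
loss = sum(nn.functional.mse_loss(predictions[k], targets[k]) for k in predictions)
loss.backward()                                            # summed pretext losses drive the encoder
```

The point of the design is that only the shared encoder is kept for downstream tasks; the per-modality heads exist solely to force the optical representations to also predict the paired modalities.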


Deep Learning Based 3D Point Cloud Regression for Estimating Forest Biomass

arXiv.org Artificial Intelligence

Robust quantification of forest carbon stocks and their dynamics is important for climate change mitigation and adaptation strategies [FAO and UNEP, 2020]. The Paris Agreement [United Nations Framework Convention on Climate Change, 2015] and the IPCC [Shukla et al., 2019] acknowledge that climate change mitigation goals cannot be achieved without a substantial contribution from forests. Spatial details in the carbon budget of forests are necessary to encourage transformational actions towards a sustainable forest sector [Harris et al., 2021, 2012]. Currently, many countries do not have nationally specific forest carbon accumulation rates but rather rely on default rates from the IPCC 2018 [Masson-Delmotte et al., 2019, Requena Suarez et al., 2019], without accounting for finer-scale variations of carbon stocks [Cook-Patton et al., 2020]. Precise spatio-temporal monitoring of forest carbon dynamics at large scales has proven to be challenging [Erb et al., 2018, Griscom et al., 2017]. This is due to the complex structure of forests, topographic features, and land management practices [Tubiello et al., 2021, Lewis et al., 2019]. Technological developments in remote sensing and the concurrent increased availability of field-based measurements have led to an improvement in estimating carbon stocks using remote sensing observations of forest attributes that serve as proxies for above-ground biomass (AGB) [Knapp et al., 2018, Bouvier et al., 2015, Pan et al., 2013]. Currently, three remote sensing techniques are applied to collect data for AGB estimates: i) passive optical imagery, ii) synthetic aperture radar (SAR), and iii) light detection and ranging (LiDAR).


Learning Selection Masks for Deep Neural Networks

arXiv.org Machine Learning

Data often have to be moved between servers and clients during the inference phase. For instance, modern virtual assistants collect data on mobile devices and send the data to remote servers for analysis. A related scenario is that clients have to access and download large amounts of data stored on servers in order to apply machine learning models. Depending on the available bandwidth, this data transfer can be a serious bottleneck, which can significantly limit the application of machine learning models. In this work, we propose a simple yet effective framework for selecting certain parts of the input data needed for the subsequent application of a given neural network. Both the masks and the neural network are trained simultaneously such that good model performance is achieved while, at the same time, only a minimal amount of data is selected by the masks. During the inference phase, only the parts selected by the masks have to be transferred between the server and the client. Our experimental evaluation indicates that it is, for certain learning tasks, possible to significantly reduce the amount of data that needs to be transferred without affecting the model performance much.
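One common way to realize this joint training of selection masks and network, shown below as a hypothetical sketch (the sigmoid-gate parameterization, penalty weight, and toy data are assumptions and may differ from the paper's formulation), is to attach a learnable gate to every input feature and add a sparsity penalty so that the task loss and the amount of selected data are minimized together:

```python
# Hypothetical sketch: learnable input-selection gates trained jointly with a classifier.
# Only features whose gates end up near 1 would need to be transferred at inference time.
import torch
import torch.nn as nn

class MaskedModel(nn.Module):
    def __init__(self, n_features=784, n_classes=10, hidden=256):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(n_features))   # one gate per input feature
        self.net = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, x):
        gates = torch.sigmoid(self.mask_logits)      # soft selection mask in [0, 1]
        return self.net(x * gates), gates

model = MaskedModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))           # toy batch
logits, gates = model(x)
loss = nn.functional.cross_entropy(logits, y) + 1e-2 * gates.sum() # task loss + sparsity penalty
optimizer.zero_grad(); loss.backward(); optimizer.step()
# After training, a hard mask (gates > 0.5) determines which inputs the client sends.
```

The sparsity weight trades off transferred data volume against accuracy; in practice it would be tuned per task, and the soft gates would be binarized once training has converged.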