Collaborating Authors

Gieseke, Fabian


DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications

arXiv.org Artificial Intelligence

Significant efforts have been directed towards adapting self-supervised multimodal learning for Earth observation applications. However, existing methods produce coarse patch-sized embeddings, limiting their effectiveness and integration with other modalities like LiDAR. To close this gap, we present DUNIA, an approach to learn pixel-sized embeddings through cross-modal alignment between images and full-waveform LiDAR data. As the model is trained in a contrastive manner, the embeddings can be directly leveraged in a variety of environmental monitoring tasks in a zero-shot setting. In our experiments, we demonstrate the effectiveness of the embeddings for seven such tasks (canopy height mapping, fractional canopy cover, land cover mapping, tree species identification, plant area index, crop type classification, and per-pixel waveform-based vertical structure mapping). The results show that the embeddings, along with zero-shot classifiers, often outperform specialized supervised models, even in low data regimes. In the fine-tuning setting, we show strong low-shot capabilities, with performance near or better than the state of the art on five out of six tasks.
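The abstract does not spell out the training objective, but contrastive cross-modal alignment of this kind is commonly implemented with a symmetric InfoNCE loss, where matched image/LiDAR pixel embeddings form positives and all other pairs in the batch act as negatives. The sketch below is an illustrative numpy version under that assumption; the function name `info_nce` and the temperature value are not taken from the paper.

```python
import numpy as np

def info_nce(img_emb, lidar_emb, temperature=0.07):
    """Symmetric InfoNCE loss for N matched pixel embeddings of shape (N, D)."""
    # L2-normalise so the dot product is a cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    lid = lidar_emb / np.linalg.norm(lidar_emb, axis=1, keepdims=True)
    logits = img @ lid.T / temperature  # (N, N); matched pairs lie on the diagonal

    def xent(lg):
        # cross-entropy with the diagonal entry as the positive class
        m = lg.max(axis=1, keepdims=True)
        log_prob = lg - m - np.log(np.exp(lg - m).sum(axis=1, keepdims=True))
        return -np.diag(log_prob).mean()

    # average the image-to-LiDAR and LiDAR-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With aligned embeddings the diagonal dominates and the loss is near zero; shuffling one modality destroys the correspondence and drives the loss up.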


Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

arXiv.org Artificial Intelligence

With the rise in global greenhouse gas emissions, accurate large-scale tree canopy height maps are essential for understanding forest structure, estimating above-ground biomass, and monitoring ecological disruptions. To this end, we present a novel approach to generate large-scale, high-resolution canopy height maps over time. Our model accurately predicts canopy height over multiple years given Sentinel-2 time series satellite data. Using GEDI LiDAR data as the ground truth for training the model, we present the first 10m resolution temporal canopy height map of the European continent for the period 2019-2022. As part of this product, we also offer a detailed canopy height map for 2020, providing more precise estimates than previous studies. Our pipeline and the resulting temporal height map are publicly available, enabling comprehensive large-scale monitoring of forests and, hence, facilitating future research and ecological analyses. For an interactive viewer, see https://europetreemap.projects.earthengine.app/view/temporalcanopyheight.
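GEDI footprints are sparse point measurements, so supervision of a dense Sentinel-2 prediction map is naturally restricted to the pixels where a footprint exists. A minimal sketch of such a masked regression loss, assuming an L1 (MAE) objective and a binary footprint mask (both the function name and the choice of L1 are illustrative, not taken from the paper):

```python
import numpy as np

def masked_height_loss(pred, gedi_height, footprint_mask):
    """Mean absolute error evaluated only at pixels with a GEDI footprint.

    pred, gedi_height : (H, W) arrays of predicted / reference canopy height.
    footprint_mask    : (H, W) binary array, 1 where a GEDI footprint exists.
    """
    err = np.abs(pred - gedi_height) * footprint_mask  # zero out unlabeled pixels
    return err.sum() / footprint_mask.sum()            # average over footprints only
```

Pixels without a footprint contribute nothing to the gradient, which is what makes training a wall-to-wall map from sparse LiDAR reference data possible.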


Estimating Canopy Height at Scale

arXiv.org Artificial Intelligence

We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which depicts a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.
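The abstract does not define the geolocation-robust loss, but a common way to tolerate small label misregistration is to evaluate the error under every small integer shift of the label patch and keep the minimum. The sketch below is a hedged illustration of that idea in numpy; the function name, the use of MAE, and the one-pixel shift budget are assumptions, not the paper's actual loss.

```python
import numpy as np

def shift_tolerant_mae(pred, label, max_shift=1):
    """MAE minimised over small integer shifts of the label patch."""
    h, w = pred.shape
    m = max_shift
    best = np.inf
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            shifted = np.roll(np.roll(label, dy, axis=0), dx, axis=1)
            # crop the border to discard wrap-around pixels introduced by np.roll
            err = np.abs(pred - shifted)[m:h - m, m:w - m].mean()
            best = min(best, err)
    return best
```

If the reference map is offset by one pixel, the shifted comparison recovers a perfect match, whereas a plain pixel-wise MAE would penalise the model for the misregistration.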


End-to-End Neural Network Training for Hyperbox-Based Classification

arXiv.org Artificial Intelligence

Hyperbox-based classification is a promising technique in which decisions are represented as a series of orthogonal, multidimensional boxes (i.e., hyperboxes) that are often interpretable and human-readable. However, existing methods are no longer capable of efficiently handling the increasing volume of data many application domains face nowadays. We address this gap by proposing a novel, fully differentiable framework for hyperbox-based classification via neural networks. In contrast to previous work, our hyperbox models can be efficiently trained in an end-to-end fashion, which leads to significantly reduced training times and superior classification results.
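A differentiable hyperbox can be built by replacing the hard "inside the box" test with smooth gates, e.g. a sigmoid on each face of the box, multiplied across dimensions as a fuzzy AND. This is a generic sketch of that construction, not the paper's exact membership function; the steepness parameter `gamma` and the function name are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hyperbox_membership(x, low, high, gamma=10.0):
    """Soft membership of points x (N, D) in the box [low, high] (each (D,)).

    Each dimension contributes sigmoid(gamma * (x - low)) * sigmoid(gamma * (high - x)),
    which is close to 1 inside the interval and close to 0 outside; the product
    across dimensions acts as a differentiable AND.
    """
    inside = sigmoid(gamma * (x - low)) * sigmoid(gamma * (high - x))
    return inside.prod(axis=1)
```

Because the membership is smooth in `low` and `high`, the box bounds can be treated as network parameters and trained with gradient descent end-to-end, which is the key enabler the abstract describes.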


Deep Learning Based 3D Point Cloud Regression for Estimating Forest Biomass

arXiv.org Artificial Intelligence

Robust quantification of forest carbon stocks and their dynamics is important for climate change mitigation and adaptation strategies [FAO and UNEP, 2020]. The Paris Agreement [United Nations / Framework Convention on Climate Change, 2015] and the IPCC [Shukla et al., 2019] acknowledge that climate change mitigation goals cannot be achieved without a substantial contribution from forests. Spatial details in the carbon budget of forests are necessary to encourage transformational actions towards a sustainable forest sector [Harris et al., 2021, 2012]. Currently, many countries do not have nationally specific forest carbon accumulation rates but rather rely on default rates from the IPCC [Masson-Delmotte et al., 2019, Requena Suarez et al., 2019], without accounting for finer-scale variations of carbon stocks [Cook-Patton et al., 2020]. Precise spatio-temporal monitoring of forest carbon dynamics at large scales has proven to be challenging [Erb et al., 2018, Griscom et al., 2017]. This is due to the complex structure of forests, topographic features, and land management practices [Tubiello et al., 2021, Lewis et al., 2019]. Technological developments in remote sensing and the concurrent increased availability of field-based measurements have led to an improvement in estimating carbon stocks using remote sensing observations of forest attributes that serve as a proxy for above-ground biomass (AGB) [Knapp et al., 2018, Bouvier et al., 2015, Pan et al., 2013]. Currently, three remote sensing techniques are applied to collect data for AGB estimates: i) passive optical imagery, ii) synthetic aperture radar (SAR), and iii) light detection and ranging (LiDAR).


Learning Selection Masks for Deep Neural Networks

arXiv.org Machine Learning

Data often have to be moved between servers and clients during the inference phase. For instance, modern virtual assistants collect data on mobile devices, and the data are sent to remote servers for analysis. A related scenario is that clients have to access and download large amounts of data stored on servers in order to apply machine learning models. Depending on the available bandwidth, this data transfer can be a serious bottleneck, which can significantly limit the application of machine learning models. In this work, we propose a simple yet effective framework that allows one to select certain parts of the input data needed for the subsequent application of a given neural network. Both the masks as well as the neural network are trained simultaneously such that a good model performance is achieved while, at the same time, only a minimal amount of data is selected by the masks. During the inference phase, only the parts selected by the masks have to be transferred between the server and the client. Our experimental evaluation indicates that it is, for certain learning tasks, possible to significantly reduce the amount of data needed to be transferred without affecting the model performance much.
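Training masks and model jointly typically means combining the task loss on the masked input with a sparsity penalty on the mask itself. The sketch below illustrates such a combined objective with a soft sigmoid gate, a toy linear model, and an L1-style penalty; the function name, the linear model, and the weighting `lam` are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def masked_objective(x, mask_logits, w, y, lam=0.1):
    """Task loss on the masked input plus a sparsity penalty on the soft mask.

    x           : (N, D) input data.
    mask_logits : (D,) learnable logits; the sigmoid yields a gate in (0, 1).
    w           : (D,) weights of a toy linear model standing in for the network.
    y           : (N,) regression targets.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))  # soft selection gate per feature
    pred = (x * mask) @ w                      # only gated features reach the model
    task = np.mean((pred - y) ** 2)            # task loss (MSE, for illustration)
    return task + lam * mask.mean()            # sparsity term favours few inputs
```

Minimising this objective trades prediction quality against the fraction of the input that must be transferred; at inference time only the features with mask values near one would be sent to the server.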


Training Big Random Forests with Little Resources

arXiv.org Machine Learning

Without access to large compute clusters, building random forests on large datasets is still a challenging problem. This is, in particular, the case if fully-grown trees are desired. We propose a simple yet effective framework that makes it possible to efficiently construct ensembles of huge trees for hundreds of millions or even billions of training instances using a cheap desktop computer with commodity hardware. The basic idea is to consider a multi-level construction scheme, which builds top trees for small random subsets of the available data and which subsequently distributes all training instances to the top trees' leaves for further processing. While being conceptually simple, the overall efficiency crucially depends on the particular implementation of the different phases. The practical merits of our approach are demonstrated using dense datasets with hundreds of millions of training instances.
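The two phases named in the abstract (fit a top tree on a small random subset, then route every instance to a top-tree leaf for further processing) can be sketched in a few lines. For brevity the "top tree" below is a single threshold split on one fixed feature; a real implementation would grow a proper shallow tree and then build full bottom trees per leaf. All function names here are illustrative.

```python
import numpy as np

def fit_top_stump(X_sub):
    """Fit a single threshold split on a small random subset (stand-in for a top tree)."""
    feat = 0                          # the sketch always splits on feature 0
    thr = np.median(X_sub[:, feat])   # balanced split on the subset
    return feat, thr

def distribute(X, stump):
    """Route every training instance to one of the top tree's leaves."""
    feat, thr = stump
    left = X[:, feat] <= thr
    return np.where(left)[0], np.where(~left)[0]
```

The point of the scheme is that only the small subset must fit in memory to build the top tree, while the full dataset is then streamed through it once; each leaf's (much smaller) bucket of instances can be processed independently to grow the bottom trees.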


Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy

arXiv.org Machine Learning

Astrophysics and cosmology are rich with data. The advent of wide-area digital cameras on large aperture telescopes has led to ever more ambitious surveys of the sky. Data volumes that took an entire survey a decade to acquire can now be collected in a single night, and real-time analysis is often desired. Thus, modern astronomy requires big data know-how; in particular, it demands highly efficient machine learning and image analysis algorithms. But scalability is not the only challenge: astronomy applications touch several current machine learning research questions, such as learning from biased data and dealing with label and measurement noise. We argue that this makes astronomy a great domain for computer science research, as it pushes the boundaries of data analysis. In the following, we will present this exciting application area for data scientists. We will focus on exemplary results, discuss main challenges, and highlight some recent methodological advancements in machine learning and image analysis triggered by astronomical applications.