Prexl, Jonathan
Data-Centric Machine Learning for Geospatial Remote Sensing Data
Roscher, Ribana, Rußwurm, Marc, Gevaert, Caroline, Kampffmeyer, Michael, Santos, Jefersson A. dos, Vakalopoulou, Maria, Hänsch, Ronny, Hansen, Stine, Nogueira, Keiller, Prexl, Jonathan, Tuia, Devis
Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning models have been proposed, the majority of them have been developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that shifting the focus towards a complementary data-centric perspective is necessary to achieve further improvements in accuracy, generalization ability, and real impact in end-user applications. This work presents a definition and precise categorization of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric learning in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments provides implementation examples that show concrete steps for acting on geospatial data with data-centric machine learning approaches.
Using Machine Learning to predict extreme events in the H\'enon map
Lellep, Martin, Prexl, Jonathan, Linkmann, Moritz, Eckhardt, Bruno
Machine Learning (ML) inspired algorithms provide a flexible set of tools for analyzing and forecasting chaotic dynamical systems. Here we analyze the performance of one algorithm for the prediction of extreme events in the two-dimensional H\'enon map at the classical parameters. The task is to determine whether a trajectory will exceed a threshold after a set number of time steps into the future. This task has a geometric interpretation within the dynamics of the H\'enon map, which we use to gauge the performance of the neural networks that are used in this work. We analyze the dependence of the success rate of the ML models on the prediction time $T$, the number of training samples $N_T$, and the size of the network $N_p$. We observe that in order to maintain a certain accuracy, $N_T \propto \exp(2 h T)$ and $N_p \propto \exp(hT)$, where $h$ is the topological entropy. Similar relations between the intrinsic chaotic properties of the dynamics and ML parameters might be observable in other systems as well.
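The prediction task described in this abstract can be illustrated with a minimal sketch. It assumes the standard form of the H\'enon map, $x_{n+1} = 1 - a x_n^2 + y_n$, $y_{n+1} = b x_n$, with the classical parameters $a = 1.4$ and $b = 0.3$. The threshold value, the burn-in length, the choice of thresholding the $x$ coordinate, and the names `henon_step` and `make_dataset` are hypothetical choices made for illustration and are not taken from the paper.

```python
def henon_step(x, y, a=1.4, b=0.3):
    """One iteration of the Henon map at the classical parameters."""
    return 1.0 - a * x * x + y, b * x


def make_dataset(n_samples, T, threshold=1.0, burn_in=1000):
    """Build (state, label) pairs for the extreme-event task.

    The label is 1 if the x coordinate exceeds `threshold` exactly T
    steps after the state, and 0 otherwise. `threshold` and `burn_in`
    are illustrative assumptions, not values from the paper.
    """
    # Discard a transient so that samples lie close to the attractor.
    x, y = 0.0, 0.0
    for _ in range(burn_in):
        x, y = henon_step(x, y)

    # Record one long trajectory with n_samples + T points.
    trajectory = [(x, y)]
    for _ in range(n_samples + T - 1):
        x, y = henon_step(x, y)
        trajectory.append((x, y))

    # Pair each state with the thresholded x value T steps later.
    samples = []
    for i in range(n_samples):
        label = int(trajectory[i + T][0] > threshold)
        samples.append((trajectory[i], label))
    return samples


if __name__ == "__main__":
    data = make_dataset(n_samples=2000, T=5)
    positives = sum(label for _, label in data)
    print(f"{positives} of {len(data)} states exceed the threshold after 5 steps")
```

A classifier trained on such pairs would take the state $(x, y)$ as input and predict the label. Increasing `T` makes the boundary between the two classes in the $(x, y)$ plane more finely folded, which is consistent with the reported growth of $N_T$ and $N_p$ with $T$.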