Collaborating Authors

 Civitarese, Daniel


A modular framework for extreme weather generation

arXiv.org Artificial Intelligence

Extreme weather events have an enormous impact on society and are expected to become more frequent and severe with climate change. In this context, resilience planning becomes crucial for risk mitigation and coping with these extreme events. Machine learning techniques can play a critical role in resilience planning through the generation of realistic extreme weather event scenarios that can be used to evaluate possible mitigation actions. This paper proposes a modular framework that relies on interchangeable components to produce extreme weather event scenarios. We discuss possible alternatives for each of the components and show initial results comparing two approaches on the task of generating precipitation scenarios.
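
A minimal sketch, in Python, of how interchangeable scenario-generation components might be composed; the interface, the block-bootstrap baseline, and all names are illustrative assumptions, not the framework's actual API:

# Hypothetical sketch of a modular scenario-generation pipeline.
# Component names and interfaces are illustrative, not the paper's actual API.
from abc import ABC, abstractmethod
import numpy as np

class ScenarioGenerator(ABC):
    """Interchangeable component that produces precipitation fields."""
    @abstractmethod
    def fit(self, historical: np.ndarray) -> None: ...
    @abstractmethod
    def sample(self, n_scenarios: int) -> np.ndarray: ...

class BlockBootstrapGenerator(ScenarioGenerator):
    """Baseline: resample multi-day blocks of historical precipitation."""
    def __init__(self, block_len: int = 7):
        self.block_len = block_len
        self.blocks = None

    def fit(self, historical: np.ndarray) -> None:
        n = historical.shape[0] - self.block_len + 1
        self.blocks = np.stack([historical[i:i + self.block_len] for i in range(n)])

    def sample(self, n_scenarios: int) -> np.ndarray:
        idx = np.random.randint(0, len(self.blocks), size=n_scenarios)
        return self.blocks[idx]

def run_pipeline(generator: ScenarioGenerator, historical: np.ndarray, n: int) -> np.ndarray:
    """Any ScenarioGenerator can be swapped in without changing the pipeline."""
    generator.fit(historical)
    return generator.sample(n)

# Example: 1000 days of synthetic precipitation on a 32x32 grid.
history = np.random.gamma(shape=0.3, scale=5.0, size=(1000, 32, 32))
scenarios = run_pipeline(BlockBootstrapGenerator(block_len=7), history, n=100)

A learned generator (e.g., a GAN or diffusion model) could implement the same ScenarioGenerator interface and be swapped in without touching the rest of the pipeline, which is the point of the modular design.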


Workflow Provenance in the Lifecycle of Scientific Machine Learning

arXiv.org Artificial Intelligence

Machine Learning (ML) has been fundamentally transforming several industries and businesses in numerous ways. More recently, it has also been impacting computational science and engineering domains, such as geoscience, climate science, material science, and health science. Scientific ML, i.e., ML applied to these domains, is characterized by the combination of data-driven techniques with domain-specific data and knowledge to obtain models of physical phenomena [1], [2], [3], [4], [5]. Obtaining models in scientific ML works similarly to conducting traditional large-scale computational experiments [6], which involve a team of scientists and engineers who formulate hypotheses, design the experiment, predefine parameters and input datasets, analyze the experiment data, make observations, and calibrate initial assumptions in a cycle until they are satisfied with the results. Scientific ML is naturally large-scale because multiple people collaborate on a project, using their multidisciplinary domain-specific knowledge to design and perform data-intensive tasks that curate datasets (i.e., understand, clean, and enrich them with observations) and prepare them for learning algorithms. They then plan and execute compute-intensive tasks for computational simulations or for training ML models subject to the scientific domain's constraints. They utilize specialized scientific software tools running on their desktops, on cloud clusters (e.g., Docker-based), or on large HPC machines.
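
A minimal sketch, in Python, of the kind of task-level provenance capture such a lifecycle calls for; the decorator, record fields, and output file are assumptions made for illustration, not the provenance system described in the paper:

# Minimal sketch of workflow provenance capture for an ML lifecycle task.
import functools, hashlib, json, time

PROV_LOG = "provenance.jsonl"   # hypothetical output file

def track_provenance(task_name: str):
    """Record parameters, an output hash, duration, and a timestamp per task run."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            record = {
                "task": task_name,
                "function": func.__name__,
                "params": {k: repr(v) for k, v in kwargs.items()},
                "output_hash": hashlib.sha256(repr(result).encode()).hexdigest(),
                "duration_s": round(time.time() - start, 3),
                "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
            }
            with open(PROV_LOG, "a") as f:
                f.write(json.dumps(record) + "\n")
            return result
        return wrapper
    return decorator

@track_provenance("data_curation")
def clean_dataset(raw_rows, drop_nulls=True):
    # Example curation step: drop missing records before training.
    return [r for r in raw_rows if r is not None] if drop_nulls else raw_rows

curated = clean_dataset([1, None, 3], drop_nulls=True)

Appending one JSON record per task execution gives a queryable trace linking curation, training, and analysis steps, which is the kind of lineage information workflow provenance aims to preserve.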


Netherlands Dataset: A New Public Dataset for Machine Learning in Seismic Interpretation

arXiv.org Machine Learning

Machine learning and, more specifically, deep learning algorithms have seen remarkable growth in their popularity and usefulness in recent years. This is arguably due to three main factors: powerful computers, new techniques to train deeper networks, and larger datasets. Although the first two are readily available in modern computers and ML libraries, the last one remains a challenge for many domains. Big data is a reality in almost all fields nowadays, and the geosciences are no exception. However, to achieve the success of general-purpose applications such as ImageNet - which provides more than 14 million labeled images for 1000 target classes - we need not only more data but more high-quality labeled data. In the Oil & Gas industry, confidentiality issues further hamper the sharing of datasets. In this work, we present the Netherlands interpretation dataset, a contribution to the development of machine learning in seismic interpretation. The Netherlands F3 survey was acquired in the North Sea, offshore the Netherlands. The data is publicly available and contains post-stack data, 8 horizons, and well logs from 4 wells. For the purposes of our machine learning tasks, the original dataset was reinterpreted, generating 9 horizons that separate different seismic facies intervals. The interpreted horizons were used to generate approximately 190,000 labeled images along inlines and crosslines. Finally, we present two deep learning applications in which the proposed dataset was employed and produced compelling results.
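
A minimal sketch, in Python, of how inline and crossline sections could be extracted as labeled 2-D images from a post-stack volume; the array names, the small random volume, and the per-voxel label encoding are assumptions for illustration, not the dataset's actual packaging:

# Illustrative sketch of slicing a labeled post-stack volume into 2-D training
# images along inlines and crosslines.
import numpy as np

# Small hypothetical volume: (inlines, crosslines, depth samples), plus a facies
# label volume with one integer class per voxel derived from interpreted horizons.
amplitude = np.random.randn(64, 96, 128).astype(np.float32)
facies = np.random.randint(0, 9, size=amplitude.shape, dtype=np.int8)

def extract_sections(volume, labels):
    """Yield (image, label_mask) pairs for every inline and crossline section."""
    for i in range(volume.shape[0]):      # inline sections
        yield volume[i, :, :].T, labels[i, :, :].T
    for x in range(volume.shape[1]):      # crossline sections
        yield volume[:, x, :].T, labels[:, x, :].T

# Example: count sections and inspect the first image/label pair.
sections = list(extract_sections(amplitude, facies))
first_img, first_mask = sections[0]
print(len(sections), first_img.shape, first_mask.shape)

Pairing each amplitude section with a per-pixel facies mask in this way yields the kind of labeled image set a semantic-segmentation model expects.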