Topology-Driven Generative Completion of Lacunae in Molecular Data

Zubarev, Dmitry Yu., Ristoski, Petar

arXiv.org Artificial Intelligence 

Materials discovery is frequently driven by historical data sets that lack the characteristics of data sets constructed specifically to meet the needs of a particular discovery effort. They carry imprints of the ever-changing historical context of research and development. Shifting priorities of external funding, pressure for momentous technological breakthroughs, community perception of high-profile topics, and the evolution of experimental capabilities render historical data a patchwork of findings with poorly understood internal structure. Statistical learning methods are typically concerned with statistical characteristics of the data. In materials discovery, there is additional pressure to understand the shape of the data in terms of what is known and what is missing, and to use that understanding to inform the laborious and expensive data acquisition associated with material preparation, processing, and characterization. In this contribution, we investigate the interplay between the shape of the historical data, expressed as the structure of lacunae such as gaps, loops, and voids, and the hypothesis generation that informs subsequent data acquisition. We describe an approach that explicitly identifies lacunae via topological data analysis (TDA) and fills them in using constrained generative modeling. TDA is concerned with capturing the shape of the data, that is, the characteristics that are preserved under continuous deformations. The simplest widely accepted form of TDA is clustering.
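The remark that clustering is the simplest form of TDA can be made concrete with a small sketch. The code below is an illustration only and not the authors' method: it tracks connected components of a point cloud as a distance scale grows (zero-dimensional persistence, equivalent to single-linkage clustering). The function names `zero_dim_persistence` and `count_clusters` and the toy `points` are assumptions introduced here. Detecting the loops and voids mentioned above would require higher-dimensional persistent homology, which this sketch does not cover.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return the scales at which connected components merge.

    Every point starts as its own component at scale 0. Processing
    pairwise distances in increasing order, a component "dies" each
    time an edge joins two previously separate components.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(dist)
    return deaths


def count_clusters(points, scale):
    """Number of connected components when points within `scale` are linked."""
    merges = sum(1 for d in zero_dim_persistence(points) if d <= scale)
    return len(points) - merges


# Hypothetical toy data: two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(count_clusters(points, 2.0))
```

Components that survive over a wide range of scales correspond to well-separated clusters; a large jump between consecutive merge scales is the kind of gap in the data that a lacuna analysis would flag for closer inspection.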
