Takeuchi, Ichiro
Automated Materials Discovery Platform Realized: Scanning Probe Microscopy of Combinatorial Libraries
Liu, Yu, Pant, Rohit, Takeuchi, Ichiro, Spurling, R. Jackson, Maria, Jon-Paul, Ziatdinov, Maxim, Kalinin, Sergei V.
These libraries typically contain binary or ternary isothermal cross-sections of multicomponent phase diagrams, and more advanced synthesis methods can generate spatially encoded 4D and 5D compositional spaces [1]. This versatility makes them well-suited both for optimizing materials through direct exploration of compositional spaces and for advancing physics discovery by exploring property and microstructure evolution [2-10]. Additionally, temperature gradients during synthesis can help reveal the effects of synthesis variables, while localized ion- or laser-based annealing enables broader exploration of the processing and chemical spaces within the selected material systems [8, 11, 12]. The first experiments in combinatorial research date back to the 1960s [13, 14], with renewed interest in the 1990s following the discovery of high-temperature superconductors [3, 4, 11, 15-17]. However, it quickly became apparent that successful combinatorial research requires not only synthesis but also detailed characterization, along with the ability to derive insights from characterization results and to use them for subsequent experiment planning or for transitioning toward different fabrication routes.
Reward driven workflows for unsupervised explainable analysis of phases and ferroic variants from atomically resolved imaging data
Barakati, Kamyar, Liu, Yu, Nelson, Chris, Ziatdinov, Maxim A., Zhang, Xiaohang, Takeuchi, Ichiro, Kalinin, Sergei V.
Rapid progress in aberration-corrected electron microscopy necessitates the development of robust methods for the identification of phases, ferroic variants, and other pertinent aspects of materials structure from imaging data. While unsupervised methods for clustering and classification are widely used for these tasks, their performance can be sensitive to hyperparameter selection in the analysis workflow. In this study, we explore the effects of descriptors and hyperparameters on the capability of unsupervised ML methods to distill local structural information, exemplified by the discovery of polarization and lattice distortion in Sm-doped BiFeO3 (BFO) thin films. We demonstrate that a reward-driven approach can be used to optimize these key hyperparameters across the full workflow, where the rewards are designed to reflect domain-wall continuity and straightness, ensuring that the analysis aligns with the material's physical behavior. This approach allows us to discover local descriptors that are best aligned with the specific physical behavior, providing insight into the fundamental physics of materials. We further extend the reward-driven workflows to disentangle structural factors of variation via an optimized variational autoencoder (VAE). Finally, we explore the importance of well-defined rewards as a quantifiable measure of the workflow's success.
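A minimal sketch of such a reward-driven workflow optimization, assuming a 2D image array, sliding-window descriptors, PCA plus k-means as the unsupervised step, and a simplified neighbour-agreement reward standing in for the paper's domain-wall continuity and straightness rewards (all function names and parameter values below are illustrative, not the authors' implementation):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def extract_patches(image, window):
    """Slide a (window x window) descriptor over a 2D image and flatten each patch."""
    h, w = image.shape
    return np.array([
        image[i:i + window, j:j + window].ravel()
        for i in range(h - window)
        for j in range(w - window)
    ])

def reward(labels, grid_shape):
    """Placeholder reward: agreement between vertically adjacent labels, a crude proxy
    for domain-wall continuity/straightness (the paper's actual rewards are more elaborate)."""
    lab = labels.reshape(grid_shape)
    return np.mean(lab[1:, :] == lab[:-1, :])

def optimize_workflow(image, windows=(8, 16, 24), cluster_counts=(2, 3, 4)):
    """Reward-driven grid search over descriptor window size and cluster number."""
    best_params, best_reward = None, -np.inf
    for win in windows:
        patches = extract_patches(image, win)
        latent = PCA(n_components=8).fit_transform(patches)
        grid_shape = (image.shape[0] - win, image.shape[1] - win)
        for k in cluster_counts:
            labels = KMeans(n_clusters=k, n_init=5).fit_predict(latent)
            r = reward(labels, grid_shape)
            if r > best_reward:
                best_params, best_reward = {"window": win, "n_clusters": k}, r
    return best_params, best_reward
```

The point of the sketch is only that the reward, rather than the analyst, selects the descriptor and clustering hyperparameters.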
Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design
Boyar, Onur, Hanada, Hiroyuki, Takeuchi, Ichiro
The rapid discovery of new chemical compounds is essential for advancing global health and developing treatments. While generative models show promise in creating novel molecules, challenges remain in ensuring the real-world applicability of these molecules and in finding such molecules efficiently. To address this, we introduce Conditional Latent Space Molecular Scaffold Optimization (CLaSMO), which combines a Conditional Variational Autoencoder (CVAE) with Latent Space Bayesian Optimization (LSBO) to modify molecules strategically while maintaining similarity to the original input. Our LSBO setting improves the sample-efficiency of our optimization, and our modification approach helps us obtain molecules with higher chances of real-world applicability. CLaSMO explores substructures of molecules in a sample-efficient manner by performing BO in the latent space of a CVAE conditioned on the atomic environment of the molecule to be optimized. Our experiments demonstrate that CLaSMO efficiently enhances target properties with minimal substructure modifications, achieving state-of-the-art results with a smaller model and dataset compared to existing methods. We also provide an open-source web application that enables chemical experts to apply CLaSMO in a Human-in-the-Loop setting.
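A minimal sketch of one latent-space Bayesian optimization step, assuming a trained (C)VAE whose decoder and property evaluator are supplied by the caller; the `decode` and `score` callables are hypothetical stand-ins, and the CVAE conditioning on atomic environments used by CLaSMO is not reproduced here:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def lsbo_step(Z, y, decode, score, n_candidates=1024, rng=None):
    """One latent-space BO iteration: fit a GP surrogate on (latent, property) pairs,
    pick the candidate latent point with the highest expected improvement, decode it,
    and evaluate the property."""
    rng = rng or np.random.default_rng(0)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(Z, y)
    candidates = rng.normal(size=(n_candidates, Z.shape[1]))     # prior-like sampling
    mu, sigma = gp.predict(candidates, return_std=True)
    imp = mu - y.max()                                           # improvement over incumbent
    u = imp / (sigma + 1e-9)
    ei = imp * norm.cdf(u) + sigma * norm.pdf(u)                 # expected improvement
    z_next = candidates[np.argmax(ei)]
    molecule = decode(z_next)
    return z_next, molecule, score(molecule)
```

Repeating this step and appending the new (latent, property) pair to Z and y gives the usual sample-efficient BO loop in the latent space.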
Real-time experiment-theory closed-loop interaction for autonomous materials science
Liang, Haotong, Wang, Chuangye, Yu, Heshan, Kirsch, Dylan, Pant, Rohit, McDannald, Austin, Kusne, A. Gilad, Zhao, Ji-Cheng, Takeuchi, Ichiro
Iterative cycles of theoretical prediction and experimental validation are the cornerstone of the modern scientific method. However, the proverbial "closing of the loop" in experiment-theory cycles is, in practice, usually ad hoc, often inherently difficult, and impractical to repeat on a systematic basis, beset by the scale or time constraints of the computation or of the phenomena under study. Here, we demonstrate the Autonomous MAterials Search Engine (AMASE), where we enlist robot science to perform self-driving, continuous, cyclical interaction of experiments and computational predictions for materials exploration. In particular, we have applied the AMASE formalism to the rapid mapping of a temperature-composition phase diagram, a fundamental task for the search and discovery of new materials. Thermal processing and experimental determination of compositional phase boundaries in thin films are autonomously interspersed with real-time updating of the phase diagram prediction through the minimization of Gibbs free energies. AMASE was able to accurately determine the eutectic phase diagram of the Sn-Bi binary thin-film system on the fly from a self-guided campaign covering just a small fraction of the entire composition-temperature phase space, translating to a 6-fold reduction in the number of necessary experiments. This study demonstrates for the first time the possibility of real-time, autonomous, and iterative interactions of experiments and theory carried out without any human intervention.
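The sketch below illustrates the general shape of such a closed loop in a deliberately simplified form: a stand-in `measure_phase` callable plays the role of autonomous synthesis and characterization, and a Gaussian-process classifier stands in for the Gibbs-free-energy-based theory update actually used by AMASE (all names, grids, and settings are illustrative assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def closed_loop_phase_mapping(measure_phase, n_init=5, n_steps=20, rng=None):
    """Toy closed loop over a composition-temperature grid: measure a few points, fit a
    phase classifier, then repeatedly measure wherever the model is least certain.
    Assumes the initial points already cover at least two phases."""
    rng = rng or np.random.default_rng(0)
    grid = np.array([(x, T) for x in np.linspace(0.0, 1.0, 25)
                            for T in np.linspace(300.0, 600.0, 25)])
    measured = list(rng.choice(len(grid), size=n_init, replace=False))
    labels = [measure_phase(*grid[i]) for i in measured]
    clf = None
    for _ in range(n_steps):
        clf = GaussianProcessClassifier().fit(grid[measured], labels)
        uncertainty = 1.0 - clf.predict_proba(grid).max(axis=1)   # high near phase boundaries
        unmeasured = np.setdiff1d(np.arange(len(grid)), measured)
        nxt = int(unmeasured[np.argmax(uncertainty[unmeasured])])
        measured.append(nxt)
        labels.append(measure_phase(*grid[nxt]))
    return clf, grid[measured], labels
```

The acquisition rule simply revisits the most ambiguous unmeasured grid point, which concentrates measurements near phase boundaries; the real system replaces both the classifier and the acquisition rule with thermodynamically grounded counterparts.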
Statistical testing on generative AI anomaly detection tools in Alzheimer's Disease diagnosis
He, Rosemary, Takeuchi, Ichiro
Alzheimer's Disease is challenging to diagnose due to our limited understanding of its mechanism and the large heterogeneity among patients. Neurodegeneration, which can be measured from time-series MRI progression, is widely studied as a biomarker for clinical diagnosis. On the other hand, generative AI has shown promise in anomaly detection in medical imaging and has been used for tasks such as tumor detection. However, testing the reliability of such data-driven methods is non-trivial due to the issue of double-dipping in hypothesis testing. In this work, we propose to solve this issue with selective inference and develop a reliable generative AI method for Alzheimer's prediction. We show that, compared to traditional statistical methods with highly inflated p-values, selective inference successfully controls the false discovery rate under the desired alpha level while retaining statistical power. In practice, our pipeline could assist clinicians in Alzheimer's diagnosis and early intervention.
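The core correction can be illustrated with a one-dimensional toy case: if an anomaly score is tested only because it exceeded a data-driven threshold, a valid p-value must condition on that selection event. The snippet below is a minimal truncated-Gaussian illustration under assumed N(0, 1) null scores, not the paper's construction for generative-model anomaly maps:

```python
from scipy.stats import norm

def naive_and_selective_p(z, threshold=2.0):
    """For a z-score tested only because it exceeded a data-driven threshold, the naive
    p-value ignores the selection, while the selective p-value conditions on the
    selection event, i.e. a truncated-Gaussian test (one-sided illustration)."""
    naive = norm.sf(z)                              # P(Z > z) under N(0, 1)
    selective = norm.sf(z) / norm.sf(threshold)     # P(Z > z | Z > threshold)
    return naive, selective
```

For example, with z = 2.4 and threshold 2.0, the naive p-value is about 0.008 while the selective p-value is about 0.36, showing how conditioning on selection removes the inflation.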
Statistical Test for Auto Feature Engineering by Selective Inference
Matsukawa, Tatsuya, Shiraishi, Tomohiro, Nishino, Shuichi, Katsuoka, Teruyuki, Takeuchi, Ichiro
Auto Feature Engineering (AFE) plays a crucial role in developing practical machine learning pipelines by automating the transformation of raw data into meaningful features that enhance model performance. By generating features in a data-driven manner, AFE enables the discovery of important features that may not be apparent through human experience or intuition. On the other hand, since AFE generates features based on data, there is a risk that these features may be overly adapted to the data, making it essential to assess their reliability appropriately. Unfortunately, because most AFE problems are formulated as combinatorial search problems and solved by heuristic algorithms, it has been challenging to theoretically quantify the reliability of generated features. To address this issue, we propose a new statistical test for features generated by AFE algorithms based on a framework called selective inference. As a proof of concept, we consider a simple class of tree-search-based heuristic AFE algorithms and address the problem of testing the generated features when they are used in a linear model. The proposed test can quantify the statistical significance of the generated features in the form of $p$-values, enabling theoretically guaranteed control of the risk of false findings.
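The problem the test addresses can be demonstrated with a short simulation: under a global null, searching for the engineered feature most correlated with the response and then running an ordinary t-test on that same feature inflates the false positive rate far above the nominal level. The sketch below shows only this naive baseline (the selective-inference correction, which conditions on the search path, is not reproduced here); all sizes and names are illustrative:

```python
import numpy as np
from scipy import stats

def naive_p_after_feature_search(rng, n=100, n_candidates=20):
    """Under a global null, generate candidate engineered features, select the one most
    correlated with the response, and run an ordinary t-test on that same feature.
    The selection step is the 'double dipping' that makes this naive p-value invalid."""
    y = rng.normal(size=n)
    X = rng.normal(size=(n, n_candidates))           # stand-ins for AFE-generated features
    j = int(np.argmax(np.abs(X.T @ y)))              # data-driven feature selection
    _, _, _, p_value, _ = stats.linregress(X[:, j], y)
    return p_value

rng = np.random.default_rng(0)
pvals = np.array([naive_p_after_feature_search(rng) for _ in range(500)])
print("naive false positive rate at alpha=0.05:", np.mean(pvals < 0.05))   # far above 0.05
```

The printed rate is typically far above the nominal 0.05, which is exactly the over-adaptation the proposed selective test is designed to prevent.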
Statistical Test for Data Analysis Pipeline by Selective Inference
Shiraishi, Tomohiro, Matsukawa, Tatsuya, Nishino, Shuichi, Takeuchi, Ichiro
A data analysis pipeline is a structured sequence of processing steps that transforms raw data into meaningful insights by effectively integrating various analysis algorithms. In this paper, we propose a novel statistical test designed to assess the statistical significance of data analysis pipelines. Our approach allows for the systematic development of valid statistical tests applicable to any data analysis pipeline configuration composed of a set of data analysis components. We have developed this framework by adapting selective inference, which has gained recent attention as a new statistical inference technique for data-driven hypotheses. The proposed statistical test is theoretically designed to control the type I error at the desired significance level in finite samples. As examples, we consider a class of pipelines composed of three missing value imputation algorithms, three outlier detection algorithms, and three feature selection algorithms. We confirm the validity of our statistical test through experiments with both synthetic and real data for this class of data analysis pipelines. Additionally, we present an implementation framework that facilitates testing across any configuration of data analysis pipelines in this class without extra implementation costs.
Distributionally Robust Safe Sample Screening
Hanada, Hiroyuki, Aoyama, Tatsuya, Akahane, Satoshi, Tanaka, Tomonari, Okura, Yoshito, Inatsu, Yu, Hashimoto, Noriaki, Takeno, Shion, Murayama, Taro, Lee, Hanju, Kojima, Shinya, Takeuchi, Ichiro
In this study, we propose a machine learning method called Distributionally Robust Safe Sample Screening (DRSSS). DRSSS aims to identify unnecessary training samples, even when the distribution of the training samples changes in the future. To achieve this, we effectively combine the distributionally robust (DR) paradigm, which aims to enhance model robustness against variations in data distribution, with safe sample screening (SSS), which identifies unnecessary training samples prior to model training. Since an infinite number of scenarios for the change in distribution must be considered, we adopt SSS because it does not require retraining the model after the distribution changes. In this paper, we employ the covariate-shift framework to represent the distribution of training samples and reformulate the DR covariate-shift problem as a weighted empirical risk minimization problem, where the weights are subject to uncertainty within a predetermined range. By extending the existing SSS technique to accommodate this weight uncertainty, the DRSSS method is capable of reliably identifying unnecessary samples under any future distribution within a specified range. We provide a theoretical guarantee for the DRSSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.
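A minimal sketch of the underlying safe-sample-screening certificate, assuming an L2-regularized hinge-loss model and a precomputed ball of radius `radius` around a reference solution `w_ref` that is known to contain the optimum (both are assumptions of this illustration; the DRSSS extension that makes the certificate hold under bounded covariate-shift weights is not shown):

```python
import numpy as np

def safe_sample_screening(X, y, w_ref, radius):
    """Sphere-based safe sample screening for an L2-regularized hinge-loss model: if the
    optimal weights are guaranteed to lie within `radius` of `w_ref`, any sample whose
    worst-case margin  y_i * (x_i @ w)  still exceeds 1 over that ball can never become
    a support vector and can be discarded before training. Returns a boolean mask of
    provably unnecessary samples."""
    margins = y * (X @ w_ref)                                # margins at the reference solution
    worst_case = margins - radius * np.linalg.norm(X, axis=1)
    return worst_case > 1.0
```

Samples flagged True can be removed before training without changing the optimum; DRSSS derives bounds of this kind that remain valid under any reweighting within the prescribed uncertainty set.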
Crystal-LSBO: Automated Design of De Novo Crystals with Latent Space Bayesian Optimization
Boyar, Onur, Gu, Yanheng, Tanaka, Yuji, Tonogai, Shunsuke, Itakura, Tomoya, Takeuchi, Ichiro
Generative modeling of crystal structures is significantly challenged by the complexity of the input data, which constrains the ability of these models to explore and discover novel crystals. This complexity often confines de novo design methodologies to merely small perturbations of known crystals and hampers the effective application of advanced optimization techniques. One such technique, Latent Space Bayesian Optimization (LSBO), has demonstrated promising results in uncovering novel objects across various domains, especially when combined with Variational Autoencoders (VAEs). Recognizing LSBO's potential and the critical need for innovative crystal discovery, we introduce Crystal-LSBO, a de novo design framework for crystals specifically tailored to enhance explorability within LSBO frameworks. Crystal-LSBO employs multiple VAEs, each dedicated to a distinct aspect of crystal structure (lattice, coordinates, and chemical elements), orchestrated by an integrative model that synthesizes these components into a cohesive output. This setup not only streamlines the learning process but also produces explorable latent spaces, thanks to the decreased complexity of the learning task for each model, enabling LSBO approaches to operate. Our study pioneers the use of LSBO for de novo crystal design, demonstrating its efficacy through optimization tasks focused mainly on formation energy values. Our results highlight the effectiveness of our methodology, offering a new perspective for de novo crystal discovery.
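A structural sketch of the multi-VAE idea, assuming PyTorch and purely illustrative layer sizes: one small VAE per structural factor plus an integrative head that maps the concatenated factor latents into a single space where LSBO can run. This is not the authors' architecture or training objective, only the composition pattern described above:

```python
import torch
from torch import nn

class TinyVAE(nn.Module):
    """Minimal VAE block reused for each structural factor (lattice parameters,
    fractional coordinates, element composition). Layer sizes are illustrative."""
    def __init__(self, in_dim, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))

    def encode(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return z, mu, logvar

class MultiVAECrystalModel(nn.Module):
    """Three factor-specific VAEs plus an integrative head that maps the concatenated
    factor latents into one joint latent space where LSBO can operate."""
    def __init__(self, lattice_dim=6, coord_dim=30, elem_dim=20, joint_dim=16):
        super().__init__()
        self.lattice_vae = TinyVAE(lattice_dim)
        self.coord_vae = TinyVAE(coord_dim)
        self.elem_vae = TinyVAE(elem_dim)
        self.integrate = nn.Linear(3 * 8, joint_dim)   # 3 factor latents of size 8

    def joint_latent(self, lattice, coords, elems):
        z_lat, _, _ = self.lattice_vae.encode(lattice)
        z_crd, _, _ = self.coord_vae.encode(coords)
        z_elm, _, _ = self.elem_vae.encode(elems)
        return self.integrate(torch.cat([z_lat, z_crd, z_elm], dim=-1))
```

LSBO would then run in the joint latent space, with decoding back to candidate crystals handled by a corresponding integrative decoder (omitted here for brevity).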
Distributionally Robust Safe Screening
Hanada, Hiroyuki, Akahane, Satoshi, Aoyama, Tatsuya, Tanaka, Tomonari, Okura, Yoshito, Inatsu, Yu, Hashimoto, Noriaki, Murayama, Taro, Lee, Hanju, Kojima, Shinya, Takeuchi, Ichiro
In this study, we propose a method, Distributionally Robust Safe Screening (DRSS), for identifying unnecessary samples and features within a distributionally robust (DR) covariate-shift setting. This method effectively combines DR learning, a paradigm aimed at enhancing model robustness against variations in data distribution, with safe screening (SS), a sparse optimization technique designed to identify irrelevant samples and features prior to model training. The core concept of the DRSS method involves reformulating the DR covariate-shift problem as a weighted empirical risk minimization problem, where the weights are subject to uncertainty within a predetermined range. By extending the SS technique to accommodate this weight uncertainty, the DRSS method is capable of reliably identifying unnecessary samples and features under any future distribution within a specified range. We provide a theoretical guarantee for the DRSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.