When More Data Hurts: Optimizing Data Coverage While Mitigating Diversity Induced Underfitting in an Ultra-Fast Machine-Learned Potential

Gibson, Jason B., Janicki, Tesia D., Hire, Ajinkya C., Bishop, Chris, Lane, J. Matthew D., Hennig, Richard G.

arXiv.org Artificial Intelligence

Machine-learned interatomic potentials (MLIPs) are becoming an essential tool in materials modeling. However, optimizing the generation of the training data used to parameterize MLIPs remains a significant challenge, because MLIPs can fail when they encounter local environments too different from those present in the training data. The difficulty of determining a priori which environments will be encountered during a molecular dynamics (MD) simulation necessitates diverse, high-quality training data. This study investigates how training-data diversity affects the performance of MLIPs, using the Ultra-Fast Force Field (UF$^3$) to model amorphous silicon nitride. We combine expert-curated and autonomously generated data to create the training set and fit four force-field variants to subsets of it. Our findings reveal a critical balance in training-data diversity: insufficient diversity hinders generalization, while excessive diversity can exceed the MLIP's learning capacity and reduce simulation accuracy. Specifically, we found that the UF$^3$ variant trained on a subset of the training data from which nitrogen-rich structures were removed offered vastly better prediction and simulation accuracy than any other variant. By comparing these UF$^3$ variants, we highlight the nuanced requirements for creating accurate MLIPs and emphasize the importance of application-specific training data for achieving optimal performance in modeling complex material behaviors.
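The experiment the abstract describes can be caricatured in a few lines. The sketch below is purely illustrative and assumes everything: a plain least-squares model stands in for UF$^3$, the features and targets are synthetic, and "nitrogen-rich" structures are simulated as points following a perturbed relation that the simple model cannot capture. It shows only the workflow of fitting variants to data subsets and comparing their errors, not the paper's actual method or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: each "structure" has a nitrogen fraction and a
# feature vector; a linear least-squares fit plays the role of the MLIP.
n_structures = 200
nitrogen_fraction = rng.uniform(0.2, 0.8, n_structures)
features = rng.normal(size=(n_structures, 5))
true_weights = np.array([1.0, -0.5, 0.3, 0.0, 2.0])
targets = features @ true_weights

# Nitrogen-rich structures follow a perturbed relation, mimicking
# environments whose diversity exceeds the model's learning capacity.
rich = nitrogen_fraction > 0.6
targets[rich] += rng.normal(scale=2.0, size=rich.sum())

# Evaluate on the application-relevant (nitrogen-poor) structures.
eval_mask = ~rich

def fit_and_rmse(train_mask):
    """Fit the toy model on one training subset; report RMSE on eval set."""
    w, *_ = np.linalg.lstsq(features[train_mask], targets[train_mask],
                            rcond=None)
    residual = features[eval_mask] @ w - targets[eval_mask]
    return float(np.sqrt(np.mean(residual ** 2)))

# Variant 1: train on all data, including the hard-to-fit N-rich subset.
rmse_all = fit_and_rmse(np.ones(n_structures, dtype=bool))
# Variant 2: train on the subset with N-rich structures removed.
rmse_filtered = fit_and_rmse(~rich)

print(f"trained on all data:      RMSE = {rmse_all:.4f}")
print(f"N-rich structures removed: RMSE = {rmse_filtered:.4f}")
```

In this toy setting the filtered variant wins for the same qualitative reason the abstract gives: the extra diversity in the full training set pulls the limited-capacity model away from the relation that governs the structures it is actually evaluated on.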


Data At AI Speeds

#artificialintelligence

SwiftStack is being acquired by NVIDIA. SwiftStack is a software-driven data storage and management platform for data-intensive applications and workflows, providing access to data across the edge, core data centers, and public clouds. The image below, from the SwiftStack website, gives some idea of the SwiftStack software ecosystem. SwiftStack says that it has worked with NVIDIA for more than a year to solve the data challenges of enabling AI at scale. The release about the announcement says, "Last year, when we announced SwiftStack 7, we unveiled our focus on the SwiftStack Data Platform for AI, HPC, and accelerated computing. This included SwiftStack 1space as a valuable piece of the puzzle, enabling data acceleration in the core, at the edge, and in the cloud."