Ustyuzhanin, Andrey
Wyckoff Transformer: Generation of Symmetric Crystals
Kazeev, Nikita, Nong, Wei, Romanov, Ignat, Zhu, Ruiming, Ustyuzhanin, Andrey, Yamazaki, Shuya, Hippalgaonkar, Kedar
Symmetry rules that atoms obey when they bond together to form an ordered crystal play a fundamental role in determining their physical, chemical, and electronic properties such as electrical and thermal conductivity, optical and polarization behavior, and mechanical strength. Almost all known crystalline materials have internal symmetry. Consistently generating stable crystal structures is still an open challenge, specifically because such symmetry rules are not accounted for. To address this issue, we propose WyFormer, a generative model for materials conditioned on space group symmetry. We use Wyckoff positions as the basis for an elegant, compressed, and discrete structure representation. To model the distribution, we develop a permutation-invariant autoregressive model based on the Transformer without positional encoding. WyFormer has a unique and powerful synergy of attributes, proven by extensive experimentation: best-in-class symmetry-conditioned generation, physics-motivated inductive bias, competitive stability of the generated structures, competitive material property prediction quality, and unparalleled inference speed.
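Illustrative note: the sketch below is not the WyFormer architecture. It is a minimal, standard-library-only demonstration of the general property the abstract relies on: self-attention without positional encoding is permutation-equivariant, so pooling its outputs gives a permutation-invariant summary of the token set. All names (`self_attention`, `mean_pool`, the random weights, the toy dimensions) are assumptions made for this example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention with NO positional encoding (toy version)."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(wq[0]))
    scores = [[sum(a * b for a, b in zip(qr, kr)) / scale for kr in k] for qr in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v)


def mean_pool(rows):
    """Average the token outputs; the result does not depend on token order."""
    return [sum(col) / len(rows) for col in zip(*rows)]


rng = random.Random(0)
dim = 4


def rand_matrix(n_rows, n_cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


tokens = rand_matrix(5, dim)  # five hypothetical token embeddings
wq, wk, wv = rand_matrix(dim, dim), rand_matrix(dim, dim), rand_matrix(dim, dim)

perm = [3, 0, 4, 1, 2]  # an arbitrary reordering of the tokens
shuffled = [tokens[i] for i in perm]

out = self_attention(tokens, wq, wk, wv)
out_shuffled = self_attention(shuffled, wq, wk, wv)

# Equivariance: reordering the inputs reorders the outputs in the same way.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for j, i in enumerate(perm)
    for a, b in zip(out_shuffled[j], out[i])
)

# Invariance: the pooled representation is unchanged by the reordering.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(mean_pool(out), mean_pool(out_shuffled))
)

print(equivariant, invariant)
```

Adding a positional encoding to `tokens` before the attention call would break both checks, which is why leaving it out makes the model insensitive to the order in which the tokens are listed.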
Predicting ionic conductivity in solids from the machine-learned potential energy landscape
Maevskiy, Artem, Carvalho, Alexandra, Sataev, Emil, Turchyna, Volha, Noori, Keian, Rodin, Aleksandr, Neto, A. H. Castro, Ustyuzhanin, Andrey
Discovering new superionic materials is essential for advancing solid-state batteries, which offer improved energy density and safety compared to the traditional lithium-ion batteries with liquid electrolytes. Conventional computational methods for identifying such materials are resource-intensive and not easily scalable. Recently, universal interatomic potential models have been developed using equivariant graph neural networks. These models are trained on extensive datasets of first-principles force and energy calculations. One can achieve significant computational advantages by leveraging them as the foundation for traditional methods of assessing the ionic conductivity, such as molecular dynamics or nudged elastic band techniques. However, the generalization error from model inference on diverse atomic structures arising in such calculations can compromise the reliability of the results. In this work, we propose an approach for the quick and reliable evaluation of ionic conductivity through the analysis of a universal interatomic potential. Our method incorporates a set of heuristic structure descriptors that effectively employ the rich knowledge of the underlying model while requiring minimal generalization capabilities. Using our descriptors, we rank lithium-containing materials in the Materials Project database according to their expected ionic conductivity. Eight out of the ten highest-ranked materials are confirmed to be superionic at room temperature in first-principles calculations. Notably, our method achieves a speed-up factor of approximately 50 compared to molecular dynamics driven by a machine-learning potential, and is at least 3,000 times faster compared to first-principles molecular dynamics.
AI Competitions and Benchmarks: Competition platforms
Ustyuzhanin, Andrey, Carlens, Harald
The ecosystem of artificial intelligence competitions is a diverse and multifaceted landscape, encompassing a variety of platforms that each host numerous competitions annually, alongside a plethora of specialized websites dedicated to singular contests. These platforms adeptly manage the overarching administrative responsibilities inherent in orchestrating competitions, thus affording organizers the liberty to allocate greater attention to other facets of their contests. Notably, these platforms exhibit considerable diversity in their operational functionalities, economic models, and community dynamics. This chapter conducts an extensive review of the foremost services in this realm and elucidates several alternative methodologies that facilitate the independent hosting of such challenges. Keywords: competition platform, challenge hosting services, comparison.
Symbolic expression generation via Variational Auto-Encoder
Popov, Sergei, Lazarev, Mikhail, Belavin, Vladislav, Derkach, Denis, Ustyuzhanin, Andrey
There are many problems in physics, biology, and other natural sciences in which symbolic regression can provide valuable insights and discover new laws of nature. Widely used deep neural networks do not provide interpretable solutions. Meanwhile, symbolic expressions give us a clear relation between observations and the target variable. However, at the moment, there is no dominant solution for the symbolic regression task, and we aim to reduce this gap with our algorithm. In this work, we propose a novel deep learning framework for symbolic expression generation via variational autoencoder (VAE). In a nutshell, we suggest using a VAE to generate mathematical expressions, and our training strategy forces generated formulas to fit a given dataset. Our framework allows encoding a priori knowledge of the formulas into fast-check predicates that speed up the optimization process. We compare our method to modern symbolic regression benchmarks and show that our method outperforms the competitors under noisy conditions. The recovery rate of SEGVAE is 65% on the Nguyen dataset with a noise level of 10%, which is better than the previously reported SOTA by 20%. We demonstrate that this value depends on the dataset and can be even higher.
$(1 + \varepsilon)$-class Classification: an Anomaly Detection Method for Highly Imbalanced or Incomplete Data Sets
Borisyak, Maxim, Ryzhikov, Artem, Ustyuzhanin, Andrey, Derkach, Denis, Ratnikov, Fedor, Mineeva, Olga
Anomaly detection is not an easy problem since the distribution of anomalous samples is unknown a priori. We explore a novel method that offers a trade-off between one-class and two-class approaches and leads to better performance on anomaly detection problems with small or non-representative anomalous samples. The method is evaluated using several data sets and compared to a set of conventional one-class and two-class approaches.
Machine Learning in High Energy Physics Community White Paper
Albertsson, Kim, Altoe, Piero, Anderson, Dustin, Andrews, Michael, Espinosa, Juan Pedro Araque, Aurisano, Adam, Basara, Laurent, Bevan, Adrian, Bhimji, Wahid, Bonacorsi, Daniele, Calafiura, Paolo, Campanelli, Mario, Capps, Louis, Carminati, Federico, Carrazza, Stefano, Childers, Taylor, Coniavitis, Elias, Cranmer, Kyle, David, Claire, Davis, Douglas, Duarte, Javier, Erdmann, Martin, Eschle, Jonas, Farbin, Amir, Feickert, Matthew, Castro, Nuno Filipe, Fitzpatrick, Conor, Floris, Michele, Forti, Alessandra, Garra-Tico, Jordi, Gemmler, Jochen, Girone, Maria, Glaysher, Paul, Gleyzer, Sergei, Gligorov, Vladimir, Golling, Tobias, Graw, Jonas, Gray, Lindsey, Greenwood, Dick, Hacker, Thomas, Harvey, John, Hegner, Benedikt, Heinrich, Lukas, Hooberman, Ben, Junggeburth, Johannes, Kagan, Michael, Kane, Meghan, Kanishchev, Konstantin, Karpiński, Przemysław, Kassabov, Zahari, Kaul, Gaurav, Kcira, Dorian, Keck, Thomas, Klimentov, Alexei, Kowalkowski, Jim, Kreczko, Luke, Kurepin, Alexander, Kutschke, Rob, Kuznetsov, Valentin, Köhler, Nicolas, Lakomov, Igor, Lannon, Kevin, Lassnig, Mario, Limosani, Antonio, Louppe, Gilles, Mangu, Aashrita, Mato, Pere, Meenakshi, Narain, Meinhard, Helge, Menasce, Dario, Moneta, Lorenzo, Moortgat, Seth, Neubauer, Mark, Newman, Harvey, Pabst, Hans, Paganini, Michela, Paulini, Manfred, Perdue, Gabriel, Perez, Uzziel, Picazio, Attilio, Pivarski, Jim, Prosper, Harrison, Psihas, Fernanda, Radovic, Alexander, Reece, Ryan, Rinkevicius, Aurelius, Rodrigues, Eduardo, Rorie, Jamal, Rousseau, David, Sauers, Aaron, Schramm, Steven, Schwartzman, Ariel, Severini, Horst, Seyfert, Paul, Siroky, Filip, Skazytkin, Konstantin, Sokoloff, Mike, Stewart, Graeme, Stienen, Bob, Stockdale, Ian, Strong, Giles, Thais, Savannah, Tomko, Karen, Upfal, Eli, Usai, Emanuele, Ustyuzhanin, Andrey, Vala, Martin, Vallecorsa, Sofia, Verzetti, Mauro, Vilasís-Cardona, Xavier, Vlimant, Jean-Roch, Vukotic, Ilija, Wang, Sean-Jiun, Watts, Gordon, Williams, Michael, Wu, Wenjing, Wunsch, Stefan, Zapata, Omar
Machine learning is an important research area in particle physics, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas in machine learning in particle physics with a roadmap for their implementation, software and hardware resource requirements, collaborative initiatives with the data science community, academia and industry, and training the particle physics community in data science. The main objective of the document is to connect and motivate these areas of research and development with the physics drivers of the High-Luminosity Large Hadron Collider and future neutrino experiments and identify the resource needs for their implementation. Additionally we identify areas where collaboration with external communities will be of great benefit.