
CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena

Kobayashi, Jasmine R., Martin, Daniela, Filho, Valmir P Moraes, O'Brien, Connor, Hong, Jinsu, Saikia, Sudeshna Boro, Lamdouar, Hala, Miles, Nathan D., Scoczynski, Marcella, Stone, Mavis, Sundaresan, Sairam, Jungbluth, Anna, Muñoz-Jaramillo, Andrés, Samara, Evangelia, Gallego, Joseph

arXiv.org Artificial Intelligence

Labeling or classifying time series is a persistent challenge in the physical sciences, where expert annotations are scarce, costly, and often inconsistent. Yet robust labeling is essential to enable machine learning models for understanding, prediction, and forecasting. We present the Clustering and Indexation Pipeline with Human Evaluation for Recognition (CIPHER), a framework designed to accelerate large-scale labeling of complex time series in physics. CIPHER integrates indexable Symbolic Aggregate approXimation (iSAX) for interpretable compression and indexing, density-based clustering (HDBSCAN) to group recurring phenomena, and a human-in-the-loop step for efficient expert validation. Representative samples are labeled by domain scientists, and these annotations are propagated across clusters to yield systematic, scalable classifications. We evaluate CIPHER on the task of classifying solar wind phenomena in OMNI data, a central challenge in space weather research, showing that the framework recovers meaningful phenomena such as coronal mass ejections and stream interaction regions. Beyond this case study, CIPHER highlights a general strategy for combining symbolic representations, unsupervised learning, and expert knowledge to address label scarcity in time series across the physical sciences. The code and configuration files used in this study are publicly available to support reproducibility.
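The symbolize-cluster-propagate flow the abstract describes can be sketched in miniature. This is not CIPHER's implementation: plain SAX words stand in for iSAX indexing, and grouping identical words stands in for HDBSCAN clustering; the series, labels, and function names are illustrative assumptions.

```python
import numpy as np

def sax_word(series, n_segments=4, alphabet="abcd"):
    """Z-normalize, piecewise-aggregate, then discretize by Gaussian breakpoints."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)
    segments = np.array_split(x, n_segments)
    paa = np.array([s.mean() for s in segments])
    # Breakpoints for a 4-letter alphabet under a standard normal distribution.
    breakpoints = np.array([-0.6745, 0.0, 0.6745])
    return "".join(alphabet[i] for i in np.searchsorted(breakpoints, paa))

def propagate_labels(words, expert_labels):
    """Give each series the expert label assigned to its cluster's representative."""
    return [expert_labels.get(w, "unlabeled") for w in words]

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 64)
series = [np.sin(2 * np.pi * t),                                 # clean oscillation
          np.sin(2 * np.pi * t) + 0.01 * rng.normal(size=64),    # noisy copy
          t]                                                     # linear ramp
words = [sax_word(s) for s in series]
# A domain expert labels one representative per symbolic cluster;
# the label then spreads to every member sharing that word.
labels = propagate_labels(words, {words[0]: "oscillation", words[2]: "ramp"})
```

The noisy sine collapses onto the same symbolic word as the clean one, so it inherits the expert's "oscillation" label without a second annotation, which is the labor-saving step CIPHER scales up.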


Combined Representation and Generation with Diffusive State Predictive Information Bottleneck

John, Richard, Qiu, Yunrui, Herron, Lukas, Tiwary, Pratyush

arXiv.org Artificial Intelligence

Generative modeling becomes increasingly data-intensive in high-dimensional spaces. In molecular science, where data collection is expensive and important events are rare, compression to lower-dimensional manifolds is especially important for various downstream tasks, including generation. We combine a time-lagged information bottleneck, designed to characterize important molecular representations, with a diffusion model in one joint training objective. The resulting protocol, which we term Diffusive State Predictive Information Bottleneck (D-SPIB), enables balancing representation-learning and generation objectives within one flexible architecture. Additionally, the model is capable of combining temperature information from different molecular simulation trajectories to learn a coherent and useful internal representation of thermodynamics. We benchmark D-SPIB on multiple molecular tasks and showcase its potential for exploring physical conditions outside the training set.


A Perspective on Symbolic Machine Learning in Physical Sciences

Makke, Nour, Chawla, Sanjay

arXiv.org Artificial Intelligence

Machine learning is rapidly making its way across all of the natural sciences, including the physical sciences. Yet the rate at which ML is reshaping the physical sciences lags well behind its impact on non-scientific disciplines, partly because of the uninterpretable nature of deep neural networks. Symbolic machine learning stands as an equal and complementary partner to numerical machine learning in speeding up scientific discovery in physics. This perspective discusses the main differences between the ML and scientific approaches, and stresses the need to develop and apply symbolic machine learning to physics problems in parallel with numerical machine learning, given the dual nature of physics research.


Users Favor LLM-Generated Content -- Until They Know It's AI

Parshakov, Petr, Naidenova, Iuliia, Paklina, Sofia, Matkin, Nikita, Nesseler, Cornel

arXiv.org Artificial Intelligence

In this paper, we investigate how individuals evaluate human-written and large language model (LLM)-generated responses to popular questions when the source of the content is either concealed or disclosed. Through a controlled field experiment, participants were presented with a set of questions, each accompanied by a response generated by either a human or an AI. In a randomized design, half of the participants were informed of the response's origin while the other half remained unaware. Our findings indicate that, overall, participants tend to prefer AI-generated responses. However, when the AI origin is revealed, this preference diminishes significantly, suggesting that evaluative judgments are influenced by the disclosure of the response's provenance rather than solely by its quality. These results underscore a bias against AI-generated content, highlighting the societal challenge of improving the perception of AI work in contexts where quality assessments should be paramount.


Inferring Interpretable Models of Fragmentation Functions using Symbolic Regression

Makke, Nour, Chawla, Sanjay

arXiv.org Artificial Intelligence

Fragmentation functions (FFs) are a key ingredient in the description of hadron production cross sections in high-energy physics (HEP) processes, i.e., lepton-nucleon, nucleon-nucleon, and nucleus-nucleus collisions. They provide a quantitative description of the hadronization mechanism [1], which is intrinsically non-perturbative in Quantum Chromodynamics (QCD). FFs are not calculable in perturbative QCD, and their determination relies entirely on physical observables measured in HEP experiments, e.g., at the Large Hadron Collider (LHC) at CERN [2-5], among others. The current methodology relies on global QCD fits [6, 7], commonly referred to as "FF parameterizations", in which a pre-assumed functional form of the FFs is fit to a wide range of physical observables, with the DGLAP evolution equations [8] accounting for the different energy scales of the experimental measurements. FFs are needed both to describe hadron production at the experiments currently running at the LHC and to make predictions for the next generation of machines, such as the future Electron-Ion Collider (EIC) at Brookhaven National Laboratory (BNL) and the Future Circular Collider (FCC) at CERN, which will run at significantly higher energies (a center-of-mass energy of 100 TeV versus 14 TeV at the LHC) and thus cover new regions of the kinematic phase space. In the fast-evolving AI era, it is natural to ask whether ML could assist in inferring a functional form of the FFs directly from data rather than pre-assuming one, and, most importantly, whether the function learned by AI tools is interpretable and human-understandable, and how it compares to hand-designed functions.
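The "pre-assumed functional form" methodology can be illustrated on a toy scale. A standard FF parameterization has the shape D(z) = N * z**a * (1 - z)**b; taking logarithms turns fitting it into linear least squares. All data below are synthetic, and the sketch omits DGLAP evolution and the multi-observable structure of real global fits.

```python
import numpy as np

rng = np.random.default_rng(1)
z = np.linspace(0.05, 0.95, 40)          # momentum fraction, away from endpoints
N_true, a_true, b_true = 0.8, -0.5, 1.5  # illustrative "truth" parameters
# Synthetic multiplicity data with 2% multiplicative noise.
data = N_true * z**a_true * (1 - z)**b_true * (1 + 0.02 * rng.normal(size=z.size))

# log D = log N + a*log z + b*log(1-z): solve for (log N, a, b) by least squares.
design = np.column_stack([np.ones_like(z), np.log(z), np.log(1 - z)])
coef, *_ = np.linalg.lstsq(design, np.log(data), rcond=None)
N_fit, a_fit, b_fit = np.exp(coef[0]), coef[1], coef[2]
```

The fit recovers the assumed parameters, but only because the functional form was imposed by hand; the question the paper raises is whether symbolic regression can discover such a form directly from data.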


Machine learning pioneers, including the 'Godfather of AI,' are awarded the Nobel Prize in Physics

Engadget

Two scientists have been awarded the Nobel Prize in Physics "for foundational discoveries and inventions that enable machine learning with artificial neural networks." John Hopfield, an emeritus professor at Princeton University, devised an associative memory that's able to store and reconstruct images and other types of patterns in data. Geoffrey Hinton, who has been dubbed the "Godfather of AI," pioneered a way to autonomously find properties in data, leading to the ability to identify certain elements in pictures. "This year's physics laureates' breakthroughs stand on the foundations of physical science. They have showed a completely new way for us to use computers to aid and to guide us to tackle many of the challenges our society face," the committee wrote on X. "Thanks to their work humanity now has a new item in its toolbox, which we can choose to use for good purposes. Machine learning based on artificial neural networks is currently revolutionizing science, engineering and daily life."


Physical Symbolic Optimization

Tenachi, Wassim, Ibata, Rodrigo, Diakogiannis, Foivos I.

arXiv.org Artificial Intelligence

We present a framework for constraining the automatic sequential generation of equations to obey the rules of dimensional analysis by construction. Combining this approach with reinforcement learning, we built $\Phi$-SO, a Physical Symbolic Optimization method for recovering analytical functions from physical data leveraging units constraints. Our symbolic regression algorithm achieves state-of-the-art results in contexts in which variables and constants have known physical units, outperforming all other methods on SRBench's Feynman benchmark in the presence of noise (exceeding 0.1%) and showing resilience even in the presence of significant (10%) levels of noise.
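The core constraint the abstract describes, generating only equations that obey dimensional analysis, can be sketched with a simple bookkeeping scheme: represent each quantity's units as a vector of exponents and reject candidate operations that violate the rules. This is an illustrative reduction, not $\Phi$-SO's actual machinery (the sequential generation and reinforcement learning are omitted).

```python
import numpy as np

# Units as exponent vectors over (length, time, mass).
LENGTH, TIME, MASS = np.eye(3)

def mul(u, v):
    """Multiplying quantities adds their unit exponents."""
    return u + v

def div(u, v):
    """Dividing quantities subtracts unit exponents."""
    return u - v

def can_add(u, v):
    """Addition is dimensionally valid only between identical units."""
    return bool(np.array_equal(u, v))

velocity = div(LENGTH, TIME)        # [1, -1, 0]
acceleration = div(velocity, TIME)  # [1, -2, 0]
momentum = mul(MASS, velocity)      # [1, -1, 1]
force = mul(MASS, acceleration)     # [1, -2, 1]

ok = can_add(force, force)      # F + F: allowed
bad = can_add(force, momentum)  # F + p: pruned by construction
```

Pruning at construction time, rather than penalizing bad units after the fact, shrinks the search space before any data are consulted, which is the source of the method's efficiency in unit-annotated settings.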


A New Task: Deriving Semantic Class Targets for the Physical Sciences

Bowles, Micah, Tang, Hongming, Vardoulaki, Eleni, Alexander, Emma L., Luo, Yan, Rudnick, Lawrence, Walmsley, Mike, Porter, Fiona, Scaife, Anna M. M., Slijepcevic, Inigo Val, Segal, Gary

arXiv.org Artificial Intelligence

We define deriving semantic class targets as a novel multi-modal task. By doing so, we aim to improve classification schemes in the physical sciences, which can be highly abstract and obfuscating. We address this task for upcoming radio astronomy surveys and present the derived semantic radio galaxy morphology class targets.


Off-the-shelf deep learning is not enough: parsimony, Bayes and causality

Vasudevan, Rama K., Ziatdinov, Maxim, Vlcek, Lukas, Kalinin, Sergei V.

arXiv.org Machine Learning

Deep neural networks ("deep learning") have emerged as a technology of choice for tackling problems in natural language processing, computer vision, speech recognition and gameplay, and in just a few years have delivered superhuman performance and ushered in a new wave of "AI." Buoyed by these successes, researchers in the physical sciences have made steady progress in incorporating deep learning into their respective domains. However, such adoption brings substantial challenges that need to be recognized and confronted. Here, we discuss both opportunities and roadblocks to the implementation of deep learning within materials science, focusing on the relationship between the correlative nature of machine learning and the causal, hypothesis-driven nature of the physical sciences. We argue that deep learning and AI are now well positioned to revolutionize fields where causal links are known, as is the case for applications in theory. When confounding factors are frozen or change only weakly, this leaves open a pathway for effective deep learning solutions in experimental domains. Similarly, these methods offer a pathway towards understanding the physics of real-world systems, whether by deriving reduced representations, deducing algorithmic complexity, or recovering generative physical models. However, extending deep learning and "AI" to models with unclear causal relationships can produce misleading and potentially incorrect results. We argue that broad adoption of Bayesian methods incorporating prior knowledge, development of DL solutions with built-in physical constraints, and ultimately adoption of causal models offers a path forward for fundamental and applied research. Most notably, while these advances can change the way science is carried out in ways we cannot imagine, machine learning is not going to substitute for science any time soon.


Identifying knot types of polymer conformations by machine learning

#artificialintelligence

We investigate the use of artificial neural networks (NNs) as an alternative tool to current analytical methods for recognizing knots in a given polymer conformation. First, it is of interest to examine whether NNs are effective at learning the global and sequential properties that uniquely define a knot. Second, knot classification is an important and unsolved problem in mathematical and physical sciences, and NNs may provide insights into this problem. Motivated by these points, we generate millions of polymer conformations for five knot types: 0, 3_1, 4_1, 5_1, and 5_2, and we design various NN models for classification. Our best model achieves a five-class classification accuracy of above 99% on a polymer of 100 monomers.
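The data flow of such a classifier can be sketched on synthetic stand-in data: a polymer conformation is a chain of 3D monomer coordinates, reduced here to simple geometric features and fed to a trained classifier. This is only an illustration of the setup; the paper's models are deep NNs over the five knot types, whereas the toy below uses logistic regression to separate two synthetic random-walk classes that differ in step scale, and every function name is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, monomers = 200, 100  # conformations per class, monomers per chain

def random_walk_chains(step_scale):
    """Generate n random-walk 'conformations' of shape (n, monomers, 3)."""
    steps = rng.normal(scale=step_scale, size=(n, monomers, 3))
    return np.cumsum(steps, axis=1)

def descriptors(chains):
    """Two coarse geometric features: radius of gyration, end-to-end distance."""
    centered = chains - chains.mean(axis=1, keepdims=True)
    gyration = np.sqrt((centered ** 2).sum(axis=2).mean(axis=1))
    end_to_end = np.linalg.norm(chains[:, -1] - chains[:, 0], axis=1)
    return np.stack([gyration, end_to_end], axis=1)

X = np.vstack([descriptors(random_walk_chains(0.5)),
               descriptors(random_walk_chains(2.0))])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features
y = np.repeat([0, 1], n)

# Logistic regression trained by plain gradient descent.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * (p - y).mean()

accuracy = (((X @ w + b) > 0).astype(int) == y).mean()
```

Hand-crafted scalar descriptors suffice for this toy separation, but knot type is a global topological property that such summaries cannot capture, which is precisely why the paper turns to NNs operating on the full coordinate sequence.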