

BitHEP -- The Limits of Low-Precision ML in HEP

Krause, Claudius, Wang, Daohan, Winterhalder, Ramon

arXiv.org Artificial Intelligence

The increasing complexity of modern neural network architectures demands fast and memory-efficient implementations to mitigate computational bottlenecks. In this work, we evaluate the recently proposed BitNet architecture in HEP applications, assessing its performance in classification, regression, and generative modeling tasks. Specifically, we investigate its suitability for quark-gluon discrimination, SMEFT parameter estimation, and detector simulation, comparing its efficiency and accuracy to state-of-the-art methods. Our results show that while BitNet consistently performs competitively in classification tasks, its performance in regression and generation varies with the size and type of the network, highlighting key limitations and potential areas for improvement.
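The central building block of BitNet is a linear layer whose weights are quantized to very low precision in the forward pass. A minimal PyTorch sketch of such a layer, assuming BitNet b1.58-style ternary absmean quantization with a straight-through estimator (the class name BitLinear and all details here are illustrative, not the authors' implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    # linear layer with ternary {-1, 0, +1} weights in the forward pass
    def forward(self, x):
        w = self.weight
        gamma = w.abs().mean().clamp(min=1e-8)        # per-tensor absmean scale
        w_q = (w / gamma).round().clamp(-1, 1) * gamma
        # straight-through estimator: quantized weights forward,
        # full-precision gradients backward
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)

Swapping nn.Linear for such a layer leaves the rest of a classifier or regressor unchanged, which is what makes the architecture straightforward to benchmark across tasks.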


TRANSIT your events into a new mass: Fast background interpolation for weakly-supervised anomaly searches

Oleksiyuk, Ivan, Voloshynovskiy, Svyatoslav, Golling, Tobias

arXiv.org Artificial Intelligence

We introduce a new model for conditional and continuous data morphing called TRansport Adversarial Network for Smooth InTerpolation (TRANSIT). We apply it to create a background data template for weakly-supervised searches at the LHC. The method smoothly transforms sideband events to match signal region mass distributions. We demonstrate the performance of TRANSIT using the LHC Olympics R&D dataset. The model captures non-linear mass correlations of features and produces a template that offers a competitive anomaly sensitivity compared to state-of-the-art transport-based template generators. Moreover, the computational training time required for TRANSIT is an order of magnitude lower than that of competing deep learning methods. This makes it ideal for analyses that iterate over many signal regions and signal models. Unlike generative models, which must learn a full probability density, i.e., the correlations between all the variables, the proposed transport model only has to learn a smooth conditional shift of the distribution. This allows for a simpler, more efficient residual architecture, enabling mass-uncorrelated features to pass the network unchanged while the mass-correlated features are adjusted accordingly. Furthermore, we show that the latent space of the model provides a set of mass-decorrelated features useful for anomaly detection without background sculpting.
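A rough sketch of the residual, mass-conditional transport idea, assuming a simple shift network (the class ResidualTransport and all sizes are illustrative; the adversarial training objective that gives TRANSIT its smoothness is not shown):

import torch
import torch.nn as nn

class ResidualTransport(nn.Module):
    # maps features x recorded at source mass m_src to target mass m_tgt
    def __init__(self, n_feat, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_feat + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, n_feat))
    def forward(self, x, m_src, m_tgt):
        shift = self.net(torch.cat([x, m_src, m_tgt], dim=1))
        # residual connection: mass-uncorrelated features pass
        # through unchanged when the learned shift is zero
        return x + shift

Because only a conditional shift is learned rather than a full density, the network can stay small and fast to train.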


OmniJet-$\alpha_C$: Learning point cloud calorimeter simulations using generative transformers

Birk, Joschka, Gaede, Frank, Hallin, Anna, Kasieczka, Gregor, Mozzanica, Martina, Rose, Henning

arXiv.org Artificial Intelligence

Machine learning (ML) methods have been a common ingredient in particle physics research for a long time, with neural networks being applied to object identification already in analyses at LEP [1]. Since then, the range of applications has grown drastically, with ML methods being developed and used for example in tagging [2-4], anomaly detection [5-8], individual reconstruction stages like particle tracking [9-11], or even full event interpretation and reconstruction [12]. Another important use case for ML in high energy physics (HEP) is detector simulation. A foundation model is a machine learning model that has been pre-trained on a large amount of data and can then be fine-tuned for different downstream tasks [61]. The idea behind utilizing pre-trained models is that their outputs can significantly enhance the performance of downstream tasks, yielding better results than if the model were to be trained from scratch. While the models mentioned above have focused on exploring different tasks in specific subdomains, like jet physics, a more ambitious goal would eventually be to develop a foundation model for all tasks in all subdomains, including, for example, detector simulation.


SIGMA: Single Interpolated Generative Model for Anomalies

Das, Ranit, Shih, David

arXiv.org Artificial Intelligence

A key step in any resonant anomaly detection search is accurate modeling of the background distribution in each signal region. Data-driven methods like CATHODE accomplish this by training separate generative models on the complement of each signal region, and interpolating them into their corresponding signal regions. Having to re-train the generative model on essentially the entire dataset for each signal region is a major computational cost in a typical sliding window search with many signal regions. Here, we present SIGMA, a new, fully data-driven, computationally efficient method for estimating background distributions. The idea is to train a single generative model on all of the data and interpolate its parameters in sideband regions in order to obtain a model for the background in the signal region. The SIGMA method significantly reduces the computational cost compared to previous approaches, while retaining a similar high quality of background modeling and sensitivity to anomalous signals.
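A toy numpy sketch of the parameter-interpolation idea, with per-mass-bin Gaussian parameters standing in for the generative model (in SIGMA itself the interpolated parameters belong to a neural density estimator; all numbers and names here are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
mass = rng.uniform(3.0, 5.0, 100_000)
x = rng.normal(0.2 * mass, 0.1 + 0.02 * mass)      # feature correlated with mass

sr = (mass > 3.8) & (mass < 4.2)                   # blinded signal region
edges = np.linspace(3.0, 5.0, 21)
centers = 0.5 * (edges[:-1] + edges[1:])
mu, sigma = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    sel = (mass >= lo) & (mass < hi) & ~sr         # sideband events only
    mu.append(x[sel].mean() if sel.any() else np.nan)
    sigma.append(x[sel].std() if sel.any() else np.nan)
good = ~np.isnan(mu)
mu_fit = np.polynomial.Polynomial.fit(centers[good], np.asarray(mu)[good], 2)
sig_fit = np.polynomial.Polynomial.fit(centers[good], np.asarray(sigma)[good], 2)

# background template in the signal region from interpolated parameters
m_sr = mass[sr]
template = rng.normal(mu_fit(m_sr), sig_fit(m_sr))

The single fit replaces one generative-model training per signal region, which is where the computational savings come from.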


Moment Unfolding

Desai, Krish, Nachman, Benjamin, Thaler, Jesse

arXiv.org Machine Learning

Deconvolving ("unfolding") detector distortions is a critical step in the comparison of cross section measurements with theoretical predictions in particle and nuclear physics. However, most existing approaches require histogram binning while many theoretical predictions are at the level of statistical moments. We develop a new approach to directly unfold distribution moments as a function of another observable without having to first discretize the data. Our Moment Unfolding technique uses machine learning and is inspired by Generative Adversarial Networks (GANs). We demonstrate the performance of this approach using jet substructure measurements in collider physics. With this illustrative example, we find that our Moment Unfolding protocol is more precise than bin-based approaches and at least as precise as completely unbinned methods.
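The GAN-inspired idea can be sketched as follows: event weights for the simulation take a Boltzmann-like form in powers of the observable, and a discriminator drives the reweighted simulation toward data; the unfolded moments then follow from the learned weights. A minimal PyTorch sketch under these assumptions (the training loop is omitted and all names are illustrative, not the authors' code):

import torch
import torch.nn as nn

class BoltzmannWeight(nn.Module):
    # event weights w(x) = exp(sum_k lambda_k * x^k), one lambda per moment
    def __init__(self, n_moments=2):
        super().__init__()
        self.lam = nn.Parameter(torch.zeros(n_moments))
    def forward(self, x):                       # x: (batch, 1) observable
        powers = torch.cat([x ** (k + 1) for k in range(len(self.lam))], dim=1)
        return torch.exp(powers @ self.lam)

disc = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
# adversarial training (not shown): the discriminator separates data from
# simulation reweighted by BoltzmannWeight; at the optimum, the weighted
# particle-level moments of the simulation give the unfolded moments.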


Convolutional L2LFlows: Generating Accurate Showers in Highly Granular Calorimeters Using Convolutional Normalizing Flows

Buss, Thorsten, Gaede, Frank, Kasieczka, Gregor, Krause, Claudius, Shih, David

arXiv.org Artificial Intelligence

In the quest to build generative surrogate models as computationally efficient alternatives to rule-based simulations, the quality of the generated samples remains a crucial frontier. So far, normalizing flows have been among the models with the best fidelity. However, as the latent space in such models is required to have the same dimensionality as the data space, scaling up normalizing flows to high dimensional datasets is not straightforward. The prior L2LFlows approach successfully used a series of separate normalizing flows and a sequence of conditioning steps to circumvent this problem. In this work, we extend L2LFlows to simulate showers with a nine times larger profile in the lateral direction. To achieve this, we introduce convolutional layers and U-Net-type connections, move from masked autoregressive flows to coupling layers, and demonstrate the successful modelling of showers in the ILD Electromagnetic Calorimeter as well as Dataset 3 from the public CaloChallenge dataset.
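The move from masked autoregressive flows to coupling layers is what makes convolutional subnetworks possible. A minimal sketch of one convolutional affine coupling block in PyTorch, assuming an even channel split of a calorimeter-layer image (names and sizes are illustrative, not the paper's architecture):

import torch
import torch.nn as nn

class ConvAffineCoupling(nn.Module):
    # affine coupling layer: scale and shift for one half of the channels
    # are predicted by a small conv net from the other half
    def __init__(self, channels):               # channels assumed even
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels // 2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1))
    def forward(self, x):
        x_a, x_b = x.chunk(2, dim=1)
        s, t = self.net(x_a).chunk(2, dim=1)
        s = torch.tanh(s)                        # stabilize the log-scale
        y_b = x_b * torch.exp(s) + t
        log_det = s.flatten(1).sum(dim=1)        # tractable Jacobian term
        return torch.cat([x_a, y_b], dim=1), log_det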


Unifying Simulation and Inference with Normalizing Flows

Du, Haoxing, Krause, Claudius, Mikuni, Vinicius, Nachman, Benjamin, Pang, Ian, Shih, David

arXiv.org Machine Learning

There have been many applications of deep neural networks to detector calibrations and a growing number of studies that propose deep generative models as automated fast detector simulators. We show that these two tasks can be unified by using maximum likelihood estimation (MLE) from conditional generative models for energy regression. Unlike direct regression techniques, the MLE approach is prior-independent and non-Gaussian resolutions can be determined from the shape of the likelihood near the maximum. Using an ATLAS-like calorimeter simulation, we demonstrate this concept in the context of calorimeter energy calibration.
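The calibration step can be illustrated with a toy: scan the condition (true energy) of a conditional density and take the maximizer. Here a Gaussian surrogate stands in for the conditional generative model, and the response and resolution values are invented for illustration:

import numpy as np

def log_likelihood(e_true, x_reco):
    # toy stand-in for log p(x_reco | E_true) from a conditional generative model
    mu = 0.97 * e_true                        # toy detector response
    sigma = 0.1 * np.sqrt(e_true)             # toy resolution
    return -0.5 * ((x_reco - mu) / sigma) ** 2 - np.log(sigma)

x_obs = 48.0                                  # observed calorimeter energy (GeV)
scan = np.linspace(10.0, 100.0, 10_000)       # candidate true energies
ll = log_likelihood(scan, x_obs)
e_mle = scan[np.argmax(ll)]                   # prior-independent calibrated energy
# the shape of ll near its maximum yields a per-event, possibly
# non-Gaussian, resolution estimate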


BUFF: Boosted Decision Tree based Ultra-Fast Flow matching

Jiang, Cheng, Qian, Sitian, Qu, Huilin

arXiv.org Artificial Intelligence

Tabular data stands out as one of the most frequently encountered types in high energy physics. Unlike commonly homogeneous data such as pixelated images, simulating high-dimensional tabular data and accurately capturing their correlations are often quite challenging, even with the most advanced architectures. Based on the findings that tree-based models surpass the performance of deep learning models for tasks specific to tabular data, we adopt the very recent generative modeling class named conditional flow matching and employ different techniques to integrate the usage of Gradient Boosted Trees. The performance is evaluated for various tasks at different analysis levels with several public datasets. We demonstrate that the training and inference times of most high-level simulation tasks can be sped up by orders of magnitude. The application can be extended to low-level feature simulation and conditional generation with competitive performance.
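A one-dimensional toy of tree-based conditional flow matching, assuming scikit-learn's HistGradientBoostingRegressor as the velocity model (the paper's setup is more elaborate; everything here is illustrative):

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
x1 = rng.normal(2.0, 0.5, 50_000)       # "data" distribution to be learned
x0 = rng.normal(0.0, 1.0, 50_000)       # base noise distribution
t = rng.uniform(0.0, 1.0, 50_000)
xt = (1 - t) * x0 + t * x1              # points on straight interpolation paths
v = x1 - x0                             # flow-matching target: path velocity

model = HistGradientBoostingRegressor(max_iter=200)
model.fit(np.column_stack([xt, t]), v)

# generation: integrate dx/dt = v(x, t) from noise with an Euler scheme
x = rng.normal(0.0, 1.0, 10_000)
n_steps = 50
for step in np.linspace(0.0, 1.0, n_steps, endpoint=False):
    x += (1.0 / n_steps) * model.predict(np.column_stack([x, np.full_like(x, step)]))

Since tree inference is cheap and parallelizable on CPUs, both training and sampling avoid the GPU-bound costs of neural flow models.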


CapsLorentzNet: Integrating Physics Inspired Features with Graph Convolution

Sahu, Rameswar

arXiv.org Artificial Intelligence

With the advent of advanced machine learning techniques, boosted object tagging has witnessed significant progress. In this article, we take this field further by introducing novel architectural modifications compatible with a wide array of Graph Neural Network (GNN) architectures. Our approach advocates for integrating capsule layers, replacing the conventional decoding blocks in standard GNNs. These capsules are groups of neurons with vector activations. The orientation of these vectors represents important properties of the objects under study, with their magnitude characterizing whether the object under study belongs to the class represented by the capsule. Moreover, capsule networks incorporate a regularization-by-reconstruction mechanism, facilitating the seamless integration of expert-designed high-level features into the analysis. We have studied the usefulness of our architecture with the LorentzNet architecture for quark-gluon tagging. Here, we have replaced the decoding block of LorentzNet with a capsulated decoding block and have called the resulting architecture CapsLorentzNet. Our new architecture can enhance the performance of LorentzNet by 20% for the quark-gluon tagging task.
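The capsule decoding block can be sketched as follows: one vector capsule per class, a squashing non-linearity that maps the vector norm into (0, 1), and the norm used as the class score. A minimal PyTorch sketch under these assumptions (the reconstruction-regularization branch is omitted; names are illustrative):

import torch
import torch.nn as nn

def squash(v, dim=-1, eps=1e-8):
    # capsule non-linearity: preserves orientation, maps norm into (0, 1)
    n2 = (v ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * v / torch.sqrt(n2 + eps)

class CapsuleDecoder(nn.Module):
    # replaces a scalar decoding block: one vector capsule per class
    def __init__(self, in_dim, n_classes=2, caps_dim=8):
        super().__init__()
        self.proj = nn.Linear(in_dim, n_classes * caps_dim)
        self.n_classes, self.caps_dim = n_classes, caps_dim
    def forward(self, h):                    # h: pooled GNN node features
        caps = self.proj(h).view(-1, self.n_classes, self.caps_dim)
        caps = squash(caps)
        return caps.norm(dim=-1)             # capsule norms as class scores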


Residual ANODE

Das, Ranit, Kasieczka, Gregor, Shih, David

arXiv.org Artificial Intelligence

We present R-ANODE, a new method for data-driven, model-agnostic resonant anomaly detection that raises the bar for both performance and interpretability. The key to R-ANODE is to enhance the inductive bias of the anomaly detection task by fitting a normalizing flow directly to the small and unknown signal component, while holding fixed a background model (also a normalizing flow) learned from sidebands. In doing so, R-ANODE is able to outperform all classifier-based, weakly-supervised approaches, as well as the previous ANODE method, which fit a density estimator to all of the data in the signal region instead of just the signal. We show that the method works equally well whether the unknown signal fraction is learned or fixed, and is even robust to signal fraction misspecification. Finally, with the learned signal model we can sample and gain qualitative insights into the underlying anomaly, which greatly enhances the interpretability of resonant anomaly detection and offers the possibility of simultaneously discovering and characterizing the new physics that could be hiding in the data.
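The objective can be written as a two-component mixture in which only the signal density is trainable. A minimal PyTorch sketch of the per-batch loss, assuming log-densities from two normalizing flows with the background flow frozen after sideband training (names are illustrative, not the authors' code):

import torch

def r_anode_nll(log_p_sig, log_p_bg, logit_w):
    # negative log-likelihood of p(x) = w * p_sig(x) + (1 - w) * p_bg(x)
    w = torch.sigmoid(logit_w)               # signal fraction, learned or fixed
    log_mix = torch.logaddexp(
        torch.log(w) + log_p_sig,            # trainable signal component
        torch.log1p(-w) + log_p_bg)          # frozen background component
    return -log_mix.mean()

Gradients flow only through log_p_sig (and optionally logit_w), so the small signal component carries all of the model capacity being fit.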