Gandrakota, Abhijith
Interpreting Transformers for Jet Tagging
Wang, Aaron, Gandrakota, Abhijith, Ngadiuba, Jennifer, Sahu, Vivekanand, Bhatnagar, Priyansh, Khoda, Elham E, Duarte, Javier
Machine learning (ML) algorithms, particularly attention-based transformer models, have become indispensable for analyzing the vast data generated by particle physics experiments like ATLAS and CMS at the CERN LHC. Particle Transformer (ParT), a state-of-the-art model, leverages particle-level attention to improve jet-tagging tasks, which are critical for identifying particles resulting from proton collisions. This study focuses on interpreting ParT by analyzing attention heat maps and particle-pair correlations on the $\eta$-$\phi$ plane, revealing a binary attention pattern where each particle attends to at most one other particle. At the same time, we observe that ParT shows varying focus on important particles and subjets depending on decay, indicating that the model learns traditional jet substructure observables. These insights enhance our understanding of the model's internal workings and learning process, offering potential avenues for improving the efficiency of transformer architectures in future high-energy physics applications.
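As an illustration of what a binary attention pattern looks like numerically, the sketch below turns one row of pre-softmax attention scores into weights and lists the particles that receive more than half of the total weight. The score values, the 0.5 threshold, and the function names are assumptions made for this example; they are not taken from ParT or from the study.

```python
import math

def attention_weights(scores):
    """Softmax over one query particle's attention scores (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attended_particles(scores, threshold=0.5):
    """Indices of particles whose attention weight exceeds the threshold.

    Because the weights sum to 1, at most one index can exceed 0.5.
    """
    weights = attention_weights(scores)
    return [j for j, w in enumerate(weights) if w > threshold]

# Hypothetical scores for one query particle against four particles in a jet.
sharp_row = [0.1, 8.0, -0.3, 0.2]  # one dominant score: near one-hot attention
flat_row = [0.1, 0.2, 0.0, 0.1]    # no dominant score: attention spread evenly

print(attended_particles(sharp_row))
print(attended_particles(flat_row))
```

A row like `sharp_row` corresponds to a particle attending to exactly one other particle, while a row like `flat_row` corresponds to a particle attending to none in this thresholded sense.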
Real-time Anomaly Detection at the L1 Trigger of CMS Experiment
Gandrakota, Abhijith
The Compact Muon Solenoid (CMS) experiment studies proton-proton collisions at the LHC to uncover potential Beyond Standard Model (BSM) physics and precisely measure rare Standard Model (SM) processes [2]. While the high collision rate at the LHC increases the probability of producing and detecting rare processes, the nearly 100 million channels of the CMS detector also generate an enormous amount of data [10, 14]. Only a small fraction of the 40 MHz proton-proton collision events, around 1,000 per second, can be stored for detailed offline analysis. To achieve this stringent data reduction, events are selected using a two-tiered trigger system. The first level (L1), composed of custom hardware processors built with field-programmable gate arrays (FPGAs), uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed latency of 4 μs [14]. The second level, the high-level trigger (HLT), consists of a processor farm running optimized event reconstruction software, reducing the rate to around 1 kHz before storage [10].
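The data reduction implied by these rates can be worked out directly. The short sketch below uses only the approximate rates quoted above (40 MHz, 100 kHz, 1 kHz); the variable and function names are chosen for this example.

```python
# Approximate rates quoted in the text, in Hz.
COLLISION_RATE_HZ = 40e6    # 40 MHz proton-proton collision rate
L1_OUTPUT_RATE_HZ = 100e3   # around 100 kHz after the L1 trigger
HLT_OUTPUT_RATE_HZ = 1e3    # around 1 kHz after the HLT

def rejection_factor(rate_in_hz, rate_out_hz):
    """Number of input events per accepted event at one trigger stage."""
    return rate_in_hz / rate_out_hz

l1_factor = rejection_factor(COLLISION_RATE_HZ, L1_OUTPUT_RATE_HZ)
hlt_factor = rejection_factor(L1_OUTPUT_RATE_HZ, HLT_OUTPUT_RATE_HZ)
total_factor = rejection_factor(COLLISION_RATE_HZ, HLT_OUTPUT_RATE_HZ)
kept_fraction = HLT_OUTPUT_RATE_HZ / COLLISION_RATE_HZ

print(f"L1 keeps about 1 in {l1_factor:.0f} events")
print(f"HLT keeps about 1 in {hlt_factor:.0f} of the L1-accepted events")
print(f"Overall, about 1 in {total_factor:.0f} events is stored")
```

With these numbers, the L1 trigger accounts for a factor of roughly 400 and the HLT for a further factor of roughly 100, so about 1 in 40,000 collision events reaches storage.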
Robust Anomaly Detection for Particle Physics Using Multi-Background Representation Learning
Gandrakota, Abhijith, Zhang, Lily, Puli, Aahlad, Cranmer, Kyle, Ngadiuba, Jennifer, Ranganath, Rajesh, Tran, Nhan
Anomaly, or out-of-distribution, detection is a promising tool for aiding discoveries of new particles or processes in particle physics. In this work, we identify and address two overlooked opportunities to improve anomaly detection for high-energy physics. First, rather than train a generative model on the single most dominant background process, we build detection algorithms using representation learning from multiple background types, thus taking advantage of more information to improve estimation of what is relevant for detection. Second, we generalize decorrelation to the multi-background setting, thus directly enforcing a more complete definition of robustness for anomaly detection. We demonstrate the benefit of the proposed robust multi-background anomaly detection algorithms on a high-dimensional dataset of particle decays at the Large Hadron Collider.
Fast Particle-based Anomaly Detection Algorithm with Variational Autoencoder
Liu, Ryan, Gandrakota, Abhijith, Ngadiuba, Jennifer, Spiropulu, Maria, Vlimant, Jean-Roch
Model-agnostic anomaly detection is one of the promising approaches in the search for new physics beyond the standard model. In this paper, we present Set-VAE, a particle-based variational autoencoder (VAE) anomaly detection algorithm. We demonstrate a 2x signal efficiency gain compared with traditional subjettiness-based jet selection. Furthermore, with an eye to future deployment in trigger systems, we propose the CLIP-VAE, which reduces the inference-time cost of anomaly detection by using the KL-divergence loss as the anomaly score, resulting in a 2x acceleration in latency and reducing the caching requirement.
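A minimal sketch of a KL-divergence anomaly score follows. It assumes the encoder outputs the mean and log-variance of a diagonal Gaussian posterior and that the prior is a standard normal, which is the common VAE setup but is not stated in the abstract. The function name and input values are hypothetical. The point of the sketch is that the score needs only the encoder outputs, so no decoder pass or reconstruction is required at inference time.

```python
import math

def kl_anomaly_score(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one input.

    mu and log_var are per-latent-dimension encoder outputs (assumed
    parametrization). Larger values mean the posterior sits further
    from the prior, which is read here as "more anomalous".
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0
        for m, lv in zip(mu, log_var)
    )

# Hypothetical encoder outputs for two jets with a 3-dimensional latent space.
typical_jet = kl_anomaly_score([0.0, 0.1, -0.1], [0.0, 0.0, 0.0])
unusual_jet = kl_anomaly_score([2.5, -1.8, 3.0], [0.0, 0.0, 0.0])

print(typical_jet < unusual_jet)
```

In use, a threshold on this score would be chosen from the background distribution to fix the accepted rate; that step is not shown here.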
Efficient and Robust Jet Tagging at the LHC with Knowledge Distillation
Liu, Ryan, Gandrakota, Abhijith, Ngadiuba, Jennifer, Spiropulu, Maria, Vlimant, Jean-Roch
The challenging environment of real-time data processing systems at the Large Hadron Collider (LHC) strictly limits the computational complexity of algorithms that can be deployed. For deep learning models, this implies that only models with low computational complexity that have weak inductive bias are feasible. To address this issue, we utilize knowledge distillation to leverage both the performance of large models and the reduced computational complexity of small ones. In this paper, we present an implementation of knowledge distillation, demonstrating an overall boost in the student models' performance for the task of classifying jets at the LHC. Furthermore, by using a teacher model with a strong inductive bias of Lorentz symmetry, we show that we can induce the same inductive bias in the student model, which leads to better robustness against arbitrary Lorentz boosts.
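For readers unfamiliar with knowledge distillation, the sketch below shows the commonly used formulation: a weighted sum of the usual cross-entropy on the true label and a temperature-softened KL term that pulls the student's output distribution toward the teacher's. The abstract does not specify the loss used in the paper, so the formulation, the `temperature` and `alpha` values, and the example logits are all assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """alpha * cross-entropy(label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label term: standard cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    hard_loss = -math.log(p_student[true_label])
    # Soft-label term: KL divergence between temperature-softened outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft_loss = sum(
        t * math.log(t / s)
        for t, s in zip(q_teacher, q_student) if t > 0.0
    )
    # The T^2 factor keeps the soft term's gradient scale comparable across T.
    return alpha * hard_loss + (1.0 - alpha) * temperature ** 2 * soft_loss

# Hypothetical two-class jet-tagging logits (e.g. signal vs. background).
agree = distillation_loss([2.0, 0.5], [2.0, 0.5], true_label=0)
disagree = distillation_loss([2.0, 0.5], [0.5, 2.0], true_label=0)

print(agree < disagree)
```

When the student reproduces the teacher's logits, the soft term vanishes and only the hard-label term remains, which is why `agree` is smaller than `disagree`.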