SURFing to the Fundamental Limit of Jet Tagging
Ian Pang, Darius A. Faroughy, David Shih, Ranit Das, Gregor Kasieczka
–arXiv.org Artificial Intelligence
Jet tagging is a central task in collider physics. Over the past decade, machine learning has driven major advances in jet tagging, with increasingly sophisticated architectures achieving very high classification performance on simulated datasets [1-11]. This success naturally raises a key question: have current jet taggers already reached the fundamental limit of jet tagging, or does a gap remain between practical performance and the true statistical optimum? The Neyman-Pearson (NP) limit, set by the likelihood ratio, is the best possible discrimination between two underlying physics processes, such as top and QCD jets, that any classifier could achieve if it had access to the exact data likelihoods [12]. In practice, however, this limit cannot be evaluated directly because the true likelihood of the data-generating process is unknown. It therefore remains unclear how close existing classifiers are to this ultimate bound. Recently, Ref. [13] proposed using autoregressive GPT-style generative models to probe this limit for top vs. QCD jets from the JetClass dataset [14]. These models operate on discretized, tokenized representations of jet constituents and yield explicit log-likelihoods, enabling the computation of likelihood ratios between jet classes.
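The likelihood-ratio idea in the abstract can be illustrated with a minimal sketch. It is not the models of Ref. [13]: it assumes a hypothetical two-token vocabulary and first-order (Markov) toy models, whereas a GPT-style model conditions each token on the full preceding sequence. The names `sequence_log_likelihood`, `log_likelihood_ratio`, `TOP_MODEL`, and `QCD_MODEL`, and all probabilities, are invented for illustration.

```python
import math


def sequence_log_likelihood(tokens, start_probs, transition_probs):
    """Autoregressive log-likelihood: log p(t_1) + sum_i log p(t_i | t_{i-1}).

    Assumption: a first-order toy model, so each token depends only on the
    previous one. A GPT-style model would condition on all earlier tokens.
    """
    log_p = math.log(start_probs[tokens[0]])
    for prev, cur in zip(tokens, tokens[1:]):
        log_p += math.log(transition_probs[prev][cur])
    return log_p


def log_likelihood_ratio(tokens, model_a, model_b):
    """Log-likelihood ratio log p_a(tokens) - log p_b(tokens).

    By the Neyman-Pearson lemma, thresholding this quantity is the optimal
    test when model_a and model_b are the true densities of the two classes.
    """
    return (sequence_log_likelihood(tokens, *model_a)
            - sequence_log_likelihood(tokens, *model_b))


# Hypothetical toy models over the vocabulary {0, 1}:
# (start probabilities, transition matrix indexed as [previous][current]).
TOP_MODEL = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])  # tokens tend to repeat
QCD_MODEL = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])  # tokens are independent

jet_a = [0, 0, 0, 0]  # a tokenized "jet" with repeated tokens
jet_b = [0, 1, 0, 1]  # a tokenized "jet" with alternating tokens

llr_a = log_likelihood_ratio(jet_a, TOP_MODEL, QCD_MODEL)
llr_b = log_likelihood_ratio(jet_b, TOP_MODEL, QCD_MODEL)

# A positive ratio favors the "top" toy model, a negative one the "QCD" model.
print(f"LLR(jet_a) = {llr_a:.4f}")
print(f"LLR(jet_b) = {llr_b:.4f}")
```

In this toy setting the two densities are known exactly, so the ratio is the true optimal discriminant. With learned generative models, the ratio is only an estimate of that optimum, and its quality depends on how well each model approximates the true likelihood.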
Nov-21-2025
- Country:
- Europe > Germany
- Baden-Württemberg > Karlsruhe Region
- Heidelberg (0.04)
- Hamburg (0.04)
- North America > United States
- New Jersey > Middlesex County > Piscataway (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Energy (0.67)
- Government (0.46)
- Technology: