
Collaborating Authors

 Kagan, Michael


Fine-tuning machine-learned particle-flow reconstruction for new detector geometries in future colliders

arXiv.org Artificial Intelligence

We demonstrate transfer learning capabilities in a machine-learned algorithm trained for particle-flow reconstruction in high energy particle colliders. This paper presents a cross-detector fine-tuning study, where we initially pre-train the model on a large full simulation dataset from one detector design, and subsequently fine-tune the model on a sample with a different collider and detector design. Specifically, we use the Compact Linear Collider detector (CLICdet) model for the initial training set, and demonstrate successful knowledge transfer to the CLIC-like detector (CLD) proposed for the Future Circular Collider in electron-positron mode (FCC-ee). We show that with an order of magnitude fewer samples from the second dataset, we can achieve the same performance as a costly training from scratch across particle-level and event-level performance metrics, including jet resolution and missing transverse momentum resolution. Furthermore, we find that the fine-tuned model achieves comparable performance to the traditional rule-based particle-flow approach on event-level metrics after training on 100,000 CLD events, whereas a model trained from scratch requires at least 1 million CLD events to achieve similar reconstruction performance. To our knowledge, this represents the first full-simulation cross-detector transfer learning study for particle flow. These findings offer valuable insights towards building large physics models that can be fine-tuned across different detector designs and geometries, helping accelerate the development cycle for new detectors and opening the door to rapid detector design and optimization using machine learning.
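
As a rough illustration of the fine-tuning workflow the abstract describes, consider the sketch below (a generic JAX regressor on random stand-in data; the architecture, dataset sizes, and learning rates are all hypothetical, not the paper's machine-learned particle-flow model). The essence is in the last two loops: the second stage starts from the pre-trained weights and uses far fewer samples and a smaller step size than a training from scratch would.

    import jax
    import jax.numpy as jnp

    def init_params(key, sizes=(8, 64, 64, 4)):
        keys = jax.random.split(key, len(sizes) - 1)
        return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
                for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

    def forward(params, x):
        for W, b in params[:-1]:
            x = jax.nn.relu(x @ W + b)
        W, b = params[-1]
        return x @ W + b

    def loss(params, x, y):
        return jnp.mean((forward(params, x) - y) ** 2)

    @jax.jit
    def sgd_step(params, x, y, lr):
        grads = jax.grad(loss)(params, x, y)
        return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

    key = jax.random.PRNGKey(0)
    k1, k2, k3, k4, k5 = jax.random.split(key, 5)
    x_src = jax.random.normal(k1, (10_000, 8))   # large source-detector sample
    y_src = jax.random.normal(k2, (10_000, 4))
    x_tgt = jax.random.normal(k3, (1_000, 8))    # ~10x smaller target sample
    y_tgt = jax.random.normal(k4, (1_000, 4))

    params = init_params(k5)
    for _ in range(200):                         # pre-training stage
        params = sgd_step(params, x_src, y_src, 1e-2)
    for _ in range(50):                          # fine-tuning stage: same weights,
        params = sgd_step(params, x_tgt, y_tgt, 1e-3)  # less data, smaller steps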


Flow Annealed Importance Sampling Bootstrap meets Differentiable Particle Physics

arXiv.org Artificial Intelligence

High-energy physics requires the generation of large numbers of simulated data samples from complex but analytically tractable distributions called matrix elements. Surrogate models, such as normalizing flows, are gaining popularity for this task due to their computational efficiency. We adopt an approach based on Flow Annealed Importance Sampling Bootstrap (FAB) that evaluates the differentiable target density during training and thereby avoids the costly generation of training data in advance. We show that FAB reaches higher sampling efficiency with fewer target evaluations in high dimensions in comparison to other methods.
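
A minimal sketch of the ingredient FAB builds on: importance sampling against a differentiable, unnormalized target, with the effective sample size (ESS) as the usual proxy for "sampling efficiency". A Gaussian stands in for the flow proposal and a toy bimodal density for the matrix element; the annealed importance sampling bootstrap between flow and target that FAB actually performs is not shown.

    import jax
    import jax.numpy as jnp

    def log_target(x):
        # Stand-in for a differentiable, unnormalized target: bimodal in 2D.
        return jnp.logaddexp(-0.5 * jnp.sum((x - 2.0) ** 2),
                             -0.5 * jnp.sum((x + 2.0) ** 2))

    def log_proposal(x, mu, log_sigma):
        # A diagonal Gaussian stands in for the normalizing-flow proposal.
        return jnp.sum(-0.5 * ((x - mu) / jnp.exp(log_sigma)) ** 2
                       - log_sigma - 0.5 * jnp.log(2 * jnp.pi))

    def ess_fraction(key, mu, log_sigma, n=4096):
        x = mu + jnp.exp(log_sigma) * jax.random.normal(key, (n, 2))
        log_w = (jax.vmap(log_target)(x)
                 - jax.vmap(log_proposal, (0, None, None))(x, mu, log_sigma))
        w = jax.nn.softmax(log_w)              # self-normalized importance weights
        return 1.0 / (n * jnp.sum(w ** 2))     # ESS / n, in (0, 1]

    key = jax.random.PRNGKey(0)
    print(ess_fraction(key, mu=jnp.zeros(2), log_sigma=jnp.zeros(2)))

Because log_target is JAX-differentiable, objectives built from such weights can be optimized with respect to the proposal parameters by gradient descent, which is the property the paper exploits.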


Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models

arXiv.org Artificial Intelligence

Self-Supervised Learning (SSL) is at the core of training modern large machine learning models, providing a scheme for learning powerful representations that can be used in a variety of downstream tasks. However, SSL strategies must be adapted to the type of training data and downstream tasks required. We propose RS3L, a novel simulation-based SSL strategy that employs a method of re-simulation to drive data augmentation for contrastive learning. By intervening in the middle of the simulation process and re-running simulation components downstream of the intervention, we generate multiple realizations of an event, thus producing a set of augmentations covering all physics-driven variations available in the simulator. Using experiments from high-energy physics, we explore how this strategy may enable the development of a foundation model; we show how RS3L pre-training enables powerful performance in downstream tasks such as discrimination of a variety of objects and uncertainty mitigation. In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.
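
The augmentation logic can be sketched as follows; everything here (the smearing "simulator", the linear encoder, the batch size) is an illustrative stand-in rather than the RS3L pipeline itself. Fixed upstream events are re-simulated with different random keys to form positive pairs, and an encoder is trained with an InfoNCE-style contrastive objective.

    import jax
    import jax.numpy as jnp

    def resimulate(events, key):
        # Toy "downstream simulation": stochastic smearing of fixed upstream events.
        return events + 0.1 * jax.random.normal(key, events.shape)

    def encode(W, x):
        z = jnp.tanh(x @ W)
        return z / jnp.linalg.norm(z, axis=-1, keepdims=True)

    def info_nce(W, view_a, view_b, temperature=0.1):
        za, zb = encode(W, view_a), encode(W, view_b)
        logits = za @ zb.T / temperature      # similarities between all pairs
        idx = jnp.arange(za.shape[0])         # matching rows are the positives
        return -jnp.mean(jax.nn.log_softmax(logits, axis=1)[idx, idx])

    key = jax.random.PRNGKey(0)
    k_ev, k_a, k_b, k_w = jax.random.split(key, 4)
    events = jax.random.normal(k_ev, (256, 16))   # fixed upstream "hard events"
    view_a = resimulate(events, k_a)              # two independent re-simulations
    view_b = resimulate(events, k_b)              # of the same upstream events
    W = 0.1 * jax.random.normal(k_w, (16, 32))
    loss, grads = jax.value_and_grad(info_nce)(W, view_a, view_b)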


Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models

arXiv.org Artificial Intelligence

While Artificial Intelligence (AI) and Machine Learning (ML) are already playing a major role in the analysis of high energy physics (HEP) data, the HEP community has yet to benefit from the self-supervised learning (SSL) based approaches to building large foundation models (FM) [1] that have been pioneered in natural language processing (NLP) [2-5] and computer vision (CV) [6-8]. These modern approaches use SSL to pre-train models on vast data sets in order to learn generic representations of the data. Such models can then be efficiently fine-tuned with small datasets for a variety of downstream tasks. The self-supervised pre-training of a FM produces a model that is also referred to as the "backbone", as it can serve as the information extraction component for downstream models. This concept significantly expands the possibilities for learning robust and meaningful data representations. These models also represent a scale in both model size and data size that has not been addressed in HEP. In this work, we aim to take the first steps towards building such a HEP foundation model, focusing on developing HEP-specific SSL strategies, whilst keeping an eye on how well such strategies may scale in the future. We propose a masked particle modeling (MPM) scheme, akin to masked language modeling (MLM) in NLP, for self-supervised learning on unlabeled data consisting of sets of particles in a collider physics environment. In doing so, we propose a novel scheme to apply masked modeling strategies to unordered sets of inputs. This work aims to generalize the language-inspired MLM-type training scheme to HEP scientific data: the paradigm involves extracting semantic meaning and understanding of the whole by predicting the missing (masked) pieces, referred to as tokens, thereby considering the collective impact of individual input elements.
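
A toy version of the masking step is given below, with a per-particle linear predictor and random integer "tokens" standing in for the paper's transformer backbone and learned discrete tokenization. The key set-specific point is that masking is positionless: masked particles have their features replaced by a learned mask token, and the loss is taken only over masked elements.

    import jax
    import jax.numpy as jnp

    N_PART, N_FEAT, N_TOKENS = 32, 4, 64

    def mpm_loss(params, particles, token_ids, mask):
        W, mask_token = params
        # Replace the features of masked particles with a learned mask token.
        x = jnp.where(mask[:, None], mask_token, particles)
        logits = x @ W                                  # (N_PART, N_TOKENS)
        logp = jax.nn.log_softmax(logits)
        nll = -logp[jnp.arange(N_PART), token_ids]      # per-particle NLL
        return jnp.sum(nll * mask) / jnp.maximum(jnp.sum(mask), 1)

    key = jax.random.PRNGKey(0)
    k1, k2, k3, k4 = jax.random.split(key, 4)
    particles = jax.random.normal(k1, (N_PART, N_FEAT))
    token_ids = jax.random.randint(k2, (N_PART,), 0, N_TOKENS)
    mask = jax.random.bernoulli(k3, 0.3, (N_PART,))     # ~30% of the set masked
    params = (0.1 * jax.random.normal(k4, (N_FEAT, N_TOKENS)), jnp.zeros(N_FEAT))
    loss, grads = jax.value_and_grad(mpm_loss)(params, particles, token_ids, mask)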


Differentiable Vertex Fitting for Jet Flavour Tagging

arXiv.org Artificial Intelligence

We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.
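
As a simplified picture of the idea, consider straight-line tracks and a closed-form least-squares vertex fit. The fitted vertex is the solution of a linear system, so JAX gradients with respect to the track parameters flow through the fit, in the same spirit as the implicit differentiation used for the paper's iterative fitter; the tracks and the downstream observable here are hypothetical.

    import jax
    import jax.numpy as jnp

    def fit_vertex(points, dirs):
        # Least-squares point minimizing distance to the lines x(t) = p + t*d.
        d = dirs / jnp.linalg.norm(dirs, axis=1, keepdims=True)
        P = jnp.eye(3)[None] - d[:, :, None] * d[:, None, :]   # projectors
        A = jnp.sum(P, axis=0)
        b = jnp.sum(jnp.einsum("nij,nj->ni", P, points), axis=0)
        return jnp.linalg.solve(A, b)                          # differentiable

    def displacement(points, dirs):
        # Example downstream quantity a flavour tagger might consume.
        return jnp.linalg.norm(fit_vertex(points, dirs))

    points = jnp.array([[0.0, 0.1, 0.0], [0.1, -0.1, 0.0], [-0.1, 0.0, 0.1]])
    dirs = jnp.array([[1.0, 0.2, 0.0], [1.0, -0.1, 0.1], [0.9, 0.1, -0.1]])
    grad_points = jax.grad(displacement)(points, dirs)   # d(observable)/d(tracks)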


Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics

arXiv.org Machine Learning

We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Differentiating such programs can thus open the way for gradient-based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so, we develop, to the best of our knowledge, the first fully differentiable branching program.
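
One of the estimator families such comparisons include, in minimal form: the classic score-function (REINFORCE) estimator for a program with a single Bernoulli branch. This is a hand-written toy, not the Stochastic AD method itself; the branch probability and outcomes are illustrative.

    import jax
    import jax.numpy as jnp

    def branch_outcome(b, theta):
        # Toy branching program: a different code path for each discrete outcome.
        return jnp.where(b == 1, theta ** 2, 1.0 - theta)

    def expected(theta):
        # Closed-form expectation, used only to check the estimator.
        p = jax.nn.sigmoid(theta)
        return p * branch_outcome(1, theta) + (1 - p) * branch_outcome(0, theta)

    def score_function_grad(key, theta, n=100_000):
        p = jax.nn.sigmoid(theta)
        b = jax.random.bernoulli(key, p, (n,)).astype(jnp.float32)
        f = branch_outcome(b, theta)
        dlogp = b - p                            # d/dtheta of log Bernoulli(b; sigmoid(theta))
        df = jnp.where(b == 1, 2 * theta, -1.0)  # pathwise part, branch held fixed
        return jnp.mean(f * dlogp + df)

    theta = jnp.array(0.3)
    print(jax.grad(expected)(theta))                          # exact gradient
    print(score_function_grad(jax.random.PRNGKey(0), theta))  # Monte Carlo estimate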


Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

arXiv.org Machine Learning

Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.
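
For orientation, the recurrence a gated recurrent unit layer evaluates is sketched below in floating-point JAX; the hls4ml implementation realizes this same computation as fixed-point HLS pipelines on the FPGA, which the sketch makes no attempt to capture. Weight shapes and the jet-like input sequence are illustrative.

    import jax
    import jax.numpy as jnp

    def gru_cell(params, h, x):
        # One GRU step, in the Keras-style convention that hls4ml targets.
        Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
        z = jax.nn.sigmoid(x @ Wz + h @ Uz + bz)        # update gate
        r = jax.nn.sigmoid(x @ Wr + h @ Ur + br)        # reset gate
        h_cand = jnp.tanh(x @ Wh + (r * h) @ Uh + bh)   # candidate state
        return z * h + (1 - z) * h_cand

    def gru_forward(params, xs, n_hidden):
        def step(h, x):
            h = gru_cell(params, h, x)
            return h, h
        h_final, _ = jax.lax.scan(step, jnp.zeros(n_hidden), xs)
        return h_final                                   # fixed-size summary

    n_in, n_h = 8, 16
    ks = jax.random.split(jax.random.PRNGKey(0), 7)
    def w(k, shape):
        return 0.1 * jax.random.normal(k, shape)
    params = (w(ks[0], (n_in, n_h)), w(ks[1], (n_h, n_h)), jnp.zeros(n_h),
              w(ks[2], (n_in, n_h)), w(ks[3], (n_h, n_h)), jnp.zeros(n_h),
              w(ks[4], (n_in, n_h)), w(ks[5], (n_h, n_h)), jnp.zeros(n_h))
    xs = w(ks[6], (20, n_in))    # e.g. a sequence of 20 jet constituents
    summary = gru_forward(params, xs, n_h)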


Differentiable Matrix Elements with MadJax

arXiv.org Artificial Intelligence

MadJax is a tool for generating and evaluating differentiable matrix elements of high energy scattering processes. As such, it is a step towards a differentiable programming paradigm in high energy physics that facilitates the incorporation of high energy physics domain knowledge, encoded in simulation software, into gradient-based learning and optimization pipelines. MadJax comprises two components: (a) a plugin to the general purpose matrix element generator MadGraph that integrates matrix element and phase space sampling code with the JAX differentiable programming framework, and (b) a standalone wrapping API for accessing the matrix element code and its gradients, which are computed with automatic differentiation. We present the MadJax implementation together with example applications, simulation-based inference and normalizing-flow-based matrix element modeling, whose capabilities are uniquely enabled by differentiable matrix elements.
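
A toy analogue of what this enables: MadJax generates the actual matrix element code from MadGraph, whereas the hand-written stand-in below just uses the schematic leading-order e+e- -> mu+mu- angular dependence, proportional to (1 + cos^2 theta)/s. The point is that any JAX-traceable |M|^2 yields gradients with respect to phase-space points via the same jax.grad call.

    import jax

    def matrix_element_sq(cos_theta, s):
        # Schematic LO QED shape; couplings and constant factors omitted.
        return (1.0 + cos_theta ** 2) / s

    grads = jax.grad(matrix_element_sq, argnums=(0, 1))
    print(matrix_element_sq(0.5, 10_000.0))
    print(grads(0.5, 10_000.0))   # (d|M|^2/dcos_theta, d|M|^2/ds)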


Neural Empirical Bayes: Source Distribution Estimation and its Applications to Simulation-Based Inference

arXiv.org Machine Learning

We revisit empirical Bayes in the absence of a tractable likelihood function, as is typical in scientific domains relying on computer simulations. We investigate how the empirical Bayesian can make use of neural density estimators first to use all noise-corrupted observations to estimate a prior or source distribution over uncorrupted samples, and then to perform single-observation posterior inference using the fitted source distribution. We propose an approach based on the direct maximization of the log-marginal likelihood of the observations, examining both biased and de-biased estimators, and comparing to variational approaches. We find that, up to symmetries, a neural empirical Bayes approach recovers ground truth source distributions. With the learned source distribution in hand, we show the applicability to likelihood-free inference and examine the quality of the resulting posterior estimates. Finally, we demonstrate the applicability of Neural Empirical Bayes on an inverse problem from collider physics.
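
In toy form, the direct log-marginal-likelihood objective looks like this, with a two-parameter Gaussian source standing in for the neural density estimator and a known Gaussian noise model. The log-of-Monte-Carlo-mean estimator used here is the biased variant the abstract alludes to; all sizes and rates are illustrative.

    import jax
    import jax.numpy as jnp

    NOISE = 0.5   # known corruption scale of the "measurement"

    def log_marginal(params, y, key, n_mc=256):
        # Monte Carlo estimate of log p(y) = log E_{x ~ source}[N(y; x, NOISE^2)].
        mu, log_sigma = params
        x = mu + jnp.exp(log_sigma) * jax.random.normal(key, (n_mc,))
        log_lik = (-0.5 * ((y[:, None] - x[None, :]) / NOISE) ** 2
                   - jnp.log(NOISE) - 0.5 * jnp.log(2 * jnp.pi))
        return jnp.mean(jax.nn.logsumexp(log_lik, axis=1) - jnp.log(n_mc))

    key = jax.random.PRNGKey(0)
    k_x, k_n, k_mc = jax.random.split(key, 3)
    x_true = 2.0 + 0.3 * jax.random.normal(k_x, (2_000,))   # unobserved source
    y = x_true + NOISE * jax.random.normal(k_n, (2_000,))   # corrupted observations

    params = (jnp.array(0.0), jnp.array(0.0))               # source mean, log-scale
    for _ in range(300):   # gradient ascent on the (biased) MC estimator
        k_mc, k = jax.random.split(k_mc)
        grads = jax.grad(log_marginal)(params, y, k)
        params = jax.tree_util.tree_map(lambda p, g: p + 0.05 * g, params, grads)

With the fitted source distribution in hand, single-observation posteriors follow from Bayes' rule with the known noise likelihood, which is the second step the abstract describes.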


Continual Learning via Neural Pruning

arXiv.org Machine Learning

We introduce Continual Learning via Neural Pruning (CLNP), a new method aimed at lifelong learning in fixed-capacity models based on neuronal model sparsification. In this method, subsequent tasks are trained using the inactive neurons and filters of the sparsified network and cause zero deterioration to the performance of previous tasks. In order to deal with the possible compromise between model sparsity and performance, we formalize and incorporate the concept of graceful forgetting: the idea that it is preferable to suffer a small amount of forgetting in a controlled manner if it helps regain network capacity and prevents uncontrolled loss of performance during the training of future tasks. CLNP also provides simple continual learning diagnostic tools in terms of the number of free neurons left for the training of future tasks as well as the number of neurons that are being reused. In particular, we see in experiments that CLNP verifies and automatically takes advantage of the fact that the features of earlier layers are more transferable. We show empirically that CLNP leads to significantly improved results over current weight-elasticity-based methods.
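
A schematic of the freezing mechanism for a single hidden layer is sketched below; the threshold and the plain dense network are illustrative, and in the method itself sparsity-inducing regularization is what actually drives units inactive. Used units keep their task-A weights, so task A's features are exactly preserved while the free units (and a fresh output head) learn task B.

    import jax
    import jax.numpy as jnp

    def features(W1, b1, x):
        return jax.nn.relu(x @ W1 + b1)

    def task_b_loss(trainable, frozen, unit_free, x, y):
        W1_new, b1_new, W_head, b_head = trainable
        W1_a, b1_a = frozen
        # Used units keep task-A incoming weights; free units learn new ones.
        W1 = W1_a * (1 - unit_free)[None, :] + W1_new * unit_free[None, :]
        b1 = b1_a * (1 - unit_free) + b1_new * unit_free
        return jnp.mean((features(W1, b1, x) @ W_head + b_head - y) ** 2)

    key = jax.random.PRNGKey(0)
    k1, k2, k3, k4, k5 = jax.random.split(key, 5)
    W1_a = 0.1 * jax.random.normal(k1, (8, 32))   # hidden weights after task A
    b1_a = jnp.zeros(32)                          # (post-sparsification, ideally)
    # The diagnostic the abstract mentions: units still free for future tasks.
    unit_free = (jnp.linalg.norm(W1_a, axis=0) < 0.3).astype(jnp.float32)
    print("free units:", int(unit_free.sum()), "of", unit_free.size)

    trainable = (0.1 * jax.random.normal(k2, (8, 32)), jnp.zeros(32),
                 0.1 * jax.random.normal(k3, (32, 2)), jnp.zeros(2))
    x = jax.random.normal(k4, (128, 8))           # toy task-B batch
    y = jax.random.normal(k5, (128, 2))
    grads = jax.grad(task_b_loss)(trainable, (W1_a, b1_a), unit_free, x, y)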