Tran, Nhan
Low latency optical-based mode tracking with machine learning deployed on FPGAs on a tokamak
Wei, Yumou, Forelli, Ryan F., Hansen, Chris, Levesque, Jeffrey P., Tran, Nhan, Agar, Joshua C., Di Guglielmo, Giuseppe, Mauel, Michael E., Navratil, Gerald A.
Active feedback control in magnetic confinement fusion devices is desirable to mitigate plasma instabilities and enable robust operation. Optical high-speed cameras provide a powerful, non-invasive diagnostic and can be suitable for these applications. In this study, we process fast camera data, at rates exceeding 100kfps, on $\textit{in situ}$ Field Programmable Gate Array (FPGA) hardware to track magnetohydrodynamic (MHD) mode evolution and generate control signals in real-time. Our system utilizes a convolutional neural network (CNN) model which predicts the $n$=1 MHD mode amplitude and phase using camera images with better accuracy than other tested non-deep-learning-based methods. By implementing this model directly within the standard FPGA readout hardware of the high-speed camera diagnostic, our mode tracking system achieves a total trigger-to-output latency of 17.6$\mu$s and a throughput of up to 120kfps. This study at the High Beta Tokamak-Extended Pulse (HBT-EP) experiment demonstrates an FPGA-based high-speed camera data acquisition and processing system, enabling application in real-time machine-learning-based tokamak diagnostic and control as well as potential applications in other scientific domains.
On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments
Kulkarni, Shruti R., Young, Aaron, Date, Prasanna, Miniskar, Narasinga Rao, Vetter, Jeffrey S., Fahim, Farah, Parpillon, Benjamin, Dickinson, Jennet, Tran, Nhan, Yoo, Jieun, Mills, Corrinne, Swartz, Morris, Maksimovic, Petar, Schuman, Catherine D., Bean, Alice
This work describes the investigation of neuromorphic computing-based spiking neural network (SNN) models used to filter data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters out the sensor data based on the particle's transverse momentum with the goal of reducing the amount of data being sent to the downstream electronics. The incoming charge waveforms are converted to streams of binary-valued events, which are then processed by the SNN. We present our insights on the various system design choices - from data encoding to optimal hyperparameters of the training algorithm - for an accurate and compact SNN optimized for hardware deployment. Our results show that an SNN trained with an evolutionary algorithm and an optimized set of hyperparameters obtains a signal efficiency of about 91% with nearly half as many parameters as a deep neural network.
Structural Re-weighting Improves Graph Domain Adaptation
Liu, Shikun, Li, Tianchun, Feng, Yongbin, Tran, Nhan, Zhao, Han, Qiang, Qiu, Li, Pan
In many real-world applications, graph-structured data used for training and testing have differences in distribution, such as in high energy physics (HEP) where simulation data used for training may not match real experiments. Graph domain adaptation (GDA) is a method used to address these differences. However, current GDA primarily works by aligning the distributions of node representations output by a single graph neural network encoder shared across the training and testing domains, which may often yield sub-optimal solutions. This work examines different impacts of distribution shifts caused by either graph structure or node attributes and identifies a new type of shift, named conditional structure shift (CSS), which current GDA approaches are provably sub-optimal to deal with. A novel approach, called structural reweighting (StruRW), is proposed to address this issue and is tested on synthetic graphs, four benchmark datasets, and a new application in HEP. StruRW has shown significant performance improvement over the baselines in the settings with large graph structure shifts, and reasonable performance improvement when node attribute shift dominates.
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs
Campos, Javier, Dong, Zhen, Duarte, Javier, Gholami, Amir, Mahoney, Michael W., Mitrevski, Jovan, Tran, Nhan
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware. Our approach leverages Hessian-aware quantization (HAWQ) of NNs, the Quantized Open Neural Network Exchange (QONNX) intermediate representation, and the hls4ml tool flow for transpiling NNs into FPGA and ASIC firmware. This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow that can be deployed for real-time machine learning applications in a wide range of scientific and industrial settings. We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the CERN Large Hadron Collider (LHC). Given the high collision rate, all data processing must be implemented on custom ASIC and FPGA hardware within a strict area and latency. Based on these constraints, we implement an optimized mixed-precision NN classifier for high-momentum particle jets in simulated LHC proton-proton collisions.
Neural network accelerator for quantum control
Xu, David, Özgüler, A. Barış, Di Guglielmo, Giuseppe, Tran, Nhan, Perdue, Gabriel N., Carloni, Luca, Fahim, Farah
Efficient quantum control is necessary for practical quantum computing implementations with current technologies. Conventional algorithms for determining optimal control parameters are computationally expensive, largely excluding them from use outside of the simulation. Existing hardware solutions structured as lookup tables are imprecise and costly. By designing a machine learning model to approximate the results of traditional tools, a more efficient method can be produced. Such a model can then be synthesized into a hardware accelerator for use in quantum systems. In this study, we demonstrate a machine learning algorithm for predicting optimal pulse parameters. This algorithm is lightweight enough to fit on a low-resource FPGA and perform inference with a latency of 175 ns and pipeline interval of 5 ns with $~>~$0.99 gate fidelity. In the long term, such an accelerator could be used near quantum computing hardware where traditional computers cannot operate, enabling quantum control at a reasonable cost at low latencies without incurring large data bandwidths outside of the cryogenic environment.
QONNX: Representing Arbitrary-Precision Quantized Neural Networks
Pappalardo, Alessandro, Umuroglu, Yaman, Blott, Michaela, Mitrevski, Jovan, Hawks, Ben, Tran, Nhan, Loncar, Vladimir, Summers, Sioni, Borras, Hendrik, Muhizi, Jules, Trahms, Matthew, Hsu, Shih-Chieh, Hauck, Scott, Duarte, Javier
We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.
Applications and Techniques for Fast Machine Learning in Science
Deiana, Allison McCarn, Tran, Nhan, Agar, Joshua, Blott, Michaela, Di Guglielmo, Giuseppe, Duarte, Javier, Harris, Philip, Hauck, Scott, Liu, Mia, Neubauer, Mark S., Ngadiuba, Jennifer, Ogrenci-Memik, Seda, Pierini, Maurizio, Aarrestad, Thea, Bahr, Steffen, Becker, Jurgen, Berthold, Anne-Sophie, Bonventre, Richard J., Bravo, Tomas E. Muller, Diefenthaler, Markus, Dong, Zhen, Fritzsche, Nick, Gholami, Amir, Govorkova, Ekaterina, Hazelwood, Kyle J, Herwig, Christian, Khan, Babar, Kim, Sehoon, Klijnsma, Thomas, Liu, Yaling, Lo, Kin Ho, Nguyen, Tri, Pezzullo, Gianantonio, Rasoulinezhad, Seyedramin, Rivera, Ryan A., Scholberg, Kate, Selig, Justin, Sen, Sougata, Strukov, Dmitri, Tang, William, Thais, Savannah, Unger, Kai Lukas, Vilalta, Ricardo, Krosigk, Belinavon, Warburton, Thomas K., Flechas, Maria Acosta, Aportela, Anthony, Calvet, Thomas, Cristella, Leonardo, Diaz, Daniel, Doglioni, Caterina, Galati, Maria Domenica, Khoda, Elham E, Fahim, Farah, Giri, Davide, Hawks, Benjamin, Hoang, Duc, Holzman, Burt, Hsu, Shih-Chieh, Jindariani, Sergo, Johnson, Iris, Kansal, Raghav, Kastner, Ryan, Katsavounidis, Erik, Krupa, Jeffrey, Li, Pan, Madireddy, Sandeep, Marx, Ethan, McCormack, Patrick, Meza, Andres, Mitrevski, Jovan, Mohammed, Mohammed Attia, Mokhtar, Farouk, Moreno, Eric, Nagu, Srishti, Narayan, Rohin, Palladino, Noah, Que, Zhiqiang, Park, Sang Eon, Ramamoorthy, Subramanian, Rankin, Dylan, Rothman, Simon, Sharma, Ashish, Summers, Sioni, Vischia, Pietro, Vlimant, Jean-Roch, Weng, Olivia
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.
Fast convolutional neural networks on FPGAs with hls4ml
Aarrestad, Thea, Loncar, Vladimir, Pierini, Maurizio, Summers, Sioni, Ngadiuba, Jennifer, Petersson, Christoffer, Linander, Hampus, Iiyama, Yutaro, Di Guglielmo, Giuseppe, Duarte, Javier, Harris, Philip, Rankin, Dylan, Jindariani, Sergo, Pedro, Kevin, Tran, Nhan, Liu, Mia, Kreinar, Edward, Wu, Zhenbin, Hoang, Duc
The hls4ml library [1, 2] is an open source software designed to facilitate the deployment of machine learning (ML) models on field-programmable gate arrays (FPGAs), targeting low-latency and low-power edge applications. Taking as input a neural network model, hls4ml generates C/C code designed to be transpiled into FPGA firmware by processing it with a high-level synthesis (HLS) library. The development of hls4ml was historically driven by the need to integrate ML algorithms in the first stage of the real-time data processing of particle physics experiments operating at the CERN Large Hadron Collider (LHC). The LHC produces high-energy proton collisions (or events) every 25 ns, each consisting of about 1 MB of raw data. Since this throughput is overwhelming for the currently available processing and storage resources, the LHC experiments run a real-time event selection system, the so-called Level-1 trigger (L1T), to reduce the event rate from 40 MHz to 100 kHz [3-6]. Due to the size of the buffering system, the L1T system operates with a fixed latency of O(1 µs). While hls4ml excels as a tool to automatically generate low-latency ML firmware for L1T applications, it also offers interesting opportunities for edge-computing applications beyond particle physics whenever efficient, e.g.
Fast inference of deep neural networks in FPGAs for particle physics
Duarte, Javier, Han, Song, Harris, Philip, Jindariani, Sergo, Kreinar, Edward, Kreis, Benjamin, Ngadiuba, Jennifer, Pierini, Maurizio, Rivera, Ryan, Tran, Nhan, Wu, Zhenbin
Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through the improvement of the real-time event processing techniques. Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of the use of such techniques in low-latency, low-power FPGA hardware has only just begun. FPGA-based trigger and data acquisition (DAQ) systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson. While we focus on a specific example, the lessons are far-reaching. We develop a package based on High-Level Synthesis (HLS) called hls4ml to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs. For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.