snl
Coordinate Descent for Network Linearization
Rakhlin, Vlad, Jevnisek, Amir, Avidan, Shai
ReLU activations are the main bottleneck in Private Inference that is based on ResNet networks. This is because they incur significant inference latency. Reducing ReLU count is a discrete optimization problem, and there are two common ways to approach it. Most current state-of-the-art methods are based on a smooth approximation that jointly optimizes network accuracy and ReLU budget at once. However, the last hard thresholding step of the optimization usually introduces a large performance loss. We take an alternative approach that works directly in the discrete domain by leveraging Coordinate Descent as our optimization framework. In contrast to previous methods, this yields a sparse solution by design. We demonstrate, through extensive experiments, that our method is State of the Art on common benchmarks.
Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL
Rahali, Hamza Ezzaoui, Dave, Abhilasha, Ruckman, Larry, Rahimifar, Mohammad Mehdi, Therrien, Audrey C., Russel, James J., Herbst, Ryan T.
The LCLS-II Free Electron Laser (FEL) will generate X-ray pulses for beamline experiments at rates of up to 1~MHz, with detectors producing data throughputs exceeding 1 TB/s. Managing such massive data streams presents significant challenges, as transmission and storage infrastructures become prohibitively expensive. Machine learning (ML) offers a promising solution for real-time data reduction, but conventional implementations introduce excessive latency, making them unsuitable for high-speed experimental environments. To address these challenges, SLAC developed the SLAC Neural Network Library (SNL), a specialized framework designed to deploy real-time ML inference models on Field-Programmable Gate Arrays (FPGA). SNL's key feature is the ability to dynamically update model weights without requiring FPGA resynthesis, enhancing flexibility for adaptive learning applications. To further enhance usability and accessibility, we introduce Auto-SNL, a Python extension that streamlines the process of converting Python-based neural network models into SNL-compatible high-level synthesis code. This paper presents a benchmark comparison against hls4ml, the current state-of-the-art tool, across multiple neural network architectures, fixed-point precisions, and synthesis configurations targeting a Xilinx ZCU102 FPGA. The results showed that SNL achieves competitive or superior latency in most tested architectures, while in some cases also offering FPGA resource savings. This adaptation demonstrates SNL's versatility, opening new opportunities for researchers and academics in fields such as high-energy physics, medical imaging, robotics, and many more.
Zero-Direction Probing: A Linear-Algebraic Framework for Deep Analysis of Large-Language-Model Drift
We present Zero-Direction Probing (ZDP), a theory-only framework for detecting model drift from null directions of transformer activations without task labels or output evaluations. Under assumptions A1--A6, we prove: (i) the Variance--Leak Theorem, (ii) Fisher Null-Conservation, (iii) a Rank--Leak bound for low-rank updates, and (iv) a logarithmic-regret guarantee for online null-space trackers. We derive a Spectral Null-Leakage (SNL) metric with non-asymptotic tail bounds and a concentration inequality, yielding a-priori thresholds for drift under a Gaussian null model. These results show that monitoring right/left null spaces of layer activations and their Fisher geometry provides concrete, testable guarantees on representational change.
Learning Energy-Based Models by Self-normalising the Likelihood
Senetaire, Hugo, Jeha, Paul, Mattei, Pierre-Alexandre, Frellsen, Jes
Training an energy-based model (EBM) with maximum likelihood is challenging due to the intractable normalisation constant. Traditional methods rely on expensive Markov chain Monte Carlo (MCMC) sampling to estimate the gradient of logartihm of the normalisation constant. We propose a novel objective called self-normalised log-likelihood (SNL) that introduces a single additional learnable parameter representing the normalisation constant compared to the regular log-likelihood. SNL is a lower bound of the log-likelihood, and its optimum corresponds to both the maximum likelihood estimate of the model parameters and the normalisation constant. We show that the SNL objective is concave in the model parameters for exponential family distributions. Unlike the regular log-likelihood, the SNL can be directly optimised using stochastic gradient techniques by sampling from a crude proposal distribution. We validate the effectiveness of our proposed method on various density estimation tasks as well as EBMs for regression. Our results show that the proposed method, while simpler to implement and tune, outperforms existing techniques.
FPGA-Accelerated SpeckleNN with SNL for Real-time X-ray Single-Particle Imaging
Dave, Abhilasha, Wang, Cong, Russell, James, Herbst, Ryan, Thayer, Jana
We implement a specialized version of our SpeckleNN model for real-time speckle pattern classification in X-ray Single-Particle Imaging (SPI) using the SLAC Neural Network Library (SNL) on an FPGA. This hardware is optimized for inference near detectors in high-throughput X-ray free-electron laser (XFEL) facilities like the Linac Coherent Light Source (LCLS). To fit FPGA constraints, we optimized SpeckleNN, reducing parameters from 5.6M to 64.6K (98.8% reduction) with 90% accuracy. We also compressed the latent space from 128 to 50 dimensions. Deployed on a KCU1500 FPGA, the model used 71% of DSPs, 75% of LUTs, and 48% of FFs, with an average power consumption of 9.4W. The FPGA achieved 45.015us inference latency at 200 MHz. On an NVIDIA A100 GPU, the same inference consumed ~73W and had a 400us latency. Our FPGA version achieved an 8.9x speedup and 7.8x power reduction over the GPU. Key advancements include model specialization and dynamic weight loading through SNL, eliminating time-consuming FPGA re-synthesis for fast, continuous deployment of (re)trained models. These innovations enable real-time adaptive classification and efficient speckle pattern vetoing, making SpeckleNN ideal for XFEL facilities. This implementation accelerates SPI experiments and enhances adaptability to evolving conditions.
'SNL' legends say comedy has become bigger, 'snarkier' and more political
Legendary "Saturday Night Live" cast members Jon Lovitz, Kevin Nealon, and Siobhan Fallon Hogan are reflecting on how comedy has changed since they started in the business, stating that it is more popular than ever -- and has more snark than it once did. Ahead of SNL's 50th anniversary special, Sunday, the three spoke to Fox News Digital about how the comedy landscape, particularly the stand-up comedy scene, has evolved in the era of social media and divisive politics -- marveling at how comedians have exploded in popularity in recent years and developed a freer form and more biting style. "When I started comedy, it was totally different. And it was a totally different time and generation, and it was not as much short attention span. I can look back at some of the sketches on SNL, and they're a lot longer than they are now because of the short attention span," Nealon told Fox, adding, "I think also comedy may have gotten a little more snarkier."
Analysis of Hardware Synthesis Strategies for Machine Learning in Collider Trigger and Data Acquisition
Jia, Haoyi, Dave, Abhilasha, Gonski, Julia, Herbst, Ryan
To fully exploit the physics potential of current and future high energy particle colliders, machine learning (ML) can be implemented in detector electronics for intelligent data processing and acquisition. The implementation of ML in real-time at colliders requires very low latencies that are unachievable with a software-based approach, requiring optimization and synthesis of ML algorithms for deployment on hardware. An analysis of neural network inference efficiency is presented, focusing on the application of collider trigger algorithms in field programmable gate arrays (FPGAs). Trade-offs are evaluated between two frameworks, the SLAC Neural Network Library (SNL) and hls4ml, in terms of resources and latency for different model sizes. Results highlight the strengths and limitations of each approach, offering valuable insights for optimizing real-time neural network deployments at colliders. This work aims to guide researchers and engineers in selecting the most suitable hardware and software configurations for real-time, resource-constrained environments.
Fast, accurate and lightweight sequential simulation-based inference using Gaussian locally linear mappings
Hรคggstrรถm, Henrik, Rodrigues, Pedro L. C., Oudoumanessah, Geoffroy, Forbes, Florence, Picchini, Umberto
Bayesian inference for complex models with an intractable likelihood can be tackled using algorithms performing many calls to computer simulators. These approaches are collectively known as "simulation-based inference" (SBI). Recent SBI methods have made use of neural networks (NN) to provide approximate, yet expressive constructs for the unavailable likelihood function and the posterior distribution. However, they do not generally achieve an optimal trade-off between accuracy and computational demand. In this work, we propose an alternative that provides both approximations to the likelihood and the posterior distribution, using structured mixtures of probability distributions. Our approach produces accurate posterior inference when compared to state-of-the-art NN-based SBI methods, while exhibiting a much smaller computational footprint. We illustrate our results on several benchmark models from the SBI literature.
Neural Likelihood Approximation for Integer Valued Time Series Data
O'Loughlin, Luke, Maclean, John, Black, Andrew
Stochastic processes defined on integer valued state spaces are popular within the physical A combination of factors such as non-linear dynamics, and biological sciences. These models are partial observation and a complicated latent structure necessary for capturing the dynamics of small makes the inference of parameter posteriors a systems where the individual nature of the challenging problem. The likelihood is generally intractable populations cannot be ignored and stochastic and although methods that can sample from effects are important. The inference of the the exact posteriors exist--many based on simulation parameters of such models, from time series of the model itself--these can become computationally data, is difficult due to intractability of the prohibitive in many situations (Doucet et al., 2015; likelihood; current methods, based on simulations Sherlock et al., 2015). Observations of a system with of the underlying model, can be so computationally low noise are particularly challenging for simulation expensive as to be prohibitive.
Implementation of a framework for deploying AI inference engines in FPGAs
Herbst, Ryan, Coffee, Ryan, Fronk, Nathan, Kim, Kukhee, Kim, Kuktae, Ruckman, Larry, Russell, J. J.
The LCLS2 Free Electron Laser FEL will generate xray pulses to beamline experiments at up to 1Mhz These experimentals will require new ultrahigh rate UHR detectors that can operate at rates above 100 kHz and generate data throughputs upwards of 1 TBs a data velocity which requires prohibitively large investments in storage infrastructure Machine Learning has demonstrated the potential to digest large datasets to extract relevant insights however current implementations show latencies that are too high for realtime data reduction objectives SLAC has endeavored on the creation of a software framework which translates MLs structures for deployment on Field Programmable Gate Arrays FPGAs deployed at the Edge of the data chain close to the instrumentation This framework leverages Xilinxs HLS framework presenting an API modeled after the open source Keras interface to the TensorFlow library This SLAC Neural Network Library SNL framework is designed with a streaming data approach optimizing the data flow between layers while minimizing the buffer data buffering requirements The goal is to ensure the highest possible framerate while keeping the maximum latency constrained to the needs of the experiment Our framework is designed to ensure the RTL implementation of the network layers supporting full redeployment of weights and biases without requiring resynthesis after training The ability to reduce the precision of the implemented networks through quantization is necessary to optimize the use of both DSP and memory resources in the FPGA We currently have a preliminary version of the toolset and are experimenting with both general purpose example networks and networks being designed for specific LCLS2 experiments.