AITopics | fpga


fpga

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards a Comprehensive Benchmark for High-Level Synthesis Targeted to FPGAs

Neural Information Processing Systems, Dec-26-2025, 07:57:39 GMT

High-level synthesis (HLS) aims to raise the level of abstraction in hardware design, enabling domain-specific accelerators (DSAs) to be designed for targets such as field-programmable gate arrays (FPGAs) using C/C++ instead of hardware description languages (HDLs). Compiler directives in the form of pragmas play a crucial role in modifying the microarchitecture within the HLS framework. However, the space of possible microarchitectures grows exponentially with the number of pragmas.
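To make the exponential growth concrete: the number of candidate microarchitectures is the product of the number of settings each pragma can take, so every added pragma multiplies the space. The sketch below counts configurations for a small set of pragma sites; the site names and option lists are hypothetical illustrations, not taken from the benchmark.

```python
from math import prod

# Hypothetical pragma sites and the settings an explorer might try at each.
# These names and values are illustrative, not from the benchmark itself.
PRAGMA_OPTIONS = {
    "loop1_pipeline": ["off", "on"],
    "loop1_unroll": [1, 2, 4, 8],
    "loop2_pipeline": ["off", "on"],
    "loop2_unroll": [1, 2, 4, 8, 16],
    "buf_a_partition": [1, 2, 4],
}


def design_space_size(options: dict) -> int:
    """Count distinct configurations: the size of the Cartesian product."""
    return prod(len(values) for values in options.values())


if __name__ == "__main__":
    print(design_space_size(PRAGMA_OPTIONS))
    # With k pragmas of m settings each, the count is m ** k.
    binary = {f"p{i}": ["off", "on"] for i in range(20)}
    print(design_space_size(binary))
```

Five pragma sites already give 2 × 4 × 2 × 5 × 3 = 240 configurations, and twenty on/off pragmas give 2^20 = 1,048,576, which is why synthesizing every point is usually impractical.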

comprehensive benchmark, high-level synthesis targeted, name change, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)


Model Recovery at the Edge under Resource Constraints for Physical AI

Xu, Bin, Banerjee, Ayan, Gupta, Sandeep K. S.

arXiv.org Artificial Intelligence, Dec-3-2025

Model Recovery (MR) enables safe, explainable decision making in mission-critical autonomous systems (MCAS) by learning governing dynamical equations, but its deployment on edge devices is hindered by the iterative nature of neural ordinary differential equations (NODEs), which are inefficient on FPGAs. Memory and energy consumption are the main concerns when applying MR on edge devices for real-time operation. We propose MERINDA, a novel FPGA-accelerated MR framework that replaces iterative solvers with a parallelizable neural architecture equivalent to NODEs. MERINDA achieves nearly 11x lower DRAM usage and 2.2x faster runtime compared to mobile GPUs. Experiments reveal an inverse relationship between memory and energy at fixed accuracy, highlighting MERINDA's suitability for resource-constrained, real-time MCAS.
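The sequential nature of iterative ODE solvers is easiest to see in the simplest fixed-step method, forward Euler: each step needs the state produced by the previous one, so the loop cannot be parallelized across steps. The sketch below integrates a toy scalar ODE; the choice of Euler (rather than whatever solver a given NODE uses) and the toy dynamics are assumptions for illustration, and MERINDA's replacement architecture is not shown.

```python
import math
from typing import Callable


def euler_integrate(f: Callable[[float, float], float], y0: float,
                    t0: float, t1: float, steps: int) -> float:
    """Fixed-step forward Euler: y_{k+1} = y_k + h * f(t_k, y_k).

    Each iteration reads the y written by the previous one (a loop-carried
    dependency), so the steps must run one after another.
    """
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


if __name__ == "__main__":
    # Toy dynamics dy/dt = -y with y(0) = 1; the exact value at t = 1 is e^-1.
    approx = euler_integrate(lambda t, y: -y, 1.0, 0.0, 1.0, 1000)
    print(approx, math.exp(-1))
```

In a NODE, `f` would be a neural network, so every one of those sequential steps is a full network evaluation.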

artificial intelligence, machine learning, real time system, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA251275

2512.02283

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
Atlantic Ocean > North Atlantic Ocean > Hudson Bay (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)


hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware

Schulte, Jan-Frederik, Ramhorst, Benjamin, Sun, Chang, Mitrevski, Jovan, Ghielmetti, Nicolò, Lupi, Enrico, Danopoulos, Dimitrios, Loncar, Vladimir, Duarte, Javier, Burnette, David, Laatu, Lauri, Tzelepis, Stylianos, Axiotis, Konstantinos, Berthet, Quentin, Wang, Haoyan, White, Paul, Demirsoy, Suleyman, Colombo, Marco, Aarrestad, Thea, Summers, Sioni, Pierini, Maurizio, Di Guglielmo, Giuseppe, Ngadiuba, Jennifer, Campos, Javier, Hawks, Ben, Gandrakota, Abhijith, Fahim, Farah, Tran, Nhan, Constantinides, George, Que, Zhiqiang, Luk, Wayne, Tapper, Alexander, Hoang, Duc, Paladino, Noah, Harris, Philip, Lai, Bo-Cheng, Valentin, Manuel, Forelli, Ryan, Ogrenci, Seda, Gerlach, Lino, Flynn, Rian, Liu, Mia, Diaz, Daniel, Khoda, Elham, Quinnan, Melissa, Solares, Russell, Parajuli, Santosh, Neubauer, Mark, Herwig, Christian, Tsoi, Ho Fung, Rankin, Dylan, Hsu, Shih-Chieh, Hauck, Scott

arXiv.org Artificial Intelligence, Dec-3-2025

We present hls4ml, a free and open-source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrated into full designs for field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs). With its flexible and modular design, hls4ml supports a large number of deep learning frameworks and can target HLS compilers from several vendors, including Vitis HLS, Intel oneAPI, and Catapult HLS. Together with a wider ecosystem for software-hardware co-design, hls4ml has enabled the acceleration of ML inference in a wide range of commercial and scientific applications where low latency, low resource usage, and low power consumption are critical. In this paper, we describe the structure and functionality of the hls4ml platform. The overarching design considerations for the generated HLS code are discussed, together with selected performance results.
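HLS code of the kind hls4ml generates typically represents weights and activations with arbitrary-precision fixed-point types such as Vitis HLS `ap_fixed<W,I>`, where `W` is the total bit width and `I` the number of integer bits (sign included). The sketch below emulates that type in Python for the default truncation (`AP_TRN`) and wrap-around (`AP_WRAP`) modes, which helps when checking how a chosen precision distorts values. The helper `ap_fixed` is a hypothetical illustration, not part of hls4ml's API.

```python
import math


def ap_fixed(x: float, width: int, integer: int) -> float:
    """Emulate Vitis HLS ap_fixed<width, integer> with default modes.

    width   -- total number of bits, sign bit included
    integer -- integer bits, sign bit included
    Quantization AP_TRN: round toward minus infinity.
    Overflow AP_WRAP: keep the low `width` bits (two's complement).
    """
    frac_bits = width - integer
    scale = 1 << frac_bits            # one LSB equals 2 ** -frac_bits
    q = math.floor(x * scale)         # AP_TRN onto the fixed-point grid
    q &= (1 << width) - 1             # AP_WRAP: drop bits above `width`
    if q >= 1 << (width - 1):         # reinterpret the pattern as signed
        q -= 1 << width
    return q / scale


if __name__ == "__main__":
    for v in (0.3, -0.3, 40.0):
        print(v, "->", ap_fixed(v, 16, 6))
```

With `ap_fixed<16,6>` the representable range is [-32, 32) in steps of 2^-10, so 0.3 becomes 0.2998046875, -0.3 becomes -0.30078125, and 40.0 silently wraps to -24.0.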

artificial intelligence, fpga, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.01463

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(26 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Health & Medicine > Therapeutic Area (0.92)
Energy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI

Jiménez, Arturo Urías

arXiv.org Artificial Intelligence, Nov-18-2025

AI acceleration has been dominated by GPUs, but the growing need for lower latency, energy efficiency, and fine-grained hardware control exposes the limits of fixed architectures. In this context, Field-Programmable Gate Arrays (FPGAs) emerge as a reconfigurable platform that allows mapping AI algorithms directly into device logic. Their ability to implement parallel pipelines for convolutions, attention mechanisms, and post-processing with deterministic timing and reduced power consumption makes them a strategic option for workloads that demand predictable performance and deep customization. Unlike CPUs and GPUs, whose architecture is immutable, an FPGA can be reconfigured in the field to adapt its physical structure to a specific model, integrate as a SoC with embedded processors, and run inference near the sensor without sending raw data to the cloud. This reduces latency and required bandwidth, improves privacy, and frees GPUs from specialized tasks in data centers. Partial reconfiguration and compilation flows from AI frameworks are shortening the path from prototype to deployment, enabling hardware-algorithm co-design.
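The deterministic timing of a hardware pipeline follows from its structure: if one item takes a fixed latency of L cycles and a new item can enter every II cycles (the initiation interval), N items finish after L + (N - 1) × II cycles, a number known once the design is synthesized. The helper below evaluates that standard formula; the cycle counts and clock frequency in the example are hypothetical.

```python
def pipeline_cycles(n_items: int, latency: int, ii: int) -> int:
    """Total cycles for n_items through a pipeline whose single-item
    latency is `latency` cycles and that accepts a new item every `ii`."""
    if n_items <= 0:
        return 0
    return latency + (n_items - 1) * ii


def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    # Hypothetical design: 20-cycle latency, II = 1, 200 MHz clock.
    print(cycles_to_ns(pipeline_cycles(1, 20, 1), 200.0))     # one input
    print(cycles_to_ns(pipeline_cycles(1000, 20, 1), 200.0))  # a batch
```

For this hypothetical design a single input takes 100 ns and 1,000 back-to-back inputs take 5,095 ns. Throughput is governed mainly by II rather than latency, which is why FPGA papers often report initiation interval alongside latency.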

artificial intelligence, fpga, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.11614

Genre: Research Report (0.42)

Industry: Information Technology (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)


JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs

Que, Zhiqiang, Sun, Chang, Paramesvaran, Sudarshan, Clement, Emyr, Karakoulaki, Katerina, Brown, Christopher, Laatu, Lauri, Cox, Arianna, Tapper, Alexander, Luk, Wayne, Spiropulu, Maria

arXiv.org Artificial Intelligence, Nov-18-2025

Graph Neural Networks (GNNs), particularly Interaction Networks (INs), have shown exceptional performance for jet tagging at the CERN High-Luminosity Large Hadron Collider (HL-LHC). However, their computational complexity and irregular memory access patterns pose significant challenges for deployment on FPGAs in hardware trigger systems, where strict latency and resource constraints apply. In this work, we propose JEDI-linear, a novel GNN architecture with linear computational complexity that eliminates explicit pairwise interactions by leveraging shared transformations and global aggregation. To further enhance hardware efficiency, we introduce fine-grained quantization-aware training with per-parameter bitwidth optimization and employ multiplier-free multiply-accumulate operations via distributed arithmetic. Evaluation results show that our FPGA-based JEDI-linear achieves 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, and up to 6.2 times lower LUT usage compared to state-of-the-art GNN designs while also delivering higher model accuracy and eliminating the need for DSP blocks entirely. This is the first interaction-based GNN to achieve less than 60 ns latency and currently meets the requirements for use in the HL-LHC CMS Level-1 trigger system. This work advances the next-generation trigger systems by enabling accurate, scalable, and resource-efficient GNN inference in real-time environments. Our open-sourced templates will further support reproducibility and broader adoption across scientific applications.
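The complexity argument can be illustrated in plain Python. An interaction network evaluates a message function for every ordered pair of particles, which is O(N²); the linear alternative applies one shared transform per particle, sums the results once, and combines that global summary with each particle, which is O(N). The functions `phi`, `g`, and `h` stand in for learned networks and are hypothetical; this is a generic sketch of global aggregation, not the JEDI-linear architecture itself.

```python
from typing import Callable, List

Vec = List[float]


def vadd(u: Vec, v: Vec) -> Vec:
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]


def pairwise_interaction(xs: List[Vec],
                         phi: Callable[[Vec, Vec], Vec]) -> List[Vec]:
    """Interaction-network style: N * (N - 1) calls to phi, so O(N^2).
    Assumes at least two particles."""
    out = []
    for i, xi in enumerate(xs):
        msgs = [phi(xi, xj) for j, xj in enumerate(xs) if j != i]
        acc = msgs[0]
        for m in msgs[1:]:
            acc = vadd(acc, m)
        out.append(acc)
    return out


def global_aggregation(xs: List[Vec],
                       g: Callable[[Vec], Vec],
                       h: Callable[[Vec, Vec], Vec]) -> List[Vec]:
    """Linear alternative: N calls to g, one running sum, N calls to h."""
    pooled = g(xs[0])
    for x in xs[1:]:
        pooled = vadd(pooled, g(x))
    return [h(x, pooled) for x in xs]


if __name__ == "__main__":
    particles = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    n = len(particles)
    # Separable toy message phi(xi, xj) = xi + xj, for which
    # sum_{j != i} phi = (N - 2) * xi + sum_j xj, which h reproduces.
    print(pairwise_interaction(particles, vadd))
    print(global_aggregation(particles, lambda x: x,
                             lambda x, s: vadd([(n - 2) * a for a in x], s)))
```

Both calls print the same result here only because the toy message is separable; a general pairwise function cannot be rewritten this way, so the linear form is a different model rather than an exact rewrite of the interaction network.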

artificial intelligence, machine learning, particle, (18 more...)

arXiv.org Artificial Intelligence

2508.15468

Country:

Europe > Ukraine > Volyn Oblast > Luts'k (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Knowledge is Overrated: A zero-knowledge machine learning and cryptographic hashing-based framework for verifiable, low latency inference at the LHC

Jawahar, Pratik, Doglioni, Caterina, Pierini, Maurizio

arXiv.org Machine Learning, Nov-18-2025

Low-latency event-selection (trigger) algorithms are essential components of Large Hadron Collider (LHC) operation. Modern machine learning (ML) models have shown strong offline performance as classifiers and could improve trigger performance, thereby improving downstream physics analyses. However, inference on such large models cannot meet the online latency constraint imposed by the LHC's 40 MHz bunch-crossing rate. In this work, we propose PHAZE, a novel framework built on cryptographic techniques such as hashing and zero-knowledge machine learning (zkML) to achieve low-latency inference via a certifiable early-exit mechanism from an arbitrarily large baseline model. We lay the foundations for such a framework to achieve nanosecond-order latency and discuss its inherent advantages, such as built-in anomaly detection, within the scope of LHC triggers, as well as its potential to enable a dynamic low-level trigger in the future.
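The 40 MHz figure is the rate at which proton bunches cross at the LHC, so a new collision arrives every 25 ns; nanosecond-order inference latency should be read against this time scale:

```latex
\Delta t = \frac{1}{f_{\text{bx}}}
         = \frac{1}{40 \times 10^{6}\,\text{Hz}}
         = 25 \times 10^{-9}\,\text{s}
         = 25\,\text{ns}
```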

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2511.12592

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads

Amin, Shahid, Shah, Syed Pervez Hussnain

arXiv.org Artificial Intelligence, Nov-14-2025

The remarkable progress in Artificial Intelligence (AI) is fundamentally linked to a concurrent revolution in computer architecture. As AI models, particularly Deep Neural Networks (DNNs), have grown in complexity, their massive computational demands have pushed traditional architectures to their limits. This paper provides a structured review of this co-evolution, analyzing the architectural landscape designed to accelerate modern AI workloads. We explore the dominant architectural paradigms, namely Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and Field-Programmable Gate Arrays (FPGAs), by breaking down their design philosophies, key features, and performance trade-offs. The core principles essential for performance and energy efficiency, including dataflow optimization, advanced memory hierarchies, sparsity, and quantization, are analyzed. Furthermore, this paper looks ahead to emerging technologies such as Processing-in-Memory (PIM) and neuromorphic computing, which may redefine future computation. By synthesizing architectural principles with quantitative performance data from industry-standard benchmarks, this survey presents a comprehensive picture of the AI accelerator landscape. We conclude that AI and computer architecture are in a symbiotic relationship, where hardware-software co-design is no longer an optimization but a necessity for future progress in computing.
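Quantization, one of the principles listed above, can be illustrated with symmetric per-tensor uniform quantization, one common scheme among several: values are divided by a scale chosen so that the largest magnitude maps to the largest signed integer, then rounded and clipped. The functions below are a generic sketch under that assumption, not a method taken from the survey.

```python
from typing import List, Tuple


def quantize_symmetric(xs: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = (1 << (bits - 1)) - 1                  # 127 for 8 bits
    max_abs = max((abs(x) for x in xs), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate real values from the integer codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.01]
    q, s = quantize_symmetric(weights, bits=8)
    print(q, s)
    print(dequantize(q, s))
```

Rounding to the nearest integer keeps each reconstruction error within half a step (scale / 2), and storing 8-bit codes instead of 32-bit floats cuts weight memory by 4x; lower bit widths shrink memory further at the cost of coarser steps.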

artificial intelligence, machine learning, survey article, (19 more...)

arXiv.org Artificial Intelligence

2511.1001

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)

Genre: Overview (1.00)

Industry:

Information Technology (1.00)
Semiconductors & Electronics (0.67)

Technology:

Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)


FPGA or GPU? Analyzing comparative research for application-specific guidance

Purkayastha, Arnab A, Tharwani, Jay, Aggarwal, Shobhit

arXiv.org Artificial Intelligence, Nov-11-2025

The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have emerged as prominent solutions, each excelling in specific domains. Although there is substantial research comparing FPGAs and GPUs, most of it focuses on performance metrics, offering limited insight into which types of applications benefit most from each accelerator. This paper aims to bridge this gap by synthesizing insights from various research articles to guide users in selecting the appropriate accelerator for domain-specific applications. By categorizing the reviewed studies and analyzing key performance metrics, this work highlights the strengths, limitations, and ideal use cases for FPGAs and GPUs. The findings offer actionable recommendations, helping researchers and practitioners navigate trade-offs in performance, energy efficiency, and programmability.

application, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.06565

Country:

North America > United States > Virginia (0.04)
North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)
North America > United States > Massachusetts > Hampden County > Springfield (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)


Sub-microsecond Transformers for Jet Tagging on FPGAs

Laatu, Lauri, Sun, Chang, Cox, Arianna, Gandrakota, Abhijith, Maier, Benedikt, Ngadiuba, Jennifer, Que, Zhiqiang, Luk, Wayne, Spiropulu, Maria, Tapper, Alexander

arXiv.org Artificial Intelligence, Oct-30-2025

We present the first sub-microsecond transformer implementation on an FPGA achieving competitive performance for state-of-the-art high-energy physics benchmarks. Transformers have shown exceptional performance on multiple tasks in modern machine learning applications, including jet tagging at the CERN Large Hadron Collider (LHC). However, until now their computational complexity has prohibited their use in real-time applications such as the hardware trigger systems of collider experiments. In this work, we demonstrate the first application of transformers for jet tagging on FPGAs, achieving O(100) ns latency with superior performance compared to alternative baseline models. We leverage high-granularity quantization and distributed arithmetic optimization to fit the entire transformer model on a single FPGA, achieving the required throughput and latency. Furthermore, we add multi-head attention and linear attention support to hls4ml, making our work accessible to the broader fast machine learning community. This work advances the next-generation trigger systems for the High-Luminosity LHC, enabling the use of transformers for real-time applications in high-energy physics and beyond.
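Distributed arithmetic removes hardware multipliers from dot products with constant weights. In its classic textbook form, each input is split into bits, a table stores every possible sum of the weights, and the result is assembled from one table lookup per bit position, shifted by that position and added. The sketch below implements that classic formulation for unsigned integer inputs; the optimization used in this work may decompose the computation differently, so treat it as an illustration of the principle only.

```python
from typing import List


def build_da_lut(coeffs: List[int]) -> List[int]:
    """LUT[addr] = sum of coeffs[k] for every bit k set in addr (2**K entries).
    In hardware this table is fixed at synthesis time, since weights are constant."""
    lut = []
    for addr in range(1 << len(coeffs)):
        lut.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return lut


def da_dot(coeffs: List[int], xs: List[int], bits: int) -> int:
    """Dot product of constant coeffs with unsigned `bits`-wide inputs
    using only table lookups, shifts, and additions."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one table address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        acc += lut[addr] << b          # weight the partial sum by 2**b
    return acc


if __name__ == "__main__":
    w = [3, -2, 5]
    x = [7, 4, 9]
    # Second value is a reference computed with ordinary multiplication.
    print(da_dot(w, x, bits=4), sum(a * b for a, b in zip(w, x)))
```

The table grows as 2^K with the number of inputs K, so practical designs split long dot products into small groups that each get their own table.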

artificial intelligence, machine learning, particle, (17 more...)

arXiv.org Artificial Intelligence

2510.24784

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Ukraine > Volyn Oblast > Luts'k (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


StrikeWatch: Wrist-worn Gait Recognition with Compact Time-series Models on Low-power FPGAs

Ling, Tianheng, Qian, Chao, Zdankin, Peter, Weis, Torben, Schiele, Gregor

arXiv.org Artificial Intelligence, Oct-30-2025

Running offers substantial health benefits, but improper gait patterns can lead to injuries, particularly without expert feedback. While prior gait analysis systems based on cameras, insoles, or body-mounted sensors have demonstrated effectiveness, they are often bulky and limited to offline, post-run analysis. Wrist-worn wearables offer a more practical and non-intrusive alternative, yet enabling real-time gait recognition on such devices remains challenging due to noisy Inertial Measurement Unit (IMU) signals, limited computing resources, and dependence on cloud connectivity. This paper introduces StrikeWatch, a compact wrist-worn system that performs real-time gait recognition entirely on-device using IMU signals. As a case study, we target the detection of heel versus forefoot strikes to enable runners to self-correct harmful gait patterns through visual and auditory feedback during running. We propose four compact deep learning (DL) architectures (1D-CNN, 1D-SepCNN, LSTM, and Transformer) and optimize them for energy-efficient inference on two representative embedded Field-Programmable Gate Arrays (FPGAs): the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. Using our custom-built hardware prototype, we collect a labeled dataset from outdoor running sessions and evaluate all models via a fully automated deployment pipeline. Our results reveal clear trade-offs between model complexity and hardware efficiency. Evaluated across 12 participants, the 6-bit quantized 1D-SepCNN achieves the highest average F1 score of 0.847 while consuming just 0.350 µJ per inference with a latency of 0.140 ms on the iCE40UP5K running at 20 MHz. This configuration supports up to 13.6 days of continuous inference on a 320 mAh battery. Running is one of the most widely practiced sports worldwide, offering significant physical and mental benefits [1].
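As a quick consistency check on the reported figures, the energy and latency per inference imply an average power draw during inference of about 2.5 mW:

```latex
P = \frac{E}{t}
  = \frac{0.350\,\mu\text{J}}{0.140\,\text{ms}}
  = \frac{0.350 \times 10^{-6}\,\text{J}}{0.140 \times 10^{-3}\,\text{s}}
  = 2.5 \times 10^{-3}\,\text{W}
  = 2.5\,\text{mW}
```

This is inference-only power; the abstract does not give the other quantities (supply voltage, sensor and idle power) that would be needed to re-derive the battery-life figure.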

artificial intelligence, machine learning, pattern recognition, (20 more...)

arXiv.org Artificial Intelligence

2510.24738

Country: Europe > Germany (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Energy > Energy Storage (0.34)
Health & Medicine > Public Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
