Runtime Tunable Tsetlin Machines for Edge Inference on eFPGAs
Tousif Rahman, Gang Mao, Bob Pattison, Sidharth Maheshwari, Marcos Sartori, Adrian Wheeldon, Rishad Shafik, Alex Yakovlev
Embedded Field-Programmable Gate Arrays (eFPGAs) allow for the design of hardware accelerators for edge Machine Learning (ML) applications at a lower power budget compared with traditional FPGA platforms. However, the limited eFPGA logic and memory significantly constrain compute capabilities and model size. As such, ML application deployment on eFPGAs is in direct contrast with the most recent FPGA approaches, which develop architecture-specific implementations and maximize throughput over resource frugality. This paper focuses on the opposite side of this trade-off: the proposed eFPGA accelerator prioritizes minimizing resource usage and allowing flexibility for on-field recalibration over throughput. This allows for runtime changes in model size, architecture, and input data dimensionality without offline resynthesis. This is made possible through the use of a bitwise compressed inference architecture of the Tsetlin Machine (TM) algorithm. TM compute does not require any multiplication operations, being limited to only bitwise AND, OR, and NOT operations plus summation. Additionally, TM model compression allows the entire model to fit within the on-chip block RAM of the eFPGA. The paper uses this accelerator to propose a strategy for runtime model tuning in the field. The proposed approach uses 2.5x fewer Look-up-Tables (LUTs) and 3.38x fewer registers than the current most resource-frugal design and achieves up to 129x energy reduction compared with low-power microcontrollers running the same ML application.
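The multiplication-free property described above can be sketched in a few lines. In a Tsetlin Machine, each clause is an AND over the literals (inputs and their negations) it includes, and the class score is the sum of positive-polarity clause outputs minus the negative-polarity ones. The function names, masks, and model values below are illustrative, not the paper's actual implementation:

```python
# Minimal sketch of Tsetlin Machine inference using only AND/OR/NOT and summation.
# Clause include-masks and inputs here are hypothetical example values.

def clause_output(literals, include_mask):
    """A clause fires iff every literal it includes is 1 (a bitwise AND)."""
    return int(all(l == 1 for l, inc in zip(literals, include_mask) if inc))

def class_sum(x, pos_clauses, neg_clauses):
    """Class score = (positive-polarity clause votes) - (negative-polarity votes)."""
    literals = x + [1 - b for b in x]  # the inputs and their negations (NOT)
    return (sum(clause_output(literals, m) for m in pos_clauses)
            - sum(clause_output(literals, m) for m in neg_clauses))

# Example: 2-bit input x, literal order [x0, x1, NOT x0, NOT x1]
x = [1, 0]
pos = [[1, 0, 0, 1]]   # clause: x0 AND (NOT x1) -> fires for this x
neg = [[0, 1, 0, 0]]   # clause: x1             -> does not fire
print(class_sum(x, pos, neg))  # 1 - 0 = 1
```

Because every step is a mask, comparison, or count, this maps naturally onto eFPGA LUTs with no DSP or multiplier resources.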
Embedded FPGA Developments in 130nm and 28nm CMOS for Machine Learning in Particle Detector Readout
Julia Gonski, Aseem Gupta, Haoyi Jia, Hyunjoon Kim, Lorenzo Rota, Larry Ruckman, Angelo Dragone, Ryan Herbst
Embedded field programmable gate array (eFPGA) technology allows the implementation of reconfigurable logic within the design of an application-specific integrated circuit (ASIC). This approach offers the low power and efficiency of an ASIC along with the ease of FPGA configuration, particularly beneficial for the use case of machine learning in the data pipeline of next-generation collider experiments. An open-source framework called "FABulous" was used to design eFPGAs using 130 nm and 28 nm CMOS technology nodes, which were subsequently fabricated and verified through testing. The capability of an eFPGA to act as a front-end readout chip was assessed using simulation of high energy particles passing through a silicon pixel sensor. A machine learning-based classifier, designed for reduction of sensor data at the source, was synthesized and configured onto the eFPGA. A successful proof-of-concept was demonstrated through reproduction of the expected algorithm result on the eFPGA with perfect accuracy. Further development of the eFPGA technology and its application to collider detector readout is discussed.
Flex Logix Improves Deep Learning Performance By 10X With New EFLX4K AI eFPGA Core
This new core has been specifically designed to enhance the performance of deep learning by 10X and enable more neural network processing per square millimeter. Many companies are using FPGAs to implement AI, and more specifically machine learning, deep learning, and neural networks as approaches to achieve AI. The key functions needed for AI are matrix multipliers, which consist of arrays of MACs (multiplier-accumulators). In existing FPGAs and eFPGAs, the MACs are optimized for DSP workloads, with larger multipliers, pre-adders, and other logic that are overkill for AI. For AI applications, smaller multipliers such as 16-bit or 8-bit, with the ability to support both modes with accumulators, allow more neural network processing per square millimeter.
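The MAC operation the article refers to can be illustrated with a short sketch: each step multiplies a pair of narrow (e.g. 8-bit) operands and adds the product into a wider accumulator, and a matrix multiplier is an array of such units. This is a generic illustration of a MAC dot product, not Flex Logix's actual EFLX4K design:

```python
# Sketch of one MAC lane computing a dot product with int8 operands
# and a wider accumulator (illustrative, not a specific hardware design).

def mac_dot(a, b, acc=0):
    """Multiply-accumulate the element pairs of a and b into acc."""
    for x, y in zip(a, b):
        # Operands stay in int8 range; the accumulator is kept wider
        # (e.g. 32-bit in hardware) so repeated sums do not overflow.
        assert -128 <= x <= 127 and -128 <= y <= 127
        acc += x * y
    return acc

print(mac_dot([1, 2, 3], [4, 5, 6]))  # 4 + 10 + 18 = 32
```

Shrinking the multiplier width from a DSP-oriented size down to 8 or 16 bits is what lets more of these lanes, and hence more neural-network throughput, fit per square millimeter.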