AITopics | clock frequency

Collaborating Authors

clock frequency

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

NeuroCoreX: An Open-Source FPGA-Based Spiking Neural Network Emulator with On-Chip Learning

Gautam, Ashish, Date, Prasanna, Kulkarni, Shruti, Patton, Robert, Potok, Thomas

arXiv.org Artificial IntelligenceJun-18-2025

--Spiking Neural Networks (SNNs) are computational models inspired by the structure and dynamics of biological neuronal networks. Their event-driven nature enables them to achieve high energy efficiency, particularly when deployed on neuromorphic hardware platforms. Unlike conventional Artificial Neural Networks (ANNs), which primarily rely on layered architectures, SNNs naturally support a wide range of connectivity patterns, from traditional layered structures to small-world graphs characterized by locally dense and globally sparse connections. In this work, we introduce NeuroCoreX, an FPGA-based emulator designed for the flexible co-design and testing of SNNs. NeuroCoreX supports all-to-all connectivity, providing the capability to implement diverse network topologies without architectural restrictions. It features a biologically motivated local learning mechanism based on Spike-Timing-Dependent Plasticity (STDP). The neuron model implemented within NeuroCoreX is the Leaky Integrate-and-Fire (LIF) model, with current-based synapses facilitating spike integration and transmission . A Universal Asynchronous Receiver-Transmitter (UART) interface is provided for programming and configuring the network parameters, including neuron, synapse, and learning rule settings. NeuroCoreX is released as an open-source framework, aiming to accelerate research and development in energy-efficient, biologically inspired computing. One of the primary goals of neuromorphic computing is to emulate the structure and dynamics of biological neuronal networks, achieving both brain-like energy efficiency and high computational accuracy.

artificial intelligence, machine learning, neuron, (18 more...)

arXiv.org Artificial Intelligence

2506.14138

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.68)

Industry:

Energy (0.70)
Government > Regional Government > North America Government > United States Government (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

E4: Energy-Efficient DNN Inference for Edge Video Analytics Via Early-Exit and DVFS

Zhang, Ziyang, Zhao, Yang, Chang, Ming-Ching, Lin, Changyao, Liu, Jie

arXiv.org Artificial IntelligenceMar-6-2025

Deep neural network (DNN) models are increasingly popular in edge video analytic applications. However, the compute-intensive nature of DNN models pose challenges for energy-efficient inference on resource-constrained edge devices. Most existing solutions focus on optimizing DNN inference latency and accuracy, often overlooking energy efficiency. They also fail to account for the varying complexity of video frames, leading to sub-optimal performance in edge video analytics. In this paper, we propose an Energy-Efficient Early-Exit (E4) framework that enhances DNN inference efficiency for edge video analytics by integrating a novel early-exit mechanism with dynamic voltage and frequency scaling (DVFS) governors. It employs an attention-based cascade module to analyze video frame diversity and automatically determine optimal DNN exit points. Additionally, E4 features a just-in-time (JIT) profiler that uses coordinate descent search to co-optimize CPU and GPU clock frequencies for each layer before the DNN exit points. Extensive evaluations demonstrate that E4 outperforms current state-of-the-art methods, achieving up to 2.8x speedup and 26% average energy saving while maintaining high accuracy.

clock frequency, energy consumption, frequency, (14 more...)

arXiv.org Artificial Intelligence

2503.04865

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Europe (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Shavette: Low Power Neural Network Acceleration via Algorithm-level Error Detection and Undervolting

Rinkinen, Mikael, Koskinen, Lauri, Silven, Olli, Safarpour, Mehdi

arXiv.org Artificial IntelligenceOct-17-2024

Reduced voltage operation is an effective technique for substantial energy efficiency improvement in digital circuits. This brief introduces a simple approach for enabling reduced voltage operation of Deep Neural Network (DNN) accelerators by mere software modifications. Conventional approaches for enabling reduced voltage operation e.g., Timing Error Detection (TED) systems, incur significant development costs and overheads, while not being applicable to the off-the-shelf components. Contrary to those, the solution proposed in this paper relies on algorithm-based error detection, and hence, is implemented with low development costs, does not require any circuit modifications, and is even applicable to commodity devices. By showcasing the solution through experimenting on popular DNNs, i.e., LeNet and VGG16, on a GPU platform, we demonstrate 18% to 25% energy saving with no accuracy loss of the models and negligible throughput compromise (< 3.9%), considering the overheads from integration of the error detection schemes into the DNN. The integration of presented algorithmic solution into the design is simpler when compared conventional TED based techniques that require extensive circuit-level modifications, cell library characterizations or special support from the design tools.

artificial intelligence, machine learning, voltage, (16 more...)

arXiv.org Artificial Intelligence

2410.13415

Country: Europe > Finland (0.18)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Embedded FPGA Developments in 130nm and 28nm CMOS for Machine Learning in Particle Detector Readout

Gonski, Julia, Gupta, Aseem, Jia, Haoyi, Kim, Hyunjoon, Rota, Lorenzo, Ruckman, Larry, Dragone, Angelo, Herbst, Ryan

arXiv.org Artificial IntelligenceJul-2-2024

Embedded field programmable gate array (eFPGA) technology allows the implementation of reconfigurable logic within the design of an application-specific integrated circuit (ASIC). This approach offers the low power and efficiency of an ASIC along with the ease of FPGA configuration, particularly beneficial for the use case of machine learning in the data pipeline of next-generation collider experiments. An open-source framework called "FABulous" was used to design eFPGAs using 130 nm and 28 nm CMOS technology nodes, which were subsequently fabricated and verified through testing. The capability of an eFPGA to act as a front-end readout chip was assessed using simulation of high energy particles passing through a silicon pixel sensor. A machine learning-based classifier, designed for reduction of sensor data at the source, was synthesized and configured onto the eFPGA. A successful proof-of-concept was demonstrated through reproduction of the expected algorithm result on the eFPGA with perfect accuracy. Further development of the eFPGA technology and its application to collider detector readout is discussed.

asic, clock frequency, efpga, (13 more...)

arXiv.org Artificial Intelligence

2404.17701

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Semiconductors & Electronics (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Quantized Context Based LIF Neurons for Recurrent Spiking Neural Networks in 45nm

Bezugam, Sai Sukruth, Wu, Yihao, Yoo, JaeBum, Strukov, Dmitri, Kim, Bongjin

arXiv.org Artificial IntelligenceApr-28-2024

In this study, we propose the first hardware implementation of a context-based recurrent spiking neural network (RSNN) emphasizing on integrating dual information streams within the neocortical pyramidal neurons specifically Context- Dependent Leaky Integrate and Fire (CLIF) neuron models, essential element in RSNN. We present a quantized version of the CLIF neuron (qCLIF), developed through a hardware-software codesign approach utilizing the sparse activity of RSNN. Implemented in a 45nm technology node, the qCLIF is compact (900um^2) and achieves a high accuracy of 90% despite 8 bit quantization on DVS gesture classification dataset. Our analysis spans a network configuration from 10 to 200 qCLIF neurons, supporting up to 82k synapses within a 1.86 mm^2 footprint, demonstrating scalability and efficiency

neural network, neuron, spike, (15 more...)

arXiv.org Artificial Intelligence

2404.18066

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Telecommunications > Networks (0.34)
Information Technology > Networks (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators

Pogue, Trevor E., Nicolici, Nicola

arXiv.org Artificial IntelligenceNov-20-2023

We introduce a new algorithm called the Free-pipeline Fast Inner Product (FFIP) and its hardware architecture that improve an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968. Unlike the unrelated Winograd minimal filtering algorithms for convolutional layers, FIP is applicable to all machine learning (ML) model layers that can mainly decompose to matrix multiplication, including fully-connected, convolutional, recurrent, and attention/transformer layers. We implement FIP for the first time in an ML accelerator then present our FFIP algorithm and generalized architecture which inherently improve FIP's clock frequency and, as a consequence, throughput for a similar hardware cost. Finally, we contribute ML-specific optimizations for the FIP and FFIP algorithms and architectures. We show that FFIP can be seamlessly incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units, or it can double the maximum systolic array size that can fit onto devices with a fixed hardware budget. Our FFIP implementation for non-sparse ML models with 8 to 16-bit fixed-point inputs achieves higher throughput and compute efficiency than the best-in-class prior solutions on the same type of compute platform.

accelerator, architecture, throughput, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TC.2023.3334140

2311.12224

Country:

North America > Canada > Ontario > Hamilton (0.28)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)
Europe > Romania > Vest Development Region > Timiș County > Timișoara (0.04)

Genre: Research Report (0.82)

Industry: Semiconductors & Electronics (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)

Add feedback

C3S Micro-architectural Enhancement: Spike Encoder Block and Relaxing Gamma Clock (Asynchronous)

Anand, Alok, Khokhlov, Ivan, Anand, Abhishek

arXiv.org Artificial IntelligenceJun-26-2023

The field of neuromorphic computing is rapidly evolving. As both biological accuracy and practical implementations are explored, existing architectures are modified and improved for both purposes. The Temporal Neural Network(TNN) style of architecture is a good basis for approximating biological neurons due to its use of timed pulses to encode data and a voltage-threshold-like system. Using the Temporal Neural Network cortical column C3S architecture design as a basis, this project seeks to augment the network's design. This project takes note of two ideas and presents their designs with the goal of improving existing cortical column architecture. One need in this field is for an encoder that could convert between common digital formats and timed neuronal spikes, as biologically accurate networks are temporal in nature. To this end, this project presents an encoder to translate between binary encoded values and timed spikes to be processed by the neural network. Another need is for the reduction of wasted processing time to idleness, caused by lengthy Gamma cycle processing bursts. To this end, this project presents a relaxation of Gamma cycles to allow for them to end arbitrarily early once the network has determined an output response. With the goal of contributing to the betterment of the field of neuromorphic computer architecture, designs for both a binary-to-spike encoder, as well as a Gamma cycle controller, are presented and evaluated for optimal design parameters, with overall system gain and performance.

artificial intelligence, comparator, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2306.15093

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.82)

Industry: Media (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.75)

Add feedback

ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining

Peltekis, C., Filippas, D., Dimitrakopoulos, G., Nicopoulos, C., Pnevmatikatos, D.

arXiv.org Artificial IntelligenceJun-6-2023

Convolutional Neural Networks (CNNs) are the state-of-the-art solution for many deep learning applications. For maximum scalability, their computation should combine high performance and energy efficiency. In practice, the convolutions of each CNN layer are mapped to a matrix multiplication that includes all input features and kernels of each layer and is computed using a systolic array. In this work, we focus on the design of a systolic array with configurable pipeline with the goal to select an optimal pipeline configuration for each CNN layer. The proposed systolic array, called ArrayFlex, can operate in normal, or in shallow pipeline mode, thus balancing the execution time in cycles and the operating clock frequency. By selecting the appropriate pipeline configuration per CNN layer, ArrayFlex reduces the inference latency of state-of-the-art CNNs by 11%, on average, as compared to a traditional fixed-pipeline systolic array. Most importantly, this result is achieved while using 13%-23% less power, for the same applications, thus offering a combined energy-delay-product efficiency between 1.4x and 1.8x.

artificial intelligence, machine learning, matrix multiplication, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.23919/DATE56975.2023.10136913

2211.126

Country:

Europe > Middle East > Cyprus (0.04)
Europe > Greece > Attica > Athens (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Implementing Neural Network-Based Equalizers in a Coherent Optical Transmission System Using Field-Programmable Gate Arrays

Freire, Pedro J., Srivallapanondh, Sasipim, Anderson, Michael, Spinnler, Bernhard, Bex, Thomas, Eriksson, Tobias A., Napoli, Antonio, Schairer, Wolfgang, Costa, Nelson, Blott, Michaela, Turitsyn, Sergei K., Prilepsky, Jaroslaw E.

arXiv.org Artificial IntelligenceFeb-19-2023

In this work, we demonstrate the offline FPGA realization of both recurrent and feedforward neural network (NN)-based equalizers for nonlinearity compensation in coherent optical transmission systems. First, we present a realization pipeline showing the conversion of the models from Python libraries to the FPGA chip synthesis and implementation. Then, we review the main alternatives for the hardware implementation of nonlinear activation functions. The main results are divided into three parts: a performance comparison, an analysis of how activation functions are implemented, and a report on the complexity of the hardware. The performance in Q-factor is presented for the cases of bidirectional long-short-term memory coupled with convolutional NN (biLSTM + CNN) equalizer, CNN equalizer, and standard 1-StpS digital back-propagation (DBP) for the simulation and experiment propagation of a single channel dual-polarization (SC-DP) 16QAM at 34 GBd along 17x70km of LEAF. The biLSTM+CNN equalizer provides a similar result to DBP and a 1.7 dB Q-factor gain compared with the chromatic dispersion compensation baseline in the experimental dataset. After that, we assess the Q-factor and the impact of hardware utilization when approximating the activation functions of NN using Taylor series, piecewise linear, and look-up table (LUT) approximations. We also show how to mitigate the approximation errors with extra training and provide some insights into possible gradient problems in the LUT approximation. Finally, to evaluate the complexity of hardware implementation to achieve 200G and 400G throughput, fixed-point NN-based equalizers with approximated activation functions are developed and implemented in an FPGA.

approximation, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/JLT.2023.3272011

2212.04703

Country:

North America > United States (0.04)
Europe > United Kingdom (0.04)
Europe > Portugal (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Energy > Power Industry (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dynamic GPU Energy Optimization for Machine Learning Training Workloads

Wang, Farui, Zhang, Weizhe, Lai, Shichao, Hao, Meng, Wang, Zheng

arXiv.org Artificial IntelligenceJan-5-2022

GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models become increasingly larger, they require a longer time to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance counter profiling overhead, it uses an analytical model to detect the training iteration change and only collects performance counter data when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting and a local search algorithm to find a trade-off between execution time and energy consumption. We evaluate the GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites running on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with a modest average execution time increase of 5.1%.

application, clock frequency, frequency, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TPDS.2021.3137867

2201.01684

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.84)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
(2 more...)

Add feedback