AITopics | power dissipation

A Massively Parallel Digital Learning Processor

Neural Information Processing Systems, Feb-16-2024, 12:22:29 GMT

We present a new, massively parallel architecture for accelerating machine learning algorithms, based on arrays of variable-resolution arithmetic vector processing elements (VPEs). Groups of VPEs operate in SIMD (single instruction, multiple data) mode, and each group is connected to an independent memory bank. In this way, memory bandwidth scales with the number of VPEs, and the main data flows are local, keeping power dissipation low. With 256 VPEs implemented on two FPGA (field-programmable gate array) chips, we obtain a sustained speed of 19 GMACS (billion multiply-accumulate operations per second) for SVM training and 86 GMACS for SVM classification. This performance is more than an order of magnitude higher than that of any FPGA implementation reported so far.
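To put a GMACS figure in perspective, the sketch below converts a sustained multiply-accumulate rate into SVM classifications per second. Only the 86 GMACS rate comes from the abstract; the workload size (10,000 support vectors of 784 features) and the helper names `kernel_macs` and `decisions_per_second` are hypothetical, and the estimate ignores the kernel nonlinearity, the bias term, and memory stalls.

```python
# Back-of-the-envelope throughput check for SVM classification.
# Only the 86 GMACS rate is from the abstract; the workload size below
# (support vectors x features) is a hypothetical placeholder.


def kernel_macs(num_support_vectors: int, num_features: int) -> int:
    """MACs for the dot products of one test vector with every support vector."""
    return num_support_vectors * num_features


def decisions_per_second(macs_per_decision: int, gmacs: float) -> float:
    """Classifications per second at a sustained rate of `gmacs` billion MAC/s."""
    return gmacs * 1e9 / macs_per_decision


if __name__ == "__main__":
    macs = kernel_macs(num_support_vectors=10_000, num_features=784)  # hypothetical workload
    rate = decisions_per_second(macs, gmacs=86.0)
    print(f"{macs:,} MACs per decision -> about {rate:,.0f} decisions/s")
```

Because the dot products dominate the cost, the MAC count scales linearly with both the number of support vectors and the input dimension.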

clock rate, massively parallel digital learning processor, power dissipation

Neural Information Processing Systems

Industry: Education > Educational Setting > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Realtime Facial Expression Recognition: Neuromorphic Hardware vs. Edge AI Accelerators

Smith, Heath, Seekings, James, Mohammadi, Mohammadreza, Zand, Ramtin

arXiv.org Artificial Intelligence, Jan-30-2024

The paper focuses on real-time facial expression recognition (FER) systems as an important component of various real-world applications, such as social robotics. We investigate two hardware options for deploying FER machine learning (ML) models at the edge: neuromorphic hardware and edge AI accelerators. Our study includes exhaustive experiments comparing the Intel Loihi neuromorphic processor with four distinct edge platforms: Raspberry Pi 4, Intel Neural Compute Stick (NCS), Jetson Nano, and Coral TPU. The results show that Loihi achieves approximately two orders of magnitude reduction in power dissipation and one order of magnitude in energy savings compared to the Coral TPU, which is the least power-hungry and least energy-consuming of the edge AI accelerators tested. These reductions in power and energy are achieved while the neuromorphic solution maintains accuracy comparable to that of the edge accelerators, all within the real-time latency requirements.
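Power and energy are related through latency (E = P × t), so the two reported reductions together say something about inference time. The sketch below makes that arithmetic explicit; the ratios 100 and 10 stand in for "two orders of magnitude" and "one order of magnitude" only approximately, and the function names are hypothetical. Under those rough figures, each Loihi inference would take on the order of ten times as long as on the Coral TPU while still meeting the real-time requirement.

```python
# Relating power, latency and energy per inference: E = P * t.
# The ratios used below are approximate stand-ins for the abstract's
# "two orders of magnitude" (power) and "one order of magnitude" (energy).


def energy_per_inference_j(power_w: float, latency_s: float) -> float:
    """Energy in joules for one inference at average power `power_w` over `latency_s`."""
    return power_w * latency_s


def implied_latency_ratio(power_reduction: float, energy_reduction: float) -> float:
    """Latency of device A relative to device B, when A draws `power_reduction`
    times less power but uses only `energy_reduction` times less energy.

    From E = P * t:  t_A / t_B = (P_B / P_A) / (E_B / E_A).
    """
    return power_reduction / energy_reduction


if __name__ == "__main__":
    print(implied_latency_ratio(power_reduction=100.0, energy_reduction=10.0))
```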

edge ai accelerator, latency, loihi

arXiv.org Artificial Intelligence

2403.08792

Country:

North America > United States > South Carolina > Richland County > Columbia (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Hardware (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

A Massively Parallel Digital Learning Processor

Graf, Hans P., Cadambi, Srihari, Jakkula, Venkata, Sankaradass, Murugan, Cosatto, Eric, Chakradhar, Srimat, Dourdanovic, Igor

Neural Information Processing Systems, Feb-15-2020, 01:56:09 GMT

We present a new, massively parallel architecture for accelerating machine learning algorithms, based on arrays of variable-resolution arithmetic vector processing elements (VPEs). Groups of VPEs operate in SIMD (single instruction, multiple data) mode, and each group is connected to an independent memory bank. In this way, memory bandwidth scales with the number of VPEs, and the main data flows are local, keeping power dissipation low. With 256 VPEs implemented on two FPGA (field-programmable gate array) chips, we obtain a sustained speed of 19 GMACS (billion multiply-accumulate operations per second) for SVM training and 86 GMACS for SVM classification. This performance is more than an order of magnitude higher than that of any FPGA implementation reported so far.

clock rate, massively parallel digital learning processor, power dissipation

Neural Information Processing Systems

Industry: Education > Educational Setting > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

SNRA: A Spintronic Neuromorphic Reconfigurable Array for In-Circuit Training and Evaluation of Deep Belief Networks

Zand, Ramtin, DeMara, Ronald F.

arXiv.org Machine Learning, Jan-8-2019

In this paper, a spintronic neuromorphic reconfigurable array (SNRA) is developed to fuse power-efficient probabilistic computing with in-field programmable deterministic computing during both the training and evaluation phases of restricted Boltzmann machines (RBMs). First, probabilistic spin logic devices are used to develop an RBM realization, which is adapted to construct deep belief networks (DBNs) with one to three hidden layers of 10 to 800 neurons each. The functionality of our proposed contrastive divergence (CD) hardware implementation is validated using ModelSim simulations. We synthesize the Verilog HDL implementation of our proposed test/train control circuitry for various DBN topologies, where the maximal RBM dimensions yield resource utilization ranging from 51 to 2,421 lookup tables (LUTs). Next, we use nonvolatile LUT circuits based on spin Hall effect (SHE) magnetic tunnel junctions (MTJs), instead of static random access memory (SRAM)-based LUTs, to store the deterministic logic configuration and form a reconfigurable fabric. Finally, we compare the performance of our proposed SNRA with SRAM-based configurable fabrics, focusing on the area and power consumption of the LUTs used to implement both the CD and evaluation modes. The results indicate more than 80% reduction in combined dynamic and static power dissipation, along with at least 50% reduction in device count.
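As a software point of reference for what the hardware computes, the sketch below performs one contrastive divergence (CD-1) update of a binary RBM in which every stochastic unit is sampled by a p-bit. It assumes the commonly used behavioral p-bit model m = sgn(tanh(I) − r) with r uniform on (−1, 1), which yields +1 with probability (1 + tanh(I))/2 = sigmoid(2I); driving the p-bit at half the unit's activation therefore reproduces the usual sigmoid unit. This is not the SNRA circuit or its Verilog, and the names `pbit`, `sample_unit` and `cd1_step` are hypothetical.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pbit(drive: float, rng: random.Random) -> int:
    """Behavioral p-bit model: +1 with probability (1 + tanh(drive)) / 2, else -1."""
    return 1 if math.tanh(drive) > rng.uniform(-1.0, 1.0) else -1


def sample_unit(activation: float, rng: random.Random) -> int:
    """Binary {0, 1} unit with P(1) = sigmoid(activation).

    Since (1 + tanh(a / 2)) / 2 == sigmoid(a), a p-bit driven at activation / 2 suffices.
    """
    return (pbit(activation / 2.0, rng) + 1) // 2


def cd1_step(W: List[List[float]], b_vis: List[float], b_hid: List[float],
             v0: List[int], lr: float, rng: random.Random) -> List[int]:
    """One CD-1 update of a binary RBM; W, b_vis and b_hid are modified in place.

    W[i][j] is the weight between visible unit i and hidden unit j.
    Returns the sampled visible reconstruction.
    """
    n_v, n_h = len(b_vis), len(b_hid)

    def hidden_activations(v: List[int]) -> List[float]:
        return [b_hid[j] + sum(v[i] * W[i][j] for i in range(n_v)) for j in range(n_h)]

    # Positive phase: hidden probabilities and samples driven by the data vector.
    a0 = hidden_activations(v0)
    ph0 = [sigmoid(a) for a in a0]
    h0 = [sample_unit(a, rng) for a in a0]

    # Negative phase: one Gibbs step (visible reconstruction, then hidden again).
    v1 = [sample_unit(b_vis[i] + sum(W[i][j] * h0[j] for j in range(n_h)), rng)
          for i in range(n_v)]
    ph1 = [sigmoid(a) for a in hidden_activations(v1)]

    # Update: data correlations minus reconstruction correlations.
    for i in range(n_v):
        for j in range(n_h):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b_vis[i] += lr * (v0[i] - v1[i])
    for j in range(n_h):
        b_hid[j] += lr * (ph0[j] - ph1[j])
    return v1


if __name__ == "__main__":
    rng = random.Random(0)
    n_v, n_h = 6, 3  # hypothetical toy sizes
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_h)] for _ in range(n_v)]
    b_vis, b_hid = [0.0] * n_v, [0.0] * n_h
    data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]  # hypothetical training patterns
    for epoch in range(500):
        for v in data:
            cd1_step(W, b_vis, b_hid, v, lr=0.1, rng=rng)
    print(cd1_step(W, b_vis, b_hid, data[0], lr=0.0, rng=rng))
```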

ieee transaction, operation, rbm

arXiv.org Machine Learning

1901.02415

Country:

North America > United States > Florida > Orange County > Orlando (0.14)
Asia > Japan > Honshū > Tōhoku (0.04)

Genre: Research Report (0.50)

Industry:

Semiconductors & Electronics (0.68)
Health & Medicine > Consumer Health (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

Why use RBF Learning rather than Deep Learning in an industrial environment

#artificialintelligence, Dec-13-2017, 19:45:50 GMT

One of today's most overused buzzwords is "Artificial Intelligence". Both the technical and the general press are full of articles about machines that drive autonomous cars and invent new languages. Machine Learning is an essential part of the AI puzzle, and Deep Learning is one of the most popular approaches to implementing Machine Learning. Interestingly, Deep Learning is not new. In 1986, more than 30 years ago, Geoffrey Hinton, together with David Rumelhart and Ronald Williams, demonstrated the use of back-propagation of errors for training multi-layer neural networks. Even earlier, in the 1960s, Kelley, Bryson and Ho published research papers on dynamic optimization, which many consider the basis for back-propagation.

artificial intelligence, deep learning, machine learning

#artificialintelligence

Industry:

Information Technology (0.91)
Transportation > Ground (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

3D Stacking Could Boost GPU Machine Learning

#artificialintelligence, Mar-15-2017, 01:30:28 GMT

Nvidia has staked its growth in the datacenter on machine learning. Over the past few years, the company has rolled out GPU features aimed at neural networks and related processing, notably in the "Pascal" generation, which includes features explicitly designed for this space, such as 16-bit half-precision math. The company is preparing its upcoming "Volta" GPU architecture, which promises significant gains in capabilities; more details on the Volta chip are expected at Nvidia's annual conference in May. Late last year, CEO Jen-Hsun Huang spoke to The Next Platform about what he called the upcoming "hyper-Moore's Law" era in HPC and supercomputing, an era that will drive emerging technologies such as AI and deep learning and in which GPUs will play an increasingly central role.
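What 16-bit half precision trades away can be seen directly in software. The sketch below rounds values to IEEE 754 binary16 using Python's `struct` "e" format and back, showing the halved storage per value relative to 32-bit floats and the coarser rounding that comes with it. It illustrates the number format only, not Nvidia's hardware; the helper name `to_fp16` is hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a float to IEEE 754 binary16 (half precision) and back.

    struct's "e" format packs binary16; values outside its range raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    # Storage per value: 2 bytes for half precision versus 4 for single precision.
    print("bytes per value: fp16 =", struct.calcsize("<e"), "fp32 =", struct.calcsize("<f"))
    # Only 10 fraction bits are stored, so many values shift after rounding.
    for x in (0.1, 1.0 + 2**-11, 2049.0):
        print(f"{x!r} -> {to_fp16(x)!r}")
```

Halving the bytes per operand halves the memory traffic for the same number of values, which is one reason reduced precision is attractive for neural network workloads that tolerate the rounding error.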

artificial intelligence, machine learning, power dissipation

#artificialintelligence

Industry: Semiconductors & Electronics (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

A universal tradeoff between power, precision and speed in physical communication

Lahiri, Subhaneil, Sohl-Dickstein, Jascha, Ganguli, Surya

arXiv.org Machine Learning, Mar-24-2016

Maximizing the speed and precision of communication while minimizing power dissipation is a fundamental engineering design goal. Also, biological systems achieve remarkable speed, precision and power efficiency using poorly understood physical design principles. Powerful theories like information theory and thermodynamics do not provide general limits on power, precision and speed. Here we go beyond these classical theories to prove that the product of precision and speed is universally bounded by power dissipation in any physical communication channel whose dynamics is faster than that of the signal. Moreover, our derivation involves a novel connection between friction and information geometry. These results may yield insight into both the engineering design of communication devices and the structure and function of biological signaling systems.

artificial intelligence, information, machine learning

arXiv.org Machine Learning

1603.07758

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

A Massively Parallel Digital Learning Processor

Graf, Hans P., Cadambi, Srihari, Jakkula, Venkata, Sankaradass, Murugan, Cosatto, Eric, Chakradhar, Srimat, Dourdanovic, Igor

Neural Information Processing Systems, Dec-31-2009

We present a new, massively parallel architecture for accelerating machine learning algorithms, based on arrays of variable-resolution arithmetic vector processing elements (VPEs). Groups of VPEs operate in SIMD (single instruction, multiple data) mode, and each group is connected to an independent memory bank. In this way, memory bandwidth scales with the number of VPEs, and the main data flows are local, keeping power dissipation low. With 256 VPEs implemented on two FPGA (field-programmable gate array) chips, we obtain a sustained speed of 19 GMACS (billion multiply-accumulate operations per second) for SVM training and 86 GMACS for SVM classification. This performance is more than an order of magnitude higher than that of any FPGA implementation reported so far. The speed on one FPGA is similar to the fastest speeds published on a graphics processor for the MNIST problem, even though the FPGA's clock rate is six times lower. High performance at low clock rates makes this massively parallel architecture particularly attractive for embedded applications, where low power dissipation is critical. Tests with convolutional neural networks and other learning algorithms are now under way.
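The data-layout idea (each VPE group working on its own memory bank, with only small results leaving the group) can be sketched for SVM classification as follows. This is a sequential Python simulation under stated assumptions: support vectors are placed round-robin into banks, the kernel is a Gaussian (RBF) kernel, and each group returns one scalar partial sum. None of these choices, nor the function names, come from the paper, and the real VPEs use variable-resolution fixed-point arithmetic rather than Python floats.

```python
import math
from typing import List, Sequence, Tuple

# A support vector paired with its coefficient alpha_i * y_i.
SupportVector = Tuple[Sequence[float], float]


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||u - v||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def partition(svs: Sequence[SupportVector], n_banks: int) -> List[List[SupportVector]]:
    """Round-robin placement of support vectors into independent memory banks."""
    banks: List[List[SupportVector]] = [[] for _ in range(n_banks)]
    for k, sv in enumerate(svs):
        banks[k % n_banks].append(sv)
    return banks


def group_partial_sum(bank: Sequence[SupportVector], x: Sequence[float], gamma: float) -> float:
    """Work of one VPE group: the same instruction stream applied to its local bank only."""
    return sum(coef * rbf_kernel(sv, x, gamma) for sv, coef in bank)


def svm_decision(banks: Sequence[Sequence[SupportVector]], x: Sequence[float],
                 gamma: float, bias: float) -> float:
    """x is broadcast to every group; only one scalar per group is combined."""
    return sum(group_partial_sum(bank, x, gamma) for bank in banks) + bias


if __name__ == "__main__":
    svs = [([0.0, 1.0], 0.7), ([1.0, 0.0], -0.4), ([0.5, 0.5], 0.2), ([1.0, 1.0], -0.1)]  # hypothetical
    banks = partition(svs, n_banks=2)
    print(svm_decision(banks, [0.2, 0.8], gamma=0.5, bias=0.05))
```

Here the groups run one after another; on the hardware they run concurrently, so adding groups (and their banks) adds memory bandwidth in proportion, while the cross-group traffic stays at one number per group.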

artificial intelligence, computation, machine learning

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry:

Automobiles & Trucks (0.57)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Sub-Microwatt Analog VLSI Support Vector Machine for Pattern Classification and Sequence Estimation

Chakrabartty, Shantanu, Cauwenberghs, Gert

Neural Information Processing Systems, Dec-31-2005

An analog system-on-chip for kernel-based pattern classification and sequence estimation is presented. State transition probabilities conditioned on input data are generated by an integrated support vector machine. Dot-product-based kernels and support vector coefficients are implemented in analog programmable floating-gate translinear circuits, and probabilities are propagated and normalized using sub-threshold current-mode circuits. A 14-input, 24-state, 720-support-vector forward decoding kernel machine is integrated on a 3 mm × 3 mm chip in 0.5 µm CMOS technology. Experiments with the processor trained for speaker verification and phoneme sequence estimation demonstrate real-time recognition accuracy on par with floating-point software, at sub-microwatt power.
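The "propagated and normalized" step corresponds to a forward recursion over states, in which the forward probability of each state at step n is a transition-weighted sum of the previous step's forward probabilities. The sketch below is a floating-point software version of that recursion. It assumes the per-step transition matrices are already given (on the chip they are produced by the SVM from the input) and uses the hypothetical convention P_n[i][j] = P(state i at step n | state j at step n−1, input at step n); the function names are also hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def forward_decode(transitions: Sequence[Matrix], alpha0: Sequence[float]) -> List[List[float]]:
    """Forward recursion alpha_n[i] = sum_j P_n[i][j] * alpha_{n-1}[j], normalized each step.

    P_n[i][j] stands for P(state i at step n | state j at step n-1, input at step n).
    The matrices are taken as given here; on the chip they come from the SVM outputs.
    """
    alpha = list(alpha0)
    history: List[List[float]] = []
    for P in transitions:
        unnormalized = [sum(P[i][j] * alpha[j] for j in range(len(alpha)))
                        for i in range(len(P))]
        total = sum(unnormalized)
        alpha = [a / total for a in unnormalized]  # keep alpha a probability vector
        history.append(alpha)
    return history


def most_likely_states(history: Sequence[Sequence[float]]) -> List[int]:
    """Index of the largest forward probability at each step."""
    return [max(range(len(a)), key=a.__getitem__) for a in history]


if __name__ == "__main__":
    P = [[0.9, 0.2], [0.1, 0.8]]  # hypothetical 2-state matrix; columns sum to 1
    hist = forward_decode([P, P, P], alpha0=[0.5, 0.5])
    print(hist[-1], most_likely_states(hist))
```

With exactly column-stochastic matrices the per-step sum is already 1, so the explicit normalization mainly guards against drift when the transition estimates are only approximately normalized.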

coefficient, probability, sequence

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry: Semiconductors & Electronics (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
