Tempus Core: Area-Power Efficient Temporal-Unary Convolution Core for Low-Precision Edge DLAs
Vellaisamy, Prabhu, Nair, Harideep, Kang, Thomas, Ni, Yichen, Fan, Haoyang, Qi, Bin, Chen, Jeff, Blanton, Shawn, Shen, John Paul
The increasing complexity of deep neural networks (DNNs) poses significant challenges for edge inference deployment due to the resource and power constraints of edge devices. Recent works on unary-based matrix multiplication hardware aim to leverage data sparsity and low-precision values to improve hardware efficiency. However, the adoption and integration of such unary hardware into commercial deep learning accelerators (DLAs) remain limited due to processing element (PE) array dataflow differences. This work presents Tempus Core, a convolution core with a highly scalable unary-based PE array comprising tub (temporal-unary-binary) multipliers that integrates seamlessly with NVDLA (NVIDIA's open-source DLA for accelerating CNNs) while maintaining dataflow compliance and improving hardware efficiency. Analysis across various datapath granularities shows that, for INT8 precision in 45 nm CMOS, Tempus Core's PE cell unit (PCU) yields 59.3% and 15.3% reductions in area and power consumption, respectively, over NVDLA's CMAC unit. For a 16x16 PE array, Tempus Core improves area and power by 75% and 62%, respectively, while delivering 5x and 4x iso-area throughput improvements for INT8 and INT4 precisions. Post-place-and-route analysis shows that Tempus Core's 16x4 PE array for INT4 precision in 45 nm CMOS requires only 0.017 mm^2 of die area and consumes only 6.2 mW of total power. We demonstrate that area- and power-efficient unary-based hardware can be seamlessly integrated into conventional DLAs, paving the way for efficient unary hardware for edge AI inference.
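To make the tub idea concrete, here is a minimal Python sketch of temporal-unary-times-binary multiplication, assuming the standard unary encoding (a value represented as a train of pulses equal to its magnitude); it models the general technique in software, not Tempus Core's actual RTL:

```python
# Software sketch of temporal-unary x binary ("tub") multiplication.
# One operand is encoded temporally as a train of |a| pulses; the other
# stays in binary. Each pulse adds the binary operand to an accumulator,
# so the multiplier reduces to an adder plus a pulse counter in hardware.
# This is an illustrative model, not Tempus Core's actual design.

def to_pulse_train(a: int) -> list[int]:
    """Encode the magnitude of `a` as a unary pulse train (value = pulse count)."""
    return [1] * abs(a)

def tub_multiply(a: int, b: int) -> int:
    """Multiply temporal-unary `a` by binary `b` via pulse-gated accumulation."""
    acc = 0
    for _pulse in to_pulse_train(a):   # one accumulation per clock pulse
        acc += b                        # binary operand added on each pulse
    # sign handling sits outside the unary magnitude path
    return -acc if a < 0 else acc

if __name__ == "__main__":
    for a, b in [(5, 7), (-3, 9), (0, 12)]:
        assert tub_multiply(a, b) == a * b
    print("tub_multiply matches integer multiplication on test cases")
```

Because the pulse train is as long as the operand's magnitude, latency grows with value range, which is why this style of multiplier suits low-precision (INT4/INT8) operands, where the datapath collapses to little more than an adder and a counter.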
Mercury: An Automated Remote Side-channel Attack to Nvidia Deep Learning Accelerator
Yan, Xiaobei, Lou, Xiaoxuan, Xu, Guowen, Qiu, Han, Guo, Shangwei, Chang, Chip Hong, Zhang, Tianwei
DNN accelerators have been widely deployed in many scenarios to speed up the inference process and reduce energy consumption. One major concern about the use of these accelerators is the confidentiality of the deployed models: model inference execution on an accelerator can leak side-channel information, which enables an adversary to precisely recover the model details. Such model extraction attacks can not only compromise the intellectual property of DNN models but also facilitate subsequent adversarial attacks. Although previous works have demonstrated a number of side-channel techniques to extract models from DNN accelerators, they are not practical for two reasons. (1) They only target simplified accelerator implementations, which have limited practicality in the real world. (2) They require heavy human analysis and domain knowledge. To overcome these limitations, this paper presents Mercury, the first automated remote side-channel attack against an off-the-shelf Nvidia DNN accelerator. The key insight of Mercury is to model the side-channel extraction process as a sequence-to-sequence problem. The adversary leverages a time-to-digital converter (TDC) to remotely collect the power trace of the target model's inference, then uses a learning model to automatically recover the architecture details of the victim model from the power trace without any prior knowledge. The adversary can further use the attention mechanism to localize the leakage points that contribute most to the attack. Evaluation results indicate that Mercury keeps the error rate of model extraction below 1%.
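As an illustration of the sequence-to-sequence framing, the following PyTorch sketch maps windowed power-trace samples to layer-descriptor tokens with a GRU encoder-decoder and dot-product attention; the window size, vocabulary, and architecture choices here are assumptions for illustration, not Mercury's published model:

```python
# Illustrative sketch: framing side-channel model extraction as a
# sequence-to-sequence task. The encoder ingests a power trace split into
# fixed-size windows; the decoder emits layer-descriptor tokens, with
# dot-product attention over encoder states. All shapes and sizes are
# assumed for illustration.

import torch
import torch.nn as nn

WINDOW = 64   # samples per trace window (assumed)
HIDDEN = 128
VOCAB = 16    # layer-descriptor tokens, e.g. conv/pool/fc (assumed)

class TraceEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(WINDOW, HIDDEN)  # embed each trace window
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)

    def forward(self, trace_windows):          # (B, T, WINDOW)
        states, last = self.rnn(self.proj(trace_windows))
        return states, last                    # (B, T, H), (1, B, H)

class ArchDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(2 * HIDDEN, VOCAB)

    def forward(self, tokens, enc_states, enc_last):    # tokens: (B, S)
        d, _ = self.rnn(self.embed(tokens), enc_last)   # (B, S, H)
        # Dot-product attention over encoder states; the weights also
        # localize which trace regions leak the most information.
        scores = torch.bmm(d, enc_states.transpose(1, 2))  # (B, S, T)
        attn = torch.softmax(scores, dim=-1)
        ctx = torch.bmm(attn, enc_states)                  # (B, S, H)
        return self.out(torch.cat([d, ctx], dim=-1)), attn

if __name__ == "__main__":
    enc, dec = TraceEncoder(), ArchDecoder()
    trace = torch.randn(2, 100, WINDOW)        # dummy power traces
    tokens = torch.randint(0, VOCAB, (2, 10))  # dummy layer tokens
    states, last = enc(trace)
    logits, attn = dec(tokens, states, last)
    print(logits.shape, attn.shape)            # (2, 10, 16) (2, 10, 100)
```

The attention weights returned by the decoder play the role the abstract describes: high-weight trace windows are candidate leakage points.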
The Linley Group
Designers no longer need to worry about the costs of deep-learning acceleration: Nvidia is making the technology available for free. The company has extracted the deep-learning accelerator (NVDLA) from its Xavier autonomous-driving processor and is offering it for use under a royalty-free open-source license. It's managing the NVDLA project as a directed community, which it supports with comprehensive documentation and instructions. Nvidia delivers the NVDLA core as synthesizable Verilog RTL code, along with a step-by-step SoC-integrator manual, a run-time engine, and a software manual. The company's strategy in creating the open-source project is to foster more-widespread adoption of neural-network inference engines. It expects to thereby benefit from greater demand for its expensive GPU-based training platforms. Most neural-network developers train their models on Nvidia GPUs, and many use the Cuda deep-neural-network (cuDNN) library and software-development kit (SDK) to run models built in Caffe2, PyTorch, TensorFlow, and other popular frameworks.
Nvidia Open-Sources Its Deep Learning Inference Compiler "NVDLA"
Most of the computing effort in deep learning inference consists of mathematical operations that fall largely into four groups: convolutions, activations, pooling, and normalization. All four share a few characteristics that make them well suited to special-purpose hardware implementation: their memory access patterns are extremely predictable, and they are readily parallelized. Designing a custom hardware accelerator for deep learning is clearly popular, but achieving state-of-the-art performance and efficiency with a new design is a complex and challenging problem. To help developers advance the adoption of efficient AI inferencing in custom hardware designs, Nvidia in 2017 opened the source for the hardware design of the NVIDIA Deep Learning Accelerator. The NVIDIA Deep Learning Accelerator is both scalable and highly configurable; its modular design maintains flexibility and simplifies integration, and it promotes a standardized, open architecture to address the computational demands of inference.
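The predictability claim is easy to see in code. In the direct convolution sketch below (NumPy, with illustrative shapes), every memory access is an affine function of the loop indices, so the access pattern is known from the layer shape alone, and each output element is independent, so the loops parallelize trivially:

```python
# Minimal sketch of a direct 2D convolution in NumPy. Every memory access
# is determined by the loop indices, so the pattern is fully predictable
# from the layer shape alone -- the property that makes these ops amenable
# to fixed-function hardware -- and each (k, y, x) output is independent,
# so the loops parallelize trivially. Shapes are illustrative.

import numpy as np

def conv2d(inp, weights):
    # inp: (C, H, W), weights: (K, C, R, S) -> out: (K, H-R+1, W-S+1)
    C, H, W = inp.shape
    K, _, R, S = weights.shape
    out = np.zeros((K, H - R + 1, W - S + 1))
    for k in range(K):                 # output channels
        for y in range(H - R + 1):     # output rows
            for x in range(W - S + 1): # output cols
                out[k, y, x] = np.sum(inp[:, y:y+R, x:x+S] * weights[k])
    return out

if __name__ == "__main__":
    inp = np.random.randn(3, 8, 8)
    w = np.random.randn(4, 3, 3, 3)
    print(conv2d(inp, w).shape)        # (4, 6, 6)
```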
NVIDIA And Arm Partnership To Bring Deep Learning Technology To IoT Devices
NVIDIA and Arm just announced that they are partnering to bring deep learning inferencing technology to mobile, consumer electronics, and Internet of Things devices. As a result of the partnership, NVIDIA and Arm will integrate NVIDIA's open-source Deep Learning Accelerator (NVDLA) architecture into Arm's Project Trillium platform for machine learning. "Accelerating AI at the edge is critical in enabling Arm's vision of connecting a trillion IoT devices," said Rene Haas, executive vice president and president of the IP Group at Arm. "Today we are one step closer to that vision by incorporating NVDLA into the Arm Project Trillium platform, as our entire ecosystem will immediately benefit from the expertise and capabilities our two companies bring in AI and IoT." The collaboration is meant to simplify AI integration for IoT device and chip companies. Arm's Project Trillium is integral to the Arm Heterogeneous ML compute platform and leverages Arm ML processors, the Arm object detection (OD) processor, and open-source Arm NN software. NVIDIA's NVDLA is a free, open architecture meant to promote a standard way to design deep learning inference accelerators.
Nvidia and Arm partner to meld AI and IoT in major evolutionary step
Nvidia and Arm team up to make a wealth of IoT consumer devices substantially more intelligent, while the connected clothing market shows no signs of wearing out. The big news this week from an internet of things (IoT) perspective was Taiwan-based iPhone supplier Foxconn announcing it was to acquire Belkin – one of the largest IoT device providers globally – for $866m. Foxconn plans to establish a new smart home division combining Belkin's Linksys and Wemo businesses with its own IoT assets. Two of the biggest processor giants in the business – Nvidia and Arm – announced this week that they are to enter a partnership to make it easier for chipmakers to embed deep-learning capabilities into their hardware. According to TechCrunch, Arm will integrate Nvidia's open-source Deep Learning Accelerator (NVDLA) architecture into its recently announced Project Trillium hardware, allowing artificial intelligence (AI) to be put into any smart device.
Arm Chooses NVIDIA Open-Source CNN AI Chip Technology
A few weeks ago, we covered ARM's announcement that it would be delivering a suite of AI hardware IP for deep learning, called Project Trillium. ARM announced at the time that third-party IP could be integrated with the Trillium platform, and now ARM and NVIDIA have teamed up to do just that. Specifically, the two companies will integrate NVIDIA's IP for the acceleration of Convolutional Neural Networks (CNNs), the bread and butter for image processing and visually guided systems such as vehicles and drones. Without a lot of fanfare, NVIDIA's Deep Learning Accelerator (NVDLA) was open-sourced last fall, providing free intellectual property (IP) licensing to anyone wanting to build a chip that uses CNNs for inference applications (inference, for those unfamiliar, is the processing of a trained neural network). The crying sound you're now hearing around the world is probably a bunch of well-funded startups and their investors who thought that a dozen guys in a garage could out-engineer NVIDIA when it came to CNN accelerator chips.
NVIDIA's next AI steps: An ARM deal and a new 'personal supercomputer'
Soon you won't need one of NVIDIA's tiny Jetson systems if you want to tap into its AI smarts for smaller devices. At its GPU Technology Conference (GTC) today, the company announced it'll be bringing its open-source Deep Learning Accelerator (NVDLA) over to ARM's upcoming Project Trillium platform, which is focused on mobile AI. Specifically, NVDLA will help developers by accelerating inferencing, the process of using trained neural networks to perform specific tasks. While it's a surprising move for NVIDIA, which typically relies on its own closed platforms, it makes a lot of sense. NVIDIA already relies on ARM designs for its Jetson and Tegra systems.
NVIDIA Deep Learning Accelerator
The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub.
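As a rough illustration of what "scalable and highly configurable" means in practice, the Python sketch below estimates how the configurable MAC count scales a convolution layer's runtime; the MAC counts echo the publicly documented small and large NVDLA design points, but the clock frequency and layer shape are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope sketch of how NVDLA's configurable MAC count scales
# peak throughput. The MAC counts echo the publicly documented "small" and
# "large" design points; the clock frequency and the example layer are
# illustrative assumptions.

def peak_macs_per_sec(num_macs: int, freq_hz: float) -> float:
    """Peak multiply-accumulate rate for a given array size and clock."""
    return num_macs * freq_hz

def conv_layer_macs(k, c, h, w, r, s):
    """MAC operations for a dense conv layer producing a K x H x W output."""
    return k * c * h * w * r * s

if __name__ == "__main__":
    layer = conv_layer_macs(k=64, c=64, h=56, w=56, r=3, s=3)  # ~1.2 GMAC
    for name, macs in [("small-like", 64), ("large-like", 2048)]:
        peak = peak_macs_per_sec(macs, freq_hz=500e6)          # assumed clock
        print(f"{name}: {layer / peak * 1e3:.1f} ms at 100% utilization")
```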