Powering artificial intelligence: The explosion of new AI hardware accelerators


AI's rapid evolution is producing an explosion in new types of hardware accelerators for machine learning and deep learning. Some people call this a "Cambrian explosion," an apt metaphor for the current period of fervent innovation: during the Cambrian period, the basic body plans of most modern animal lineages emerged in a comparatively short span. From that point onward, these creatures--ourselves included--fanned out to occupy, exploit, and thoroughly transform every ecological niche on the planet. The range of innovative AI hardware-accelerator architectures continues to expand. Although you may think that graphics processing units (GPUs) are the dominant AI hardware architecture, that is far from the truth.

TensorFlow Quantum Boosts Quantum Computer Hardware Performance


Google recently released TensorFlow Quantum, a toolset for combining state-of-the-art machine learning techniques with quantum algorithm design. This is an essential step in building tools for developers working on quantum applications. At the same time, Google has focused on improving quantum computing hardware performance by integrating a set of quantum firmware techniques and building a TensorFlow-based toolset that works from the hardware level up, from the bottom of the stack. The fundamental driver for this work is tackling noise and error in quantum computers. What follows is a brief overview of this work and of how the impact of noise and imperfections, the critical challenges in quantum hardware, is suppressed.

Mobile Machine Learning Hardware At Arm


Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR and ADAS. Accordingly, hardware architects have designed customized hardware for machine learning algorithms, especially neural networks, to improve compute efficiency. However, machine learning is typically just one processing stage in a complex end-to-end application, which involves multiple components on a mobile system-on-a-chip (SoC). Focusing only on ML accelerators misses larger optimization opportunities at the system (SoC) level. This paper argues that hardware architects should expand the optimization scope to the entire SoC. We demonstrate one particular case study in the domain of continuous computer vision, where the camera sensor, image signal processor (ISP), memory, and NN accelerator are synergistically co-designed to achieve optimal system-level efficiency.
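The system-level argument can be sketched with a toy per-frame energy model. The stage names and all energy numbers below are illustrative assumptions, not figures from the paper; the point is only that co-designing several pipeline stages can beat optimizing the accelerator in isolation.

```python
# Assumed per-frame energy of a continuous-vision pipeline, in millijoules.
PIPELINE_MJ = {
    "sensor": 1.0,
    "isp": 2.0,
    "dram_traffic": 3.0,
    "nn_accelerator": 4.0,
}

def total_energy(stages: dict) -> float:
    """Total per-frame energy across all pipeline stages."""
    return sum(stages.values())

baseline = total_energy(PIPELINE_MJ)

# Accelerator-only optimization: halve the NN accelerator's energy.
accel_only = dict(PIPELINE_MJ, nn_accelerator=2.0)

# SoC-level co-design: a leaner ISP output also shrinks DRAM traffic
# and lets the accelerator skip work, so several stages improve together.
codesign = dict(PIPELINE_MJ, isp=1.0, dram_traffic=1.5, nn_accelerator=2.0)

print(f"accel-only saving: {1 - total_energy(accel_only) / baseline:.0%}")
print(f"co-design saving:  {1 - total_energy(codesign) / baseline:.0%}")
```

Under these assumed numbers, shrinking the accelerator alone saves 20% per frame, while touching the ISP, memory traffic, and accelerator together saves 45%, which is the paper's core claim in miniature.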

2018 AI Trends: Cloud Models, AI Hardware


Nvidia's greatest chip growth in 2017 came in the AI and cloud-based sectors, and that growth should continue in 2018. This year tech companies will begin moving AI closer to the "edge" of access, leveraging trained machine learning software with cloud-based computing. The authors, Daniel Li, Principal, and S. Somasegar, Managing Director, predicted four new trends in 2018: machine learning models will operate outside of data centers, on phones and personal-assistant devices like Alexa and Siri, to reduce power and bandwidth consumption, reduce latency, and ensure privacy; specialized chips for AI will outperform general-purpose chips, and computers built to optimize AI are already being designed; and text, voice, gestures, and vision will all be used more widely to communicate with systems.

RAPIDS: Open GPU Data Science

The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA CUDA primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.
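The "familiar DataFrame API" claim is easy to see in code. Below is a minimal sketch: cuDF, the RAPIDS DataFrame library, deliberately mirrors the pandas interface, so the same groupby code can run on GPU or CPU. The example falls back to pandas so it runs without a GPU; the column names and values are made up for illustration.

```python
# cuDF mirrors the pandas API, so one code path serves both backends.
try:
    import cudf as xdf      # GPU path (requires an NVIDIA GPU and RAPIDS)
except ImportError:
    import pandas as xdf    # CPU fallback with the same DataFrame interface

df = xdf.DataFrame({
    "device": ["gpu", "cpu", "gpu", "cpu"],
    "latency_ms": [1.2, 9.5, 1.4, 8.9],
})

# Identical groupby/aggregate syntax on either backend.
summary = df.groupby("device")["latency_ms"].mean()
print(summary)
```

Because the interface matches, an existing pandas pipeline can often be moved to GPUs largely by swapping the import, which is how RAPIDS avoids the serialization costs mentioned above: data stays in GPU memory across pipeline stages instead of round-tripping through the host.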