openvino
Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems
Furutanpey, Alireza, Walser, Carmen, Raith, Philipp, Frangoudis, Pantelis A., Dustdar, Schahram
This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms, addressing the critical gap between theoretical optimization techniques and practical deployment scenarios. We demonstrate how vendor-specific optimizations can invalidate relative performance comparisons between architectural archetypes, with performance advantages sometimes completely reversing after compilation. Our systematic analysis reveals that graph compilers exhibit performance patterns highly dependent on both neural architecture and batch size. Through fine-grained block-level experimentation, we establish that vendor-specific compilers can leverage repeated patterns in simple architectures, yielding disproportionate throughput gains as model depth increases. We introduce novel metrics to quantify a compiler's ability to mitigate performance friction as batch size increases.

The pervasiveness of neural networks (NNs) in modern computing systems has generated significant demand for methods to improve the efficiency of available hardware. As computational complexity increases and deployment scenarios diversify, optimizing neural network execution becomes indispensable for practical applications across various computational platforms. Among the most promising optimization approaches are graph compilers, which optimize the computational graphs of neural networks to enhance scheduling, improve data flow, and exploit dedicated hardware modules. Graph compilers can enhance throughput by orders of magnitude with no loss in accuracy. While these compilers can be used independently, they may also be combined with model compression or acceleration methods, such as quantization, that trade accuracy for efficiency. The potential performance improvements are substantial.
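To make the idea of "performance friction as batch size increases" concrete, here is a minimal sketch of one plausible batch-scaling metric: measured throughput at each batch size divided by ideal linear scaling from batch size 1. This is an illustrative assumption for exposition, not necessarily the metric the paper actually introduces.

```python
def scaling_efficiency(latencies: dict) -> dict:
    """Measured throughput at each batch size over ideal linear scaling.

    `latencies` maps batch size -> seconds per batched inference call.
    A value of 1.0 means batching is "free" (per-sample cost is constant);
    lower values indicate friction that the compiler failed to remove.
    """
    base_throughput = 1.0 / latencies[1]  # samples/sec at batch size 1
    return {
        b: (b / lat) / (b * base_throughput)  # measured / ideal throughput
        for b, lat in latencies.items()
    }
```

For example, if a batch of 4 takes only twice as long as a batch of 1, efficiency is 0.5: half of the ideal linear speedup was realized.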
NITRO: LLM Inference on Intel Laptop NPUs
Fei, Anthony, Abdelfattah, Mohamed S.
Large Language Models (LLMs) have become essential tools in natural language processing, widely used in chatbots such as ChatGPT and Gemini, and are a central area of research. A particular area of interest is designing hardware specialized for these AI applications, one such example being the neural processing unit (NPU). In 2023, Intel released the Intel Core Ultra processor, codenamed Meteor Lake, featuring a CPU, GPU, and NPU system-on-chip. However, official software support for the NPU through Intel's OpenVINO framework is limited to static model inference. The dynamic nature of autoregressive token generation in LLMs is therefore not supported out of the box. To address this shortcoming, we present NITRO (NPU Inference for Transformers Optimization), a Python-based framework built on top of OpenVINO to support text and chat generation on NPUs. In this paper, we discuss in detail the key modifications made to the transformer architecture to enable inference, some performance benchmarks, and future steps towards improving the package. The code repository for NITRO can be found here: https://github.com/abdelfattah-lab/nitro.
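The core tension the abstract describes — a static-shape compiled graph versus dynamic-length autoregressive generation — is commonly resolved by padding inputs to a fixed maximum length and masking unused positions. The sketch below illustrates that general idea in plain Python; the names and details are assumptions for illustration, not NITRO's actual implementation.

```python
# A static-shape compiled model expects a fixed sequence length, so a
# dynamic generation loop pads the token buffer to MAX_LEN and builds a
# 0/1 attention mask marking which positions hold real tokens.

MAX_LEN = 8  # the sequence length the graph was compiled for (assumed)
PAD_ID = 0

def pad_inputs(tokens):
    """Pad token ids to the static length and build an attention mask."""
    if len(tokens) > MAX_LEN:
        raise ValueError("sequence exceeds the compiled static shape")
    pad = MAX_LEN - len(tokens)
    return tokens + [PAD_ID] * pad, [1] * len(tokens) + [0] * pad

def generate(step_fn, prompt, max_new_tokens):
    """Greedy loop: re-run the fixed-shape model each step on padded inputs."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        padded, mask = pad_inputs(tokens)
        next_id = step_fn(padded, mask)  # one static-shape inference call
        tokens.append(next_id)
    return tokens
```

Here `step_fn` stands in for the compiled model: it must ignore masked positions and return the next token id given the padded context.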
Intel pushes harder to make AI apps run best on Core Ultra
Intel said Tuesday that it is expanding what it calls its AI Acceleration program into midrange software vendors, launching an AI developer NUC to speed the process. It's all a bid to lasso software developers and bring them under the Core Ultra banner. For consumers, the program is an ongoing acknowledgement that Intel continues to work to integrate the NPU inside its Core Ultra processor with software vendors, in order to extract actual value from the logic, and not just capitalize on the latest buzzword, AI. There's a more subtle message, too: if Intel is able to convince software developers to use its OpenVINO toolkit to help them code AI applications, it will help ensure that Intel's Core Ultra chips are the preferred or "better" AI chips. That might not actually be the case, of course. But the push to sign up software developers seems similar to the way in which graphics vendors work with game developers to convince them to add GPU-specific features to their games and thus deliver improved performance.
Audacity's cool audio AI tools are now free for you to try
As AI PCs debut, one question you'll be asking yourself is: What can I do with them? Audacity has an early answer, with the release of its on-chip audio AI tools for music generation, transcription, and more. Intel used Audacity as a demo partner while describing the Meteor Lake (now rebranded as Core Ultra) architecture in Malaysia, showing off some of the tools that it formally released on Monday. The tools use OpenVINO, an open-source toolkit developed by Intel that the company has separately optimized. Audacity's new AI tools include: The catch is that these new AI tools, in addition to the CPU limitations placed upon them, require a specific older version of Audacity to be installed: Audacity 3.4.2.
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
Barad, Haim, Aidova, Ekaterina, Gorbachev, Yury
Inference optimizations are critical for improving user experience and reducing infrastructure costs and power consumption. In this article, we illustrate a form of dynamic execution known as speculative sampling to reduce the overall latency of text generation and compare it with standard autoregressive sampling. This can be used together with model-based optimizations (e.g. quantization) to provide an optimized solution. Both sampling methods make use of KV caching. A Jupyter notebook and some sample executions are provided.
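The greedy variant of the dynamic-execution idea described above can be sketched in a few lines: a cheap draft model proposes several tokens, and the target model verifies them, keeping the longest agreeing prefix plus one token of its own. This is a simplified illustration — the article's speculative *sampling* also involves a probabilistic accept/reject step over the two models' distributions, and the model functions here are toy placeholders, not the OpenVINO API.

```python
def speculative_step(target, draft, prefix, k):
    """One round of greedy speculative decoding.

    `target` and `draft` are callables mapping a token list to the next
    token id. Returns the tokens accepted this round (at least one).
    """
    # 1. Draft model proposes k tokens autoregressively (cheap).
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2. Target model verifies; keep the longest agreeing prefix.
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if target(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # 3. Target contributes its own token at the first disagreement
    #    (or a bonus token when every draft token was accepted).
    accepted.append(target(ctx))
    return accepted
```

When the draft model agrees with the target, each round yields up to k+1 tokens for roughly the cost of one target evaluation over the drafted span, which is where the latency reduction comes from; both models' KV caches are reused across rounds in the real implementation.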
AI at the Edge Spurs New Industrial Opportunities
The world is moving fast, and manufacturers must be able to keep up with the pace of change. Luckily, with technologies like AI, machine learning, computer vision, and edge computing, solution developers have the tools to help them do so. And we are already seeing major results--both inside and outside the factory. For instance, smart manufacturers have started to deploy AI at the edge on the shop floor to reduce the risk of unplanned shutdowns and production issues. By automating the process with AI platforms like the Intel OpenVINO Toolkit, image analysis can be performed directly on smart factory equipment, and workers can be quickly notified of any issues happening. This reduces manual work, which is prone to errors, and stops problems before they snowball.
AI Inference Software Fundamentals: Getting Started with Optical Character Recognition
You can find the full source code for today's demo in a Kaggle notebook, where it is formatted as a series of very short, numbered blocks. For the sake of brevity, this post will walk through only the most significant snippets of the notebook's code. But, of course, you can study the full notebook at your leisure, block by block, and learn how we trained a neural network from scratch to achieve a level of accuracy not possible a decade ago. In blocks 1 to 3, the notebook sets up the Python environment for TensorFlow. In blocks 4 to 14, the notebook loads the MNIST database, which we use to build and train a model that recognizes handwritten digits. The new and exciting part Intel offers today is how these models can be optimized to run more efficiently and quickly on Intel hardware.
The AI Journey: Why You Should Pack OpenShift and OpenVINO
AI can be an intimidating field to get into, and there is a lot that goes into deploying an AI application. But if you don't choose the right tools, it can be even more difficult than it needs to be. Luckily, the work that Intel and Red Hat are doing is easing the burden for businesses and developers. They'll discuss machine learning and natural language processing; using the OpenVINO AI toolkit with Red Hat OpenShift; and the life cycle of an AI intelligent application. Ryan Loney: Everything today has some intelligence embedded into it.
Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration
Abbasi, Saad, Shafiee, Mohammad Javad, Chan, Ellick, Wong, Alexander
The fine-grained relationship between form and function with respect to deep neural network architecture design and hardware-specific acceleration is one area that is not well studied in the research literature, with form often dictated by accuracy as opposed to hardware function. In this study, a comprehensive empirical exploration is conducted to investigate the impact of deep neural network architecture design on the degree of inference speedup that can be achieved via hardware-specific acceleration. More specifically, we empirically study the impact of a variety of commonly used macro-architecture design patterns across different architectural depths through the lens of OpenVINO microprocessor-specific and GPU-specific acceleration. Experimental results showed that while leveraging hardware-specific acceleration achieved an average inference speedup of 380%, the degree of inference speedup varied drastically depending on the macro-architecture design pattern, with the greatest speedup achieved on the depthwise bottleneck convolution design pattern at 550%. Furthermore, we conduct an in-depth exploration of the correlation between FLOPs requirement, level 3 cache efficacy, and network latency with increasing architectural depth and width. Finally, we analyze the inference time reductions using hardware-specific acceleration when compared to native deep learning frameworks across a wide variety of hand-crafted deep convolutional neural network architecture designs as well as ones found via neural architecture search strategies. We found that the DARTS-derived architecture benefited the most from hardware-specific software acceleration (1200%), while the depthwise bottleneck convolution-based MobileNet-V2 had the lowest overall inference time, at around 2.4 ms.
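For readers comparing the percentages above: one common convention, assumed here (the abstract does not state its exact definition), expresses speedup as the ratio of baseline latency to accelerated latency, as a percentage — so "380%" means the accelerated path is 3.8x faster.

```python
def speedup_percent(baseline_latency_ms: float, accelerated_latency_ms: float) -> float:
    """Speedup as a percentage: 380.0 means 3.8x faster than baseline.

    Assumes the ratio convention speedup = baseline / accelerated; some
    authors instead report the *improvement* (ratio - 1) as a percentage.
    """
    return 100.0 * baseline_latency_ms / accelerated_latency_ms
```

Under this convention, a model whose latency drops from 38 ms to 10 ms after hardware-specific acceleration shows a 380% speedup.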