 onednn


Scale Vision Transformers (ViT) Beyond Hugging Face 1/3

#artificialintelligence

I am one of the contributors to the Spark NLP open-source project, and the library recently started supporting end-to-end Vision Transformer (ViT) models. I use Spark NLP and other ML/DL open-source libraries daily for work, so I decided to deploy a ViT pipeline for a state-of-the-art image classification task and provide in-depth comparisons between Hugging Face and Spark NLP. The purpose of this article is to demonstrate how to scale out Vision Transformer (ViT) models from Hugging Face and deploy them in production-ready environments for accelerated, high-performance inference. By the end, we will scale a ViT model from Hugging Face by 25x (2300%) using Databricks, Nvidia, and Spark NLP. Back in 2017, a group of researchers at Google AI published a paper that introduced a transformer model architecture that changed all Natural Language Processing (NLP) standards.


Accelerate PyTorch with IPEX and oneDNN using Intel BF16 Technology

#artificialintelligence

Intel and Facebook previously collaborated to enable BF16 as a first-class data type in PyTorch. It supports basic math and tensor operations and adds CPU optimization with multi-threading, vectorization, and neural network kernels from the oneAPI Deep Neural Network Library (oneDNN, formerly known as MKL-DNN). The related work was published in an earlier blog during the launch of the 3rd Gen Intel Xeon Scalable processors (formerly codenamed Cooper Lake). In that blog, we introduced the HW advancements for native BF16 support in Cooper Lake with BF16-FP32 fused multiply-add (FMA) Intel Advanced Vector Extensions-512 (Intel AVX-512) instructions that bring doubled theoretical compute throughput over FP32 FMA. Based on the HW advancement and SW optimization from Intel and Facebook, we showcased 1.40x-1.64x
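For readers unfamiliar with the format: BF16 keeps the sign bit and all 8 exponent bits of an FP32 value but only the top 7 mantissa bits, so a BF16 number is the upper half of the FP32 bit pattern. The sketch below illustrates that layout using only the Python standard library. The function names are hypothetical, round-to-nearest-even is assumed as the rounding mode, and NaN handling is left out; this is not how PyTorch or oneDNN implement the conversion.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bf16_bits(x: float) -> int:
    """Round an FP32 value to a 16-bit BF16 pattern (hypothetical helper).

    Assumes round-to-nearest-even; NaN inputs are not handled.
    """
    bits = fp32_bits(x)
    # Bias the 16 bits that will be dropped; the extra +1 when the kept
    # LSB is set makes exact ties round to an even result.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def from_bf16_bits(b: int) -> float:
    """Widen a BF16 pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, -2.5, 3.14159):
        b = to_bf16_bits(value)
        print(f"{value!r}: bf16 bits = 0x{b:04X}, widened = {from_bf16_bits(b)!r}")
```

Because the exponent field is unchanged, BF16 covers the same dynamic range as FP32 and loses only precision, which is why widening back to FP32 is a plain bit shift.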


Artificial Intelligence at Intel - Three Current Applications

#artificialintelligence

Daniel Faggella is Head of Research at Emerj. Called upon by the United Nations, World Bank, INTERPOL, and leading enterprises, Daniel is a globally sought-after expert on the competitive strategy implications of AI for business and government leaders. Intel was founded in 1968 by Robert Noyce and Gordon Moore, who had previously been among the founders of Fairchild Semiconductor. Today, Intel employs over 121,000 people worldwide. In its 2021 annual report, the company reported revenues of $79 billion.


AbbVie Accelerates Natural Language Processing

#artificialintelligence

AbbVie is a research-based biopharmaceutical company that serves more than 30 million patients in 175 countries. With its global scale, AbbVie partnered with Intel to optimize processes for its more than 47,000 employees. This whitepaper highlights two use cases that are important to AbbVie's research. The first is Abbelfish Machine Translation, AbbVie's language translation service based on the Transformer NLP model, that leverages second-generation Intel Xeon Scalable processors and the Intel Optimization for TensorFlow with Intel oneAPI Deep Neural Network Library (oneDNN). AbbVie was able to achieve a 1.9x improvement in throughput for Abbelfish language translation using Intel Optimization for TensorFlow 1.15 with oneAPI Deep Neural Network Library when compared to TensorFlow 1.15 without oneDNN.


Optimizing Inference Performance of Transformers on CPUs

arXiv.org Artificial Intelligence

The Transformer architecture revolutionized the field of natural language processing (NLP). Transformers-based models (e.g., BERT) power many important Web services, such as search, translation, question-answering, etc. While enormous research attention is paid to the training of those models, relatively little efforts are made to improve their inference performance. This paper comes to address this gap by presenting an empirical analysis of scalability and performance of inferencing a Transformer-based model on CPUs. We identify the key component of the Transformer architecture where the bulk of the computation happens, namely, the matrix multiplication (matmul) operations, and propose three optimizations to speed them up. The first optimization is based on the observation that the performance of the matmul operation is heavily impacted not only by the shape (dimensions) of the source matrices and the available computing resources (the number of worker threads), but also by
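To make the point about shape and thread count concrete, here is a minimal back-of-the-envelope sketch. It is not one of the paper's optimizations: the helper names and the example shapes are hypothetical, and it assumes a simple scheme in which the rows of the output matrix are divided evenly among worker threads.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """Floating-point operations for an (m x k) @ (k x n) product.

    Each of the m * n outputs needs k multiplies and k adds.
    """
    return 2 * m * k * n


def split_rows(m: int, num_threads: int) -> list[int]:
    """Rows of the output assigned to each thread under an even row split
    (an assumed partitioning scheme, chosen only for illustration)."""
    base, extra = divmod(m, num_threads)
    return [base + (1 if t < extra else 0) for t in range(num_threads)]


if __name__ == "__main__":
    # Hypothetical shapes: a tall matrix and a very short one, same k and n.
    for m, k, n in ((512, 768, 768), (3, 768, 768)):
        rows = split_rows(m, num_threads=8)
        idle = sum(1 for r in rows if r == 0)
        print(f"m={m}: {matmul_flops(m, k, n):,} FLOPs, "
              f"rows per thread={rows}, idle threads={idle}")
```

With only 3 output rows and 8 threads, five threads receive no work under this split, which is one simple way in which the shape of the operands and the number of worker threads can interact.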