When NVIDIA announced breakthroughs in language understanding to enable real-time conversational AI, we were caught off guard. We were still trying to digest the proceedings of ACL, one of the biggest research events for computational linguistics worldwide, at which Facebook, Salesforce, Microsoft and Amazon were all present. While these represent two different sets of achievements, they are closely connected. Here is what NVIDIA's breakthrough is about, and what it means for the world at large. As ZDNet reported yesterday, NVIDIA says its AI platform now holds the record for the fastest training, the fastest inference, and the largest training model of its kind to date.
Nvidia says it has achieved significant advances in conversational natural language processing (NLP) training and inference, enabling more complex, immediate-response interchanges between customers and chatbots. The company also says it has a new language training model in the works that dwarfs existing ones. Nvidia said its DGX-2 AI platform trained the BERT-Large AI language model in less than an hour and performed AI inference in 2 milliseconds, making "it possible for developers to use state-of-the-art language understanding for large-scale applications…." Training: Running the largest version of the Bidirectional Encoder Representations from Transformers language model (BERT-Large), an Nvidia DGX SuperPOD with 92 Nvidia DGX-2H systems running 1,472 V100 GPUs cut training from several days to 53 minutes. A single DGX-2 system trained BERT-Large in 2.8 days.
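To put the quoted training figures in perspective, the arithmetic can be sketched as below. This is a back-of-the-envelope calculation, not an NVIDIA benchmark: it assumes the single DGX-2 run (2.8 days) and the 92-system DGX-2H SuperPOD run (53 minutes) are directly comparable, which ignores hardware and software differences between the two setups. The function and key names are illustrative.

```python
# Figures quoted in the article text.
SINGLE_SYSTEM_DAYS = 2.8   # one DGX-2 training BERT-Large
SUPERPOD_MINUTES = 53      # DGX SuperPOD training time
SUPERPOD_SYSTEMS = 92      # DGX-2H systems in the SuperPOD
SUPERPOD_GPUS = 1472       # V100 GPUs across the SuperPOD

MINUTES_PER_DAY = 24 * 60


def scaling_summary():
    """Derive per-system GPU count, speedup, and rough scaling efficiency.

    Assumption: the single-system and SuperPOD runs are comparable,
    so speedup / number_of_systems approximates scaling efficiency.
    """
    gpus_per_system = SUPERPOD_GPUS / SUPERPOD_SYSTEMS
    single_system_minutes = SINGLE_SYSTEM_DAYS * MINUTES_PER_DAY
    speedup = single_system_minutes / SUPERPOD_MINUTES
    efficiency = speedup / SUPERPOD_SYSTEMS
    return {
        "gpus_per_system": gpus_per_system,
        "speedup": speedup,
        "efficiency": efficiency,
    }


if __name__ == "__main__":
    for name, value in scaling_summary().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the numbers work out to 16 GPUs per system, a speedup of roughly 76x from 92 times as many systems, and a scaling efficiency of about 83 percent.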
After breaking all the records related to training computer vision models, NVIDIA now claims that its AI platform is able to train a natural language neural network model based on one of the largest datasets in record time. It also claims that the inference time is just 2 milliseconds, which translates to an extremely fast response from a model participating in a conversation with a user. After computer vision, natural language processing is one of the top applications of AI. From Siri to Alexa to Cortana to Google Assistant, all conversational user experiences are powered by AI. Advancements in AI research are putting the power of language understanding and conversational interfaces into the hands of developers.
Nvidia Corp. is upping its artificial intelligence game with the release of a new version of its TensorRT software platform for high-performance deep learning inference. TensorRT is a platform that combines a high-performance deep learning inference optimizer with a runtime that delivers low-latency, high-throughput inference for AI applications. Inference is an important aspect of AI. Whereas AI training relates to the development of an algorithm's ability to understand a data set, inference refers to its ability to act on that data to infer answers to specific queries. The latest version brings with it some dramatic improvements on the performance side.
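The distinction between training and inference can be illustrated with a toy example. The sketch below has nothing to do with TensorRT or its API: it fits a one-variable linear model by least squares (the training step) and then answers a single query with the fitted parameters (the inference step), timing that query in milliseconds. All function names are hypothetical.

```python
import time


def train(xs, ys):
    """Training: fit y = a*x + b to the data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def infer(model, x):
    """Inference: apply the already-fitted parameters to a new input."""
    a, b = model
    return a * x + b


def timed_infer(model, x):
    """Run one inference call and report its latency in milliseconds."""
    start = time.perf_counter()
    y = infer(model, x)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return y, latency_ms


if __name__ == "__main__":
    model = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    prediction, latency_ms = timed_infer(model, 10.0)
    print(f"prediction={prediction:.2f}, latency={latency_ms:.4f} ms")
```

Training happens once and can be slow. Inference runs on every query, which is why latency figures such as the 2 milliseconds quoted earlier matter for conversational applications.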
This repository provides scripts to train the Jasper model to achieve near state-of-the-art accuracy and perform high-performance inference using NVIDIA TensorRT. This repository is tested and maintained by NVIDIA. The Jasper model is an end-to-end neural acoustic model for automatic speech recognition (ASR) that provides near state-of-the-art results on LibriSpeech among end-to-end ASR models without any external data. The Jasper architecture of convolutional layers was designed to facilitate fast GPU inference by allowing whole sub-blocks to be fused into a single GPU kernel. This is important for meeting the strict real-time requirements of ASR systems in deployment.
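The idea of fusing a sub-block can be sketched on the CPU in plain Python. This is an illustration of operator fusion in general, not the repository's code or an actual GPU kernel. It assumes a sub-block made of a 1D convolution, an inference-time batch normalization (reduced here to a single scale and shift), and a ReLU. All function names are hypothetical.

```python
def conv1d(signal, kernel):
    """'Valid' 1D cross-correlation, as used by deep learning frameworks."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def batch_norm(values, scale, shift):
    """At inference time, batch norm reduces to an affine transform."""
    return [scale * v + shift for v in values]


def relu(values):
    return [max(0.0, v) for v in values]


def unfused(signal, kernel, scale, shift):
    """Three separate passes, each writing an intermediate buffer."""
    return relu(batch_norm(conv1d(signal, kernel), scale, shift))


def fused(signal, kernel, scale, shift):
    """One pass: fold the scale into the kernel, add shift, clamp in place."""
    folded = [scale * w for w in kernel]
    k = len(folded)
    out = []
    for i in range(len(signal) - k + 1):
        acc = shift
        for j in range(k):
            acc += signal[i + j] * folded[j]
        out.append(max(0.0, acc))
    return out


if __name__ == "__main__":
    signal = [1.0, -2.0, 3.0, 0.5, -1.0]
    kernel = [0.5, -1.0, 0.25]
    print(unfused(signal, kernel, 2.0, -0.5))
    print(fused(signal, kernel, 2.0, -0.5))
```

Both functions compute the same result, but the fused version never materializes the intermediate outputs. On a GPU, the analogous saving is fewer kernel launches and fewer round trips to memory, which is what makes fusion attractive for low-latency inference.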