Nvidia says it has achieved significant advances in conversational natural language processing (NLP) training and inference, enabling more complex, immediate-response interchanges between customers and chatbots. The company also says it has a new language training model in the works that dwarfs existing ones. Nvidia said its DGX-2 AI platform trained the BERT-Large AI language model in less than an hour and performed AI inference in 2 milliseconds, making "it possible for developers to use state-of-the-art language understanding for large-scale applications…." Training: Running the largest version of the Bidirectional Encoder Representations from Transformers (BERT-Large) language model, an Nvidia DGX SuperPOD with 92 Nvidia DGX-2H systems running 1,472 V100 GPUs cut training from several days to 53 minutes. A single DGX-2 system trained BERT-Large in 2.8 days.
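To put the training figures above in perspective, the following is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in the text (2.8 days, 53 minutes, 92 systems, 1,472 GPUs); the function and variable names are illustrative, not Nvidia's. Note the assumption that the single DGX-2 result is a fair baseline for the DGX-2H systems in the SuperPOD; the two configurations differ, so the efficiency figure is only approximate.

```python
# Figures quoted in the article text.
SINGLE_SYSTEM_DAYS = 2.8   # one DGX-2 training BERT-Large
SUPERPOD_MINUTES = 53      # DGX SuperPOD training BERT-Large
SUPERPOD_SYSTEMS = 92      # DGX-2H systems in the SuperPOD
SUPERPOD_GPUS = 1472       # V100 GPUs across the SuperPOD


def scaling_summary(single_days, pod_minutes, systems, gpus):
    """Return GPUs per system, speedup, and rough scaling efficiency."""
    single_minutes = single_days * 24 * 60
    speedup = single_minutes / pod_minutes
    return {
        "gpus_per_system": gpus // systems,
        "speedup": speedup,
        # Assumption: treats the single DGX-2 as the per-system baseline.
        "efficiency": speedup / systems,
    }


summary = scaling_summary(
    SINGLE_SYSTEM_DAYS, SUPERPOD_MINUTES, SUPERPOD_SYSTEMS, SUPERPOD_GPUS
)
print(f"GPUs per system: {summary['gpus_per_system']}")
print(f"Speedup over one system: {summary['speedup']:.1f}x")
print(f"Approximate scaling efficiency: {summary['efficiency']:.0%}")
```

On these figures, 92 systems deliver roughly a 76-fold speedup over one, or a little over 80 percent of ideal linear scaling.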
Nvidia's GPU-powered platform for developing and running conversational AI that understands and responds to natural language requests has achieved some key milestones and broken some records. These have big implications for anyone building on its tech, which includes companies large and small, as much of the code Nvidia used to achieve these advancements is open source, written in PyTorch and easy to run. The biggest achievement Nvidia announced today is breaking the hour mark in training BERT, one of the world's most advanced AI language models and a state-of-the-art model widely considered a good standard for natural language processing. Nvidia's AI platform was able to train the model in a record-breaking 53 minutes, and the trained model could then successfully infer (i.e. …). Nvidia's breakthroughs aren't just cause for bragging rights -- these advances scale and provide real-world benefits for anyone working with its NLP conversational AI and GPU hardware. Nvidia achieved its record-setting training time on one of its SuperPOD systems, which is made up of 92 Nvidia DGX-2H systems running 1,472 V100 GPUs, and managed the inference on Nvidia T4 GPUs running Nvidia TensorRT -- which beat the performance of even highly optimized CPUs by many orders of magnitude.
I was wrong to say that Intel (INTC) doesn't need GPUs to compete with Nvidia (NVDA) on artificial intelligence/deep learning computing. Further research told me that along with its FPGAs (Field Programmable Gate Arrays), Intel has embedded Intel Processor Graphics for deep learning inference. It's a new concept that Intel discussed only last May. Nvidia's GPUs can be the training engine for deep learning computers. Intel's FPGAs and embedded Processor Graphics could be the go-to hardware accelerators for inference computing.
Nvidia has released a new version of TensorRT, a runtime system for serving inferences using deep learning models through Nvidia's own GPUs. Inferences, or predictions made from a trained model, can be served from either CPUs or GPUs. Serving inferences from GPUs is part of Nvidia's strategy to get greater adoption of its processors, countering what AMD is doing to break Nvidia's stranglehold on the machine learning GPU market. Nvidia claims the GPU-based TensorRT is better across the board for inferencing than CPU-only approaches. One of Nvidia's proffered benchmarks, the AlexNet image classification test under the Caffe framework, claims TensorRT to be 42 times faster than a CPU-only version of the same test -- 16,041 images per second vs. 374 -- when run on Nvidia's Tesla P40 processor.
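The "42 times" claim can be checked directly from the two throughput figures quoted above. The short Python sketch below does that arithmetic; the names are illustrative only. The per-image times it prints are averages derived from throughput, which is an assumption-laden simplification: they are not the same thing as the response latency of a single request, since benchmarks of this kind typically process images in batches.

```python
# Throughput figures quoted in the article text (images per second).
GPU_IMAGES_PER_SEC = 16041   # TensorRT on a Tesla P40
CPU_IMAGES_PER_SEC = 374     # CPU-only run of the same test


def throughput_ratio(fast, slow):
    """How many times higher the first throughput is than the second."""
    return fast / slow


def avg_ms_per_image(images_per_sec):
    """Average time per image in milliseconds (not per-request latency)."""
    return 1000.0 / images_per_sec


ratio = throughput_ratio(GPU_IMAGES_PER_SEC, CPU_IMAGES_PER_SEC)
print(f"GPU vs. CPU throughput: {ratio:.1f}x")
print(f"Average GPU time per image: {avg_ms_per_image(GPU_IMAGES_PER_SEC):.3f} ms")
print(f"Average CPU time per image: {avg_ms_per_image(CPU_IMAGES_PER_SEC):.3f} ms")
```

The ratio works out to just under 43, so the quoted "42 times faster" appears to be rounded down.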
The GPU maker says its AI platform now has the fastest training record, the fastest inference, and largest training model of its kind to date. Nvidia is touting advancements to its artificial intelligence (AI) technology for language understanding that it said set new performance records for conversational AI. By adding key optimizations to its AI platform and GPUs, Nvidia is aiming to become the premier provider of conversational AI services, which it says have been limited up to this point due to a broad inability to deploy large AI models in real time. Unlike the much simpler transactional AI, conversational AI uses context and nuance and the responses are instantaneous, explained Nvidia's vice president of applied deep learning research, Bryan Catanzaro, in a press briefing.