Nvidia has released a new version of TensorRT, a runtime system for serving inferences from deep learning models on Nvidia's own GPUs. Inferences, or predictions made from a trained model, can be served from either CPUs or GPUs. Serving inferences from GPUs is part of Nvidia's strategy to drive greater adoption of its processors, countering AMD's efforts to break Nvidia's stranglehold on the machine learning GPU market. Nvidia claims the GPU-based TensorRT beats CPU-only approaches across the board for inferencing. In one of Nvidia's proffered benchmarks, the AlexNet image classification test under the Caffe framework, TensorRT running on Nvidia's Tesla P40 processor is claimed to be 42 times faster than a CPU-only version of the same test (16,041 images per second vs. 374).
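The "42 times faster" figure follows directly from the two quoted throughput numbers; a quick sanity check:

```python
# Throughput figures quoted in Nvidia's AlexNet/Caffe benchmark.
gpu_images_per_sec = 16041   # Tesla P40 with TensorRT
cpu_images_per_sec = 374     # CPU-only baseline

speedup = gpu_images_per_sec / cpu_images_per_sec
print(f"GPU speedup: {speedup:.1f}x")  # roughly 42.9x, matching the "42 times" claim
```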
We have announced the integration of our TensorRT inference optimization tool with TensorFlow. TensorRT integration will be available in the TensorFlow 1.7 branch. TensorFlow remains the most popular deep learning framework today, while NVIDIA TensorRT speeds up deep learning inference through optimizations and high-performance runtimes for GPU-based platforms. We wish to give TensorFlow users the highest inference performance possible along with a near-transparent workflow when using TensorRT. The new integration provides a simple API that applies powerful FP16 and INT8 optimizations using TensorRT from within TensorFlow.
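The INT8 optimization mentioned above rests on mapping FP32 values onto 8-bit integers with a per-tensor scale. The NumPy sketch below illustrates that symmetric quantization idea only; it is not the TensorRT API itself, which chooses scales internally via calibration:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: scale FP32 values into [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation from the INT8 representation."""
    return q.astype(np.float32) * scale

x = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Rounding bounds the reconstruction error by half a quantization step.
print("max error:", np.abs(x - x_hat).max(), "step/2:", scale / 2)
```

Storing and computing in INT8 quarters the memory traffic relative to FP32, which is where much of the inference speedup comes from.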
Learn how you can generate CUDA code from a trained deep neural network in MATLAB and leverage the NVIDIA TensorRT library for inference on NVIDIA GPUs. The video demonstrates this using a pedestrian detection application as an example. The NVIDIA TensorRT library is a high-performance deep learning inference optimizer and runtime library. The generated code leverages the network-level and layer-level TensorRT APIs to get the best performance, and you can see the neural network for pedestrian detection running at around 700 fps on an NVIDIA Titan XP. You can export the generated code along with the rest of the application and deploy the algorithm on embedded GPU targets such as the Jetson Tegra or Drive PX platforms.
Intel said Monday, Dec. 16, that it has bought Israeli artificial intelligence startup Habana Labs for roughly $2 billion. Based in Israel and founded in 2015, the company is a startup focused on AI (artificial intelligence) chips. Keep in mind that Habana had raised a total of only $75 million, a fairly modest amount for a hardware company (Intel Capital was one of the investors).
NVIDIA TensorRT is a high-performance deep learning inference library for production environments. Power efficiency and speed of response are two key metrics for deployed deep learning applications, because they directly affect the user experience and the cost of the service provided. TensorRT automatically optimizes trained neural networks for runtime performance, delivering up to 16x higher energy efficiency (performance per watt) on a Tesla P100 GPU compared to common CPU-only deep learning inference systems (see Figure 1). Figure 2 shows the performance of the NVIDIA Tesla P100 and K80 running inference with TensorRT on the relatively complex GoogLeNet neural network architecture. In this post we will show you how you can use TensorRT to get the best efficiency and performance out of your trained deep neural network on a GPU-based deployment platform.
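Performance per watt, the energy-efficiency metric used above, is simply measured throughput divided by average board power. A minimal sketch of computing it from measurements; the numbers below are placeholders for illustration, not NVIDIA's published figures:

```python
def perf_per_watt(images_per_sec, avg_power_watts):
    """Energy efficiency: inference throughput per watt of board power."""
    return images_per_sec / avg_power_watts

# Hypothetical measurements for illustration only.
gpu = perf_per_watt(images_per_sec=5000, avg_power_watts=250)  # e.g. a Tesla-class GPU
cpu = perf_per_watt(images_per_sec=400, avg_power_watts=160)   # e.g. a CPU-only server

print(f"GPU: {gpu:.1f} img/s/W, CPU: {cpu:.1f} img/s/W, ratio: {gpu / cpu:.1f}x")
```

In practice the throughput numerator comes from a timed inference run and the power denominator from board telemetry (e.g. sampled during the run), averaged over the same interval.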