Deep Learning


TensorFlow* Optimizations on Modern Intel Architecture

@machinelearnbot

TensorFlow* is a leading deep learning and machine learning framework, which makes it important for Intel and Google to ensure that it is able to extract maximum performance from Intel's hardware offering. This paper introduces the Artificial Intelligence (AI) community to TensorFlow optimizations on Intel Xeon and Intel Xeon Phi processor-based platforms. These optimizations are the fruit of a close collaboration between Intel and Google engineers announced last year by Intel's Diane Bryant and Google's Diane Green at the first Intel AI Day. We describe the various performance challenges that we encountered during this optimization exercise and the solutions adopted. We also report our performance improvements on a sample of common neural network models.
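
As a rough illustration of the kind of CPU tuning the paper discusses, here is a minimal TensorFlow 1.x-era sketch that sets the threading and OpenMP knobs commonly adjusted on Xeon systems; the specific thread counts and environment values below are illustrative assumptions, not recommendations from the paper.

```python
# Hedged sketch: tuning TensorFlow 1.x threading on an Intel Xeon host.
# Thread counts and environment values are placeholders to illustrate the knobs.
import os
import tensorflow as tf

# OpenMP / MKL settings commonly tuned on Xeon; right values depend on the machine.
os.environ["OMP_NUM_THREADS"] = "16"   # assumed number of physical cores per socket
os.environ["KMP_BLOCKTIME"] = "1"      # ms a thread spins before sleeping
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

config = tf.ConfigProto(
    intra_op_parallelism_threads=16,   # threads used inside a single op (e.g., a matmul)
    inter_op_parallelism_threads=2,    # ops allowed to run concurrently
)

with tf.Session(config=config) as sess:
    a = tf.random_normal([2048, 2048])
    b = tf.random_normal([2048, 2048])
    sess.run(tf.matmul(a, b))          # simple compute-bound workload to exercise the settings
```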


How to Train your Self-Driving Car to Steer – Towards Data Science – Medium

#artificialintelligence

Neural networks, and deep learning research in particular, have recently produced many breakthroughs in computer vision and other important fields of computer science. Deep neural networks, especially in computer vision and object recognition, often have a lot of parameters, millions of them. SqueezeNet is a fairly recent model that achieves remarkable performance on object recognition tasks with very few parameters, weighing just a few megabytes. I added a recurrent layer to the output of one of the first densely connected layers of SqueezeNet: the network now takes 5 consecutive frames as input, and the recurrent layer outputs a single real-valued number, the steering angle.
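
The excerpt does not include code, but a minimal Keras sketch of the described idea might look like the following; the small convolutional feature extractor stands in for SqueezeNet, and the frame size and layer widths are assumptions, not the author's exact model.

```python
# Hedged sketch: a CNN feature extractor applied to 5 consecutive frames,
# followed by a recurrent layer that regresses a single steering angle.
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                          GlobalAveragePooling2D, LSTM, Dense)

model = Sequential([
    # Apply the same CNN to each of the 5 input frames (64x64 RGB assumed).
    TimeDistributed(Conv2D(16, (3, 3), activation="relu"),
                    input_shape=(5, 64, 64, 3)),
    TimeDistributed(MaxPooling2D((2, 2))),
    TimeDistributed(Conv2D(32, (3, 3), activation="relu")),
    TimeDistributed(GlobalAveragePooling2D()),   # one feature vector per frame
    LSTM(64),                                    # fuse the 5-frame sequence
    Dense(1)                                     # real-valued steering angle
])

model.compile(optimizer="adam", loss="mse")
model.summary()
```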


NVIDIA Targets Next AI Frontiers: Inference And China

#artificialintelligence

NVIDIA's meteoric growth in the datacenter, where its business is now generating some $1.6B annually, has been largely driven by the demand to train deep neural networks for Machine Learning (ML) and Artificial Intelligence (AI)--an area where the computational requirements are simply mindboggling. First, and perhaps most importantly, Huang announced new TensorRT3 software that optimizes trained neural networks for inference processing on NVIDIA GPUs. In addition to announcing the Chinese deployment wins, Huang provided some pretty compelling benchmarks to demonstrate the company's prowess in accelerating Machine Learning inference operations, in the datacenter and at the edge. In addition to the TensorRT3 deployments, Huang announced that the largest Chinese Cloud Service Providers, Alibaba, Baidu, and Tencent, are all offering the company's newest Tesla V100 GPUs to their customers for scientific and deep learning applications.


New Optimizations Improve Deep Learning Frameworks For CPUs

#artificialintelligence

Since most of us need more than a "machine learning only" server, I'll focus on the reality of how Intel Xeon SP Platinum processors remain the best choice for servers, including servers that need to do machine learning as part of their workload. Here is a partial rundown of key software that accelerates deep learning on Intel Xeon Platinum processors enough that the best-case performance advantage of GPUs is closer to 2X than to 100X. There is also a good article in Parallel Universe Magazine, Issue 28, starting on page 26, titled Solving Real-World Machine Learning Problems with Intel Data Analytics Acceleration Library. High-core-count CPUs (the Intel Xeon Phi processors, in particular the upcoming "Knights Mill" version) and FPGAs (Intel Xeon processors coupled with Intel/Altera FPGAs) offer highly flexible options with excellent price/performance and power efficiency.



Hyperparameter Tuning of Deep Learning Algorithm

#artificialintelligence

Random Sampling Method: With random sampling, we have a high probability of finding a good set of hyperparameters quickly. Random sampling allows efficient search of the hyperparameter space. Within a given range, it is quite reasonable to pick values at random. This way we spend equal resources exploring each interval of the hyperparameter range.
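
A minimal Python sketch of this random sampling idea follows; the learning rate is drawn uniformly in the exponent so each order of magnitude gets equal attention, and the ranges and scoring function are illustrative assumptions.

```python
# Hedged sketch: random search over a few hyperparameters.
import random


def sample_hyperparams():
    # Learning rate: sample the exponent uniformly, so e.g. 1e-4..1e-3 and
    # 1e-2..1e-1 receive equal search effort.
    learning_rate = 10 ** random.uniform(-4, -1)
    # Dropout keep-probability: a plain uniform draw on a linear scale is fine here.
    keep_prob = random.uniform(0.5, 1.0)
    # Hidden units: random integer in a plausible range.
    hidden_units = random.randint(32, 512)
    return {"lr": learning_rate, "keep_prob": keep_prob, "hidden": hidden_units}


def train_and_score(params):
    # Placeholder objective; in practice this would train a model and return
    # a validation metric for the sampled configuration.
    return -((params["lr"] - 0.01) ** 2)


best = max((sample_hyperparams() for _ in range(50)), key=train_and_score)
print("best params found:", best)
```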


Fujitsu adds deep learning to nVidia GPUs

@machinelearnbot

Fujitsu is rising to this challenge by introducing native deep learning processing capabilities to select Fujitsu Primergy CX and RX server models. To achieve the highest possible levels of system performance, Fujitsu is introducing native support for NVIDIA GPUs via direct connection to the mainboard. Connected either via plug-in PCIe cards or the NVIDIA NVLink high-speed interconnect, Fujitsu Primergy servers provide access to more than 100 teraflops (TFLOPS) of deep learning performance. The first models to offer native support for NVIDIA Volta GPUs are the Fujitsu Server Primergy CX2570 M4, as one component of the modular CX400 M4 scale-out ecosystem, and the Fujitsu Server Primergy RX2540 M4.


Microsoft launches 'Project Brainwave' for real-time AI

#artificialintelligence

With the help of ultra-low latency, the system processes requests as fast as it receives them. He added that the system architecture reduces latency, since the CPU does not need to process incoming requests, and allows very high throughput, with the FPGA processing requests as fast as the network can stream them. Microsoft is also planning to bring the real-time AI system to users in Azure. "With the 'Project Brainwave' system incorporated at scale and available to our customers, Microsoft Azure will have industry-leading capabilities for real-time AI," Burger noted.


Optimizing OpenCV on the Raspberry Pi - PyImageSearch

#artificialintelligence

Otherwise, if you're compiling OpenCV for Python 3, check the "Python 3" output of CMake: Figure 2: After running CMake, Python 3 and NumPy are correctly set from within our cv virtualenv on the Raspberry Pi. Now that we've updated the swap size, kick off the optimized OpenCV compile using all four cores: Figure 3: Our optimized compile of OpenCV 3.3 for the Raspberry Pi 3 has been completed successfully. Given that we just optimized for floating point operations, a great test would be to run a pre-trained deep neural network on the Raspberry Pi, similar to what we did last week. Let's give SqueezeNet a try: Figure 5: SqueezeNet on the Raspberry Pi 3 also achieves performance gains using our optimized install of OpenCV 3.3.
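
In the same spirit as that test, a short timing sketch using OpenCV 3.3's dnn module might look like this; the SqueezeNet model file paths, the test image, and the mean values are placeholders for illustration.

```python
# Hedged sketch: time a SqueezeNet forward pass with OpenCV's dnn module.
import time
import cv2

# Load an (assumed) Caffe-format SqueezeNet definition and weights.
net = cv2.dnn.readNetFromCaffe("squeezenet_deploy.prototxt",
                               "squeezenet.caffemodel")

image = cv2.imread("test_image.jpg")
# SqueezeNet expects 227x227 inputs; the mean subtraction values are an assumption.
blob = cv2.dnn.blobFromImage(image, 1.0, (227, 227), (104, 117, 123))
net.setInput(blob)

start = time.time()
preds = net.forward()
print("forward pass took {:.4f} seconds".format(time.time() - start))
print("top class index:", preds.argmax())
```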


Deep Dive Into Sentiment Analysis - DZone AI

#artificialintelligence

RNNs recursively apply the same function (the function learned during training) to a combination of previous memory (the hidden state gathered from time 0 through t-1) and new input (at time t) to produce the output at time t. General RNNs have problems such as gradients becoming too large or too small (exploding and vanishing gradients) when you try to train a sentiment model with them, due to their recursive nature. The Gated Feedback Recurrent Neural Network extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers using a global gating unit for each pair of layers. Tree LSTMs outperform all existing systems and strong LSTM baselines on sentiment classification on the Stanford Sentiment Treebank dataset. OpenAI's unsupervised model using this representation achieved state-of-the-art sentiment analysis accuracy on a small but extensively studied dataset, the Stanford Sentiment Treebank, reaching 91.8% accuracy versus the previous best of 90.2%.
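
As a point of reference for the plain LSTM baselines mentioned above, here is a minimal Keras sketch of an LSTM sentiment classifier (not a Tree-LSTM or gated-feedback variant); the vocabulary size, sequence length, and layer sizes are assumptions for illustration.

```python
# Hedged sketch: a plain LSTM sentiment classifier over padded word-id sequences.
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size, max_len = 10000, 60   # assumed vocabulary and padded review length

model = Sequential([
    Embedding(vocab_size, 128, input_length=max_len),  # word ids -> dense vectors
    LSTM(128),                                         # hidden state carries memory across time steps
    Dense(1, activation="sigmoid")                     # probability of positive sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy data just to show the expected input/output shapes.
x = np.random.randint(0, vocab_size, size=(32, max_len))
y = np.random.randint(0, 2, size=(32, 1))
model.fit(x, y, epochs=1, batch_size=8)
```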