Deep Learning
Object detection with deep learning and OpenCV - PyImageSearch
A couple weeks ago we learned how to classify images using deep learning and OpenCV 3.3's deep neural network (dnn) module. While this original blog post demonstrated how we can categorize an image into one of ImageNet's 1,000 separate class labels, it could not tell us where an object resides in the image. In order to obtain the bounding box (x, y)-coordinates for an object in an image we need to instead apply object detection. Object detection can tell us not only what is in an image but also where the object is. In the remainder of today's blog post we'll discuss how to apply object detection using deep learning and OpenCV.
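As a rough illustration of what "bounding box (x, y)-coordinates" means in practice, the sketch below converts raw detector output into pixel-space boxes. It does not call OpenCV. The function name decode_detections, the row layout (image id, class id, confidence, then normalized corner coordinates), and the 0.2 confidence threshold are all assumptions made for this example, not details taken from the blog post.

```python
def decode_detections(detections, image_width, image_height, min_confidence=0.2):
    """Convert normalized detection rows to pixel-space boxes.

    Each row is ASSUMED to be
    [image_id, class_id, confidence, x_min, y_min, x_max, y_max]
    with coordinates in [0, 1] (a hypothetical SSD-style layout).
    """
    results = []
    for _, class_id, confidence, x_min, y_min, x_max, y_max in detections:
        if confidence < min_confidence:
            continue  # drop weak detections
        box = (
            int(x_min * image_width),
            int(y_min * image_height),
            int(x_max * image_width),
            int(y_max * image_height),
        )
        results.append((int(class_id), float(confidence), box))
    return results


# Two made-up detections: one confident, one below the threshold.
rows = [
    [0, 15, 0.90, 0.25, 0.25, 0.50, 0.75],
    [0, 7, 0.05, 0.00, 0.00, 1.00, 1.00],
]
print(decode_detections(rows, image_width=300, image_height=200))
```

A classifier would return only the class id and confidence; the extra four numbers per row are what let a detector say where the object is.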
Facebook and Microsoft introduce new open ecosystem for interchangeable AI frameworks
Facebook and Microsoft are today introducing the Open Neural Network Exchange (ONNX) format, a standard for representing deep learning models that enables models to be transferred between frameworks. When developing learning models, engineers and researchers have many AI frameworks to choose from. At the outset of a project, developers have to choose features and commit to a framework. We developed ONNX together with Microsoft to bridge this gap and to empower AI developers to choose the framework that fits the current stage of their project and easily switch between frameworks as the project evolves. Enabling interoperability between different frameworks and streamlining the path from research to production will help increase the speed of innovation in the AI community.
The Wild Week in AI - IBM MIT AI Lab; EMNLP Videos; Tensorflow RL library; Lots of PyTorch projects;
A novel architectural unit for CNNs, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels. SE blocks produce performance improvements for existing state-of-the-art deep architectures at slight computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission which won first place and significantly reduced the top-5 error to 2.251%, achieving a 25% relative improvement over the winning entry of 2016.
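To make "recalibrates channel-wise feature responses" concrete, here is a minimal pure-Python sketch of one SE step. It assumes the commonly described form of the block: global average pooling per channel (squeeze), a small two-layer bottleneck with ReLU and sigmoid (excitation), and a per-channel rescale. The function name se_block, the omission of biases, and the toy weights are choices made for this example, not details from the excerpt.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply a Squeeze-and-Excitation step to a [C][H][W] nested list.

    w1 has shape [C // r][C] and w2 has shape [C][C // r]; biases are
    omitted to keep the sketch short (an assumption of this example).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid gate per channel.
    hidden = [max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Recalibration: rescale every value of a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[2.0, 4.0], [6.0, 8.0]],  # channel 1, mean 5.0
]
w1 = [[0.5, 0.5]]              # reduce 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]           # expand back to 2 channel gates
scaled, gates = se_block(features, w1, w2)
print(gates)
```

Because each gate is computed from the pooled statistics of all channels, one channel's scaling depends on the others, which is the "interdependencies between channels" idea. The extra cost is two small matrix products per block.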
The West is Unaware of The Deep Learning Sputnik Moment
Many readers are unfamiliar with the history of Sputnik. The effect of the Soviet Union's achievement in launching the first man-made satellite (i.e. Sputnik) created the urgency to upgrade America's science and technology infrastructure. This was viewed by a shocked audience of over 200 million people. A vast majority of that audience was from countries where the game of Go is popularly played. To have a Western-developed automation arrive and vanquish a legendary player like Lee Sedol certainly shocked the population to its core. Chinese authorities were concerned enough about the social ramifications that they hastily imposed a country-wide ban on the live-streaming of the event. This kind of shock to one's core view of the world is certain to galvanize serious action.
Prolongation of SMAP to Spatio-temporally Seamless Coverage of Continental US Using a Deep Learning Neural Network
Fang, Kuai, Shen, Chaopeng, Kifer, Daniel, Yang, Xiao
The Soil Moisture Active Passive (SMAP) mission has delivered valuable sensing of surface soil moisture since 2015. However, it has a short time span and irregular revisit schedule. Utilizing a state-of-the-art time-series deep learning neural network, Long Short-Term Memory (LSTM), we created a system that predicts SMAP level-3 soil moisture data with atmospheric forcing, model-simulated moisture, and static physiographic attributes as inputs. The system removes most of the bias with model simulations and improves predicted moisture climatology, achieving small test root-mean-squared error (<0.035) and high correlation coefficient (>0.87) for over 75% of the Continental United States, including the forested Southeast. As the first application of LSTM in hydrology, we show the proposed network avoids overfitting and is robust for both temporal and spatial extrapolation tests. LSTM generalizes well across regions with distinct climates and physiography. With high fidelity to SMAP, LSTM shows great potential for hindcasting, data assimilation, and weather forecasting.
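The two evaluation numbers quoted above can be computed as in the following sketch. It assumes the correlation coefficient is Pearson's r, which the abstract does not state, and the soil moisture values are made up purely for illustration. The helper names rmse and pearson_r are likewise chosen for this example.

```python
import math


def rmse(predicted, observed):
    """Root-mean-squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Made-up soil moisture values, for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.39]
print(rmse(predicted, observed), pearson_r(predicted, observed))
```

RMSE penalizes absolute deviation, including any constant bias, while the correlation only measures how well the predicted series tracks the observed one, so the two metrics complement each other.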
Deep learning: Technical introduction
At this time, I knew nothing about backpropagation, and was completely ignorant about the differences between a Feedforward, Convolutional and a Recurrent Neural Network. As I navigated through the humongous amount of data available on deep learning online, I found myself quite frustrated when it came to really understanding what deep learning is, and not just applying it with some available library. In particular, the backpropagation update rules are seldom derived, and never in index form. Unfortunately for me, I have an "index" mind: seeing a 4-dimensional convolution formula in matrix form does not do it for me. Since I am also stupid enough to like recoding the wheel in low-level programming languages, the matrix form cannot be directly converted into working code either. I therefore started some notes for my personal use, where I tried to rederive everything from scratch in index form. I did so for the vanilla Feedforward network, then learned about L1 and L2 regularization, dropout[1], batch normalization[2], several gradient descent optimization techniques... Then I turned to convolutional networks, from conventional conv-pool architectures with a single-digit number of layers[3] to recent VGG[4] and ResNet[5] ones, from local contrast normalization and rectification to batchnorm... And finally I studied Recurrent Neural Network structures[6], from the standard formulation to the most recent LSTM one[7]. As my work progressed, my notes got bigger and bigger, to the point where I realized I might have enough material to help others starting their own deep learning journey.
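To show what "index form" buys you when writing low-level code, here is a small sketch for a single fully connected layer. The sigmoid activation, the squared-error loss, and all variable names are assumptions chosen for this example, not the author's notation. With a_i = sum_j W_ij x_j + b_i, h_i = sigmoid(a_i) and L = 1/2 sum_i (h_i - t_i)^2, the gradients are dL/dW_ij = delta_i x_j and dL/db_i = delta_i, where delta_i = (h_i - t_i) h_i (1 - h_i). Each formula maps directly onto a loop over its indices.

```python
import math


def forward(W, b, x):
    """h_i = sigmoid(sum_j W[i][j] * x[j] + b[i])."""
    h = []
    for i in range(len(W)):
        a_i = b[i]
        for j in range(len(x)):
            a_i += W[i][j] * x[j]
        h.append(1.0 / (1.0 + math.exp(-a_i)))
    return h


def loss(W, b, x, t):
    """L = 1/2 * sum_i (h_i - t_i)^2."""
    h = forward(W, b, x)
    return 0.5 * sum((h[i] - t[i]) ** 2 for i in range(len(t)))


def backward(W, b, x, t):
    """Index-form gradients: dL/dW[i][j] = delta_i * x_j, dL/db[i] = delta_i."""
    h = forward(W, b, x)
    grad_W = [[0.0] * len(x) for _ in W]
    grad_b = [0.0] * len(b)
    for i in range(len(W)):
        # delta_i = dL/da_i = (h_i - t_i) * sigmoid'(a_i)
        delta_i = (h[i] - t[i]) * h[i] * (1.0 - h[i])
        grad_b[i] = delta_i
        for j in range(len(x)):
            grad_W[i][j] = delta_i * x[j]
    return grad_W, grad_b


W = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
b = [0.0, 0.1]
x = [1.0, 2.0, 3.0]
t = [1.0, 0.0]
grad_W, grad_b = backward(W, b, x, t)

# Central finite-difference check on one weight, W[1][2].
eps = 1e-6
W_plus = [row[:] for row in W]
W_minus = [row[:] for row in W]
W_plus[1][2] += eps
W_minus[1][2] -= eps
numeric = (loss(W_plus, b, x, t) - loss(W_minus, b, x, t)) / (2 * eps)
print(grad_W[1][2], numeric)
```

The finite-difference check at the end is a cheap way to confirm that a hand-derived index formula and its loop implementation agree.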
Audio-Visual Speech Enhancement based on Multimodal Deep Convolutional Neural Network
Hou, Jen-Cheng, Wang, Syu-Siang, Lai, Ying-Hui, Tsao, Yu, Chang, Hsiu-Wen, Wang, Hsin-Min
Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus on addressing audio information only. In this work, inspired by multimodal learning, which utilizes data from different modalities, and the recent success of convolutional neural networks (CNNs) in SE, we propose an audio-visual deep CNN (AVDCNN) SE model, which incorporates audio and visual streams into a unified network model. In the proposed AVDCNN SE model, audio and visual data are first processed using individual CNNs and then fused into a joint network to generate enhanced speech at the output layer. The AVDCNN model is trained in an end-to-end manner, and parameters are jointly learned through back-propagation. We evaluate enhanced speech using five objective criteria. Results show that the AVDCNN yields notably better performance compared with an audio-only CNN-based SE model and two conventional SE approaches, confirming the effectiveness of integrating visual information into the SE process.
Review of Stanford Course on Deep Learning for Natural Language Processing - Machine Learning Mastery
Natural Language Processing, or NLP, is a subfield of machine learning concerned with understanding speech and text data. Statistical methods and statistical machine learning dominate the field and more recently deep learning methods have proven very effective in challenging NLP problems like speech recognition and text translation. In this post, you will discover the Stanford course on the topic of Natural Language Processing with Deep Learning methods. This course is free and I encourage you to make use of this excellent resource. The course is taught by Chris Manning and Richard Socher.