If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Scale variance is one of the crucial challenges in multi-scale object detection. Early approaches address this problem by exploiting image and feature pyramids, which yields suboptimal results, adds computational burden, and is constrained by inherent network structures. Pioneering works also propose multi-scale (i.e., multi-level and multi-branch) feature fusions to remedy the issue and have achieved encouraging progress. However, existing fusions still have certain limitations, such as feature scale inconsistency, neglect of level-wise semantic transformation, and coarse granularity. In this work, we present a novel module, the Fluff block, to alleviate the drawbacks of current multi-scale fusion methods and facilitate multi-scale object detection. Specifically, Fluff leverages both multi-level and multi-branch schemes with dilated convolutions to achieve rapid, effective and finer-grained feature fusions. Furthermore, we integrate Fluff into SSD as FluffNet, a powerful real-time single-stage detector for multi-scale object detection. Empirical results on MS COCO and PASCAL VOC demonstrate that FluffNet obtains remarkable efficiency with state-of-the-art accuracy. Additionally, we show the generality of the Fluff block by demonstrating how to embed it into other widely used detectors as well.
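As a rough illustration of the multi-branch idea only (not the Fluff block itself, whose exact layout the abstract does not specify), the NumPy sketch below runs parallel 3×3 convolutions with different dilation rates on the same input, concatenates their outputs, and fuses them with a 1×1 convolution. The function names, the dilation rates (1, 2, 3) and the all-ones weights are assumptions chosen so the result is easy to check.

```python
import numpy as np

def conv2d_same(x, w, dilation):
    """Zero-padded 2-D dilated convolution (cross-correlation) that keeps H and W.

    x: (in_ch, H, W); w: (out_ch, in_ch, k, k) with odd k.
    """
    out_ch, in_ch, k, _ = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    y = np.zeros((out_ch, h, wd))
    for a in range(k):
        for b in range(k):
            # each kernel tap reads the input shifted by a multiple of `dilation`
            patch = xp[:, a * dilation:a * dilation + h, b * dilation:b * dilation + wd]
            y += np.einsum("oi,ihw->ohw", w[:, :, a, b], patch)
    return y

def multi_branch_block(x, branch_weights, dilations, fuse_weight):
    """Parallel dilated branches on the same input, concatenated, then fused by a 1x1 conv."""
    branches = [conv2d_same(x, w, d) for w, d in zip(branch_weights, dilations)]
    stacked = np.concatenate(branches, axis=0)             # (n_branches * out_ch, H, W)
    return np.einsum("fc,chw->fhw", fuse_weight, stacked)  # 1x1 convolution over channels

x = np.zeros((1, 15, 15))
x[0, 7, 7] = 1.0                                           # single impulse in the centre
branch_weights = [np.ones((1, 1, 3, 3)) for _ in range(3)]
out = multi_branch_block(x, branch_weights, dilations=(1, 2, 3), fuse_weight=np.ones((1, 3)))
print(out.shape)  # (1, 15, 15)
```

Every branch keeps the spatial size, so the branches can be concatenated directly; the impulse reaches up to 3 pixels away through the dilation-3 branch while the dilation-1 branch still captures the immediate neighbourhood.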
In deep learning, convolutional layers have been major building blocks in many deep neural networks. The design was inspired by the visual cortex, where individual neurons respond to a restricted region of the visual field known as the receptive field. A collection of such fields overlaps to cover the entire visual field. Though convolutional layers were initially applied in computer vision, their shift-invariant characteristics have allowed them to be applied in natural language processing, time series, recommender systems, and signal processing. The easiest way to understand a convolution is to think of it as a sliding window function applied to a matrix.
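To make the sliding-window picture concrete, here is a minimal NumPy sketch of a single-channel convolution with no padding and stride 1. The function name `conv2d_valid` and the example values are illustrative assumptions; like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, stride 1) and sum elementwise products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=float)
    for i in range(oh):
        for j in range(ow):
            window = image[i:i + kh, j:j + kw]   # the window at the current position
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
result = conv2d_valid(image, kernel)
print(result)  # a 3x3 matrix in which every entry is -5.0
```

With this kernel each output is a pixel minus its lower-right neighbour, which is constant on the 4×4 ramp image.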
Music source separation requires a large input field to model the long-term dependencies of an audio signal. Previous convolutional neural network (CNN)-based approaches model the large input field by sequentially down- and up-sampling feature maps or by using dilated convolution. In this paper, we argue for the importance of rapidly growing the receptive field and of simultaneously modeling multi-resolution data in a single convolution layer, and propose a novel CNN architecture called densely connected dilated DenseNet (D3Net). D3Net involves a novel multi-dilated convolution that has different dilation factors in a single layer to model different resolutions simultaneously. By combining the multi-dilated convolution with the DenseNet architecture, D3Net avoids the aliasing problem that arises when dilated convolution is naively incorporated into DenseNet. Experimental results on the MUSDB18 dataset show that D3Net achieves state-of-the-art performance with an average signal-to-distortion ratio (SDR) of 6.01 dB.
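The multi-dilated convolution can be sketched, in a deliberately simplified 1-D form, as one layer whose input channels are split into groups, each convolved with its own dilation factor, with the per-group outputs summed. This is an assumed reading of the general idea, not D3Net's actual implementation; the function names, the dilation factors (1, 2, 4) and the all-ones weights are illustrative.

```python
import numpy as np

def conv1d_same(x, w, dilation):
    """Zero-padded 1-D dilated convolution (cross-correlation) that keeps the length.

    x: (in_ch, T); w: (out_ch, in_ch, k) with odd k.
    """
    out_ch, in_ch, k = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t_len = x.shape[1]
    y = np.zeros((out_ch, t_len))
    for j in range(k):
        # tap j reads the input shifted by j * dilation samples
        y += w[:, :, j] @ xp[:, j * dilation:j * dilation + t_len]
    return y

def multi_dilated_conv1d(x, group_weights, dilations):
    """One layer: channel group g uses dilation dilations[g]; group outputs are summed."""
    groups = np.split(x, len(dilations), axis=0)  # assumes channels divide evenly
    return sum(conv1d_same(xg, wg, d) for xg, wg, d in zip(groups, group_weights, dilations))

x = np.zeros((3, 16))
x[:, 8] = 1.0                                     # three 1-channel groups, impulse at t = 8
weights = [np.ones((1, 1, 3)) for _ in range(3)]
y = multi_dilated_conv1d(x, weights, dilations=(1, 2, 4))
print(np.flatnonzero(y[0]))  # outputs reached by the impulse: 4, 6, 7, 8, 9, 10, 12
```

In a single layer, the impulse at t = 8 influences outputs from t = 4 to t = 12 through the dilation-4 group, while the dilation-1 group still models the immediate neighbours.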
These are the lecture notes for FAU's YouTube Lecture "Deep Learning". This is a full transcript of the lecture video & matching slides. We hope you enjoy this as much as the videos. Of course, this transcript was created largely automatically with deep learning techniques, and only minor manual modifications were performed. If you spot mistakes, please let us know! Welcome back to deep learning!
In the world of Deep Computer Vision, there are several types of convolutional layers that differ from the original convolutional layer discussed in the previous Deep CV tutorial. These layers are used in many popular advanced convolutional neural network implementations found on the research side of deep learning for computer vision. Each of these layers works differently from the original convolutional layer, which gives each type its own specialized function. Before getting into these advanced convolutional layers, let's first have a quick recap on how the original convolutional layer works. In the original convolutional layer, we have an input of shape (W × H × C), where W and H are the width and height of each feature map and C is the number of channels, that is, the total number of feature maps.
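The recap can be turned into arithmetic: a layer with K square k×k filters over a W×H×C input produces K feature maps, whose width and height follow the usual formula floor((size + 2·padding − k) / stride) + 1. The sketch below assumes one bias per filter and equal padding on all sides; `conv_layer_shape` is a hypothetical helper name.

```python
def conv_layer_shape(w, h, c, num_filters, k, stride=1, padding=0):
    """Output shape (W', H', K) and parameter count of a standard conv layer."""
    out_w = (w + 2 * padding - k) // stride + 1
    out_h = (h + 2 * padding - k) // stride + 1
    # each filter spans all C input channels; one bias per filter
    params = k * k * c * num_filters + num_filters
    return (out_w, out_h, num_filters), params

shape, params = conv_layer_shape(32, 32, 3, num_filters=16, k=3, padding=1)
print(shape, params)  # (32, 32, 16) 448
```

Because every filter covers all C input channels, C appears in the weight count but not in the output shape; the number of output feature maps equals the number of filters.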
This time, Dilated Convolution, from Princeton University and Intel Labs, is briefly reviewed. The idea of dilated convolution comes from wavelet decomposition, which shows that ideas from the past remain useful if we can bring them into the deep learning framework. Dilated convolution was published at ICLR 2016 and had more than 1000 citations when I was writing this story.
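A minimal sketch of a 1-D dilated convolution in NumPy, assuming no padding, stride 1 and the cross-correlation convention. The kernel taps are spaced `dilation` samples apart (the "holes" of the à trous scheme from the wavelet literature), so a kernel of length k covers dilation·(k − 1) + 1 samples with the same k weights. The name `dilated_conv1d` is illustrative.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid (unpadded) 1-D dilated convolution, cross-correlation convention."""
    k = len(w)
    span = dilation * (k - 1) + 1          # input samples covered by one kernel placement
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, w, dilation=1))  # sums of x[i], x[i+1], x[i+2]
print(dilated_conv1d(x, w, dilation=2))  # sums of x[i], x[i+2], x[i+4]
```

With dilation 1 this is an ordinary convolution; with dilation 2 the same three weights cover five samples, which is how dilation enlarges the receptive field without adding parameters.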
In this article, we will discuss multiple perspectives that involve the receptive field of a deep convolutional architecture. We will address the influence of the receptive field starting from the human visual system. As you will see, a lot of deep learning terminology comes from neuroscience. As a short motivation, convolutions are awesome, but it is not enough just to understand how they work. The idea of the receptive field will help you dive into the architecture that you are using or developing. If you are looking for an in-depth analysis of how to calculate the receptive field of your model, as well as the most effective ways to increase it, this article was made for you.
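One common way to compute the theoretical receptive field of a stack of layers (per spatial dimension) is the recurrence r ← r + (k_eff − 1)·j and j ← j·s, where k_eff = d·(k − 1) + 1 is the dilated kernel extent, s is the stride and j is the distance in input pixels between adjacent outputs. The sketch below assumes each layer is given as a (kernel_size, stride, dilation) tuple; the function name is hypothetical, and the effective receptive field observed in trained networks is usually smaller than this bound.

```python
def receptive_field(layers):
    """Theoretical receptive field of a layer stack.

    layers: iterable of (kernel_size, stride, dilation) tuples, ordered input to output.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1     # extent of the dilated kernel
        rf += (k_eff - 1) * jump    # growth measured in input pixels
        jump *= s                   # spacing of this layer's outputs in input pixels
    return rf

print(receptive_field([(3, 1, 1)] * 3))                      # 7: plain 3x3 stack
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))    # 15: dilations 1, 2, 4
print(receptive_field([(3, 2, 1)] * 3))                      # 15: three stride-2 layers
```

Three 3×3 layers with dilations 1, 2 and 4 reach the same 15-pixel receptive field as three stride-2 layers, but without reducing the spatial resolution by a factor of 8.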
Non-Intrusive Load Monitoring (NILM), or Energy Disaggregation (ED), seeks to save energy by decomposing the aggregate power reading of a whole house into the power readings of the individual appliances. It is a single-channel blind source separation (SCBSS) problem and a difficult prediction problem because it is unidentifiable. Recent research shows that deep learning has become increasingly popular for the NILM problem. The ability of neural networks to extract load features is closely related to their depth. However, deep neural networks are difficult to train because of exploding gradients, vanishing gradients and network degradation. To solve these problems, we propose a sequence-to-point learning framework based on bidirectional (non-causal) dilated convolution for NILM. To make the evaluation more convincing, we compare our method directly with the state-of-the-art method, Seq2point (Zhang), and indirectly with existing algorithms using the same two datasets and metrics. Experiments on the REDD and UK-DALE datasets show that our proposed approach is far superior to existing approaches on all appliances.
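The difference between causal and bidirectional (non-causal) dilated convolution comes down to where the zero padding goes. In this minimal NumPy sketch (function name and parameters are illustrative assumptions), causal padding lets output t see only past and present samples, while splitting the padding across both sides lets it also see future samples.

```python
import numpy as np

def dilated_conv1d_same(x, w, dilation, causal):
    """Length-preserving 1-D dilated convolution (cross-correlation) with zero padding.

    causal=True pads only on the left, so y[t] depends on x[t - pad .. t].
    causal=False splits the padding between both sides, so y[t] also sees later samples.
    """
    k = len(w)
    pad = dilation * (k - 1)
    left = pad if causal else pad // 2
    xp = np.pad(x, (left, pad - left))
    return np.array([sum(w[j] * xp[t + j * dilation] for j in range(k)) for t in range(len(x))])

x = np.zeros(11)
x[5] = 1.0                                   # impulse at t = 5
w = np.ones(3)
causal = dilated_conv1d_same(x, w, dilation=2, causal=True)
noncausal = dilated_conv1d_same(x, w, dilation=2, causal=False)
print(np.flatnonzero(causal))      # only outputs at or after t = 5 see the impulse
print(np.flatnonzero(noncausal))   # outputs before and after t = 5 see it
```

In a sequence-to-point setup, where a window of aggregate readings is mapped to a single appliance reading (commonly the window's midpoint), the non-causal variant lets that point draw on readings both before and after it.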
Global information is essential for dense prediction problems, whose goal is to compute a discrete or continuous label for each pixel in an image. Traditional convolutional layers in neural networks, originally designed for image classification, are restrictive in these problems because their receptive fields are limited by the filter size. In this work, we propose the autoregressive moving-average (ARMA) layer, a novel neural network module that allows explicit dependencies among output neurons and significantly expands the receptive field with minimal extra parameters. We show experimentally that the effective receptive field of neural networks with ARMA layers expands as the autoregressive coefficients become larger. In addition, we demonstrate that neural networks with ARMA layers substantially improve performance on challenging pixel-level video prediction tasks because our model enlarges the effective receptive field.
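Why an autoregressive term widens the receptive field can be seen in a toy, causal, 1-D filter: moving-average weights (a short convolution) plus a single coefficient `ar_coef` that feeds the previous output back. This is only an illustration of the recursion idea under our own assumptions, not the paper's ARMA layer, which operates on 2-D feature maps. Feeding in an impulse and counting outputs above 1% of the peak (an arbitrary threshold chosen here) shows the response widening as the coefficient grows, at the cost of one extra parameter.

```python
import numpy as np

def arma1d(x, ma_weights, ar_coef):
    """Toy causal filter: y[t] = sum_j ma_weights[j] * x[t - j] + ar_coef * y[t - 1]."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        ma = sum(w * x[t - j] for j, w in enumerate(ma_weights) if t - j >= 0)
        y[t] = ma + (ar_coef * y[t - 1] if t > 0 else 0.0)
    return y

x = np.zeros(50)
x[0] = 1.0                                   # impulse
widths = []
for a in (0.0, 0.5, 0.9):
    y = arma1d(x, [1.0, 1.0, 1.0], a)
    # number of outputs whose response is at least 1% of the peak response
    widths.append(int(np.sum(np.abs(y) >= 0.01 * np.abs(y).max())))
print(widths)
```

With `ar_coef = 0` the filter is a plain 3-tap convolution and only 3 outputs respond; larger coefficients make the impulse response decay more slowly, so many more outputs depend on the same input.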
In this work, we propose a novel self-attention model to address electricity theft detection on an imbalanced, realistic dataset of daily electricity consumption provided by the State Grid Corporation of China. Our key contribution is the introduction of a multi-head self-attention mechanism concatenated with dilated convolutions and unified by a convolution of kernel size $1$. Moreover, we introduce a binary input channel (Binary Mask) that identifies the positions of missing values, allowing the network to learn how to deal with them. Our model achieves an AUC of $0.926$, an improvement of more than $17\%$ over previous baseline work. The code is available on GitHub at https://github.com/neuralmind-ai/electricity-theft-detection-with-self-attention.
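A minimal sketch of how such a binary mask channel could be built, assuming the data is a NumPy array of shape (customers, days) with NaN marking missing readings, that missing readings are filled with 0, and that the mask is 1 where a value was missing. These conventions and the function name `add_missing_value_mask` are assumptions; the linked repository may handle this differently.

```python
import numpy as np

def add_missing_value_mask(consumption):
    """Turn (n_customers, n_days) readings with NaN gaps into (n_customers, 2, n_days).

    Channel 0: readings with NaN replaced by 0 (assumed fill value).
    Channel 1: binary mask, 1.0 where the reading was missing.
    """
    mask = np.isnan(consumption).astype(np.float32)
    filled = np.nan_to_num(consumption, nan=0.0).astype(np.float32)
    return np.stack([filled, mask], axis=1)

daily = np.array([[1.5, np.nan, 2.0],
                  [np.nan, 0.7, 0.9]])
out = add_missing_value_mask(daily)
print(out.shape)  # (2, 2, 3)
```

Stacking the mask as a second input channel lets a convolutional or attention layer distinguish a genuine zero reading from a filled-in gap.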