If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
In the world of Deep Computer Vision, there are several types of convolutional layers that differ from the original convolutional layer which was discussed in the previous Deep CV tutorial. These layers are used in many popular advanced convolutional neural network implementations found in the Deep Learning research side of Computer Vision. Each of these layers has a different mechanism than the original convolutional layer and this allows each type of layer to have a particularly special function. Before getting into these advanced convolutional layers, let's first have a quick recap on how the original convolutional layer works. In the original convolutional layer, we have an input that has a shape (W*H*C) where W and H are the width and height of each feature map and C is the number of channels, which is basically the total number of feature maps.
This time, Dilated Convolution, from Princeton University and Intel Lab, is briefly reviewed. The idea of Dilated Convolution is come from the wavelet decomposition. Thus, any ideas from the past are still useful if we can turn them into the deep learning framework. And this dilated convolution has been published in 2016 ICLR with more than 1000 citations when I was writing this story.
In this article, we will discuss multiple perspectives that involve the receptive field of a deep convolutional architecture. We will address the influence of the receptive field starting for the human visual system. As you will see, a lot of terminology of deep learning comes from neuroscience. As a short motivation, convolutions are awesome but it is not enough just to understand how it works. The idea of the receptive field will help you dive into the architecture that you are using or developing. If you are looking for an in-depth analysis to understand how you can calculate the receptive field of your model as well as the most effective ways to increase it, this article was made for you.
Global information is essential for dense prediction problems, whose goal is to compute a discrete or continuous label for each pixel in the images. Traditional convolutional layers in neural networks, originally designed for image classification, are restrictive in these problems since their receptive fields are limited by the filter size. In this work, we propose autoregressive moving-average (ARMA) layer, a novel module in neural networks to allow explicit dependencies of output neurons, which significantly expands the receptive field with minimal extra parameters. We show experimentally that the effective receptive field of neural networks with ARMA layers expands as autoregressive coefficients become larger. In addition, we demonstrate that neural networks with ARMA layers substantially improve the performance of challenging pixel-level video prediction tasks as our model enlarges the effective receptive field.
In this work we propose a novel self-attention mechanism model to address electricity theft detection on an imbalanced realistic dataset that presents a daily electricity consumption provided by State Grid Corporation of China. Our key contribution is the introduction of a multi-head self-attention mechanism concatenated with dilated convolutions and unified by a convolution of kernel size $1$. Moreover, we introduce a binary input channel (Binary Mask) to identify the position of the missing values, allowing the network to learn how to deal with these values. Our model achieves an AUC of $0.926$ which is an improvement in more than $17\%$ with respect to previous baseline work. The code is available on GitHub at https://github.com/neuralmind-ai/electricity-theft-detection-with-self-attention.
Deep learning-based health status representation learning and clinical prediction have raised much research interest in recent years. Existing models have shown superior performance, but there are still several major issues that have not been fully taken into consideration. First, the historical variation pattern of the biomarker in diverse time scales plays a vital role in indicating the health status, but it has not been explicitly extracted by existing works. Second, key factors that strongly indicate the health risk are different among patients. It is still challenging to adaptively make use of the features for patients in diverse conditions. Third, using prediction models as the black box will limit the reliability in clinical practice. However, none of the existing works can provide satisfying interpretability and meanwhile achieve high prediction performance. In this work, we develop a general health status representation learning model, named AdaCare. It can capture the long and short-term variations of biomarkers as clinical features to depict the health status in multiple time scales. It also models the correlation between clinical features to enhance the ones which strongly indicate the health status and thus can maintain a state-of-the-art performance in terms of prediction accuracy while providing qualitative interpretability. We conduct a health risk prediction experiment on two real-world datasets. Experiment results indicate that AdaCare outperforms state-of-the-art approaches and provides effective interpretability, which is verifiable by clinical experts.
Deep convolutional neural networks for multi-scale time-series classification and application to disruption prediction in fusion devices R.M. Churchill Theory Department Princeton Plasma Physics Laboratory 100 Stellarator Road, Princeton, NJ 08540, USA email@example.com and the DIII-D team General Atomics P .O. Box 85608, San Diego, California 92186, USA Abstract The multi-scale, mutli-physics nature of fusion plasmas makes predicting plasma events challenging. Recent advances in deep convolutional neural network architectures (CNN) utilizing dilated convolutions enable accurate predictions on sequences which have long-range, multi-scale characteristics, such as the time-series generated by diagnostic instruments observing fusion plasmas. Here we apply this neural network architecture to the popular problem of disruption prediction in fusion tokamaks, utilizing raw data from a single diagnostic, the Electron Cyclotron Emission imaging (ECEi) diagnostic from the DIII-D tokamak. ECEi measures a fundamental plasma quantity (electron temperature) with high temporal resolution over the entire plasma discharge, making it sensitive to a number of potential pre-disruptions markers with different temporal and spatial scales. Promising, initial disruption prediction results are obtained training a deep CNN with large receptive field ( 30k), achieving an F 1-score of 91% on individual time-slices using only the ECEi data. 1 Introduction Plasma phenomena contain a wide range of temporal and spatial scales, often exhibiting multi-scale characteristics (see Figure 1).
There are many award-winning pre-trained Convolutional Neural Network (CNN), which have a common phenomenon of increasing depth in convolutional layers. However, I inspect on VGG network, which is one of the famous model submitted to ILSVRC-2014, to show that slight modification in the basic architecture can enhance the accuracy result of the image classification task. In this paper, We present two improve architectures of pre-trained VGG-16 and VGG-19 networks that apply transfer learning when trained on a different dataset. I report a series of experimental result on various modification of the primary VGG networks and achieved significant out-performance on image classification task by: (1) freezing the first two blocks of the convolutional layers to prevent over-fitting and (2) applying different combination of dilation rate in the last three blocks of convolutional layer to reduce image resolution for feature extraction. Both the proposed architecture achieves a competitive result on CIFAR-10 and CIFAR-100 dataset.
The common configuration is to have 9 pre-defined anchors, involving three different scales, and three different height-width ratios (usually 1:1, 2:1 and 1:2). As a region is proposed for each anchor box and at each point of the feature map, the output of this step is k (number of points in the feature map). Finally, the proposed regions are filtered out to only keep the best ones (highest object classification score for instance). In order to train the RPN, we need to determine a matching strategy between the predicted bounding-boxed and the ground-truth bounding boxes.
Expanding the receptive field to capture large-scale context is key to obtaining good performance in dense prediction tasks, such as human pose estimation. While many state-of-the-art fully-convolutional architectures enlarge the receptive field by reducing resolution using strided convolution or pooling layers, the most straightforward strategy is adopting large filters. This, however, is costly because of the quadratic increase in the number of parameters and multiply-add operations. In this work, we explore using learnable box filters to allow for convolution with arbitrarily large kernel size, while keeping the number of parameters per filter constant. In addition, we use precomputed summed-area tables to make the computational cost of convolution independent of the filter size. We adapt and incorporate the box filter as a differentiable module in a fully-convolutional neural network, and demonstrate its competitive performance on popular benchmarks for the task of human pose estimation.