In today's summary we dive into the architecture of WaveNet and its successor ByteNet, which are autoregressive generative models for audio and character-level text, respectively. Both architectures are built on dilated causal convolutional layers, which have recently received much attention in image generation tasks as well. Modeling sequential data with long-term dependencies, such as audio or text, seems to benefit particularly from convolutions with dilations, since dilations increase the receptive field. Without further introduction, we start right away with the main components behind WaveNet, which will later reappear in the architecture of ByteNet. The key ingredient is the so-called dilated causal convolution, which has some advantages over standard convolutions.
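As a minimal sketch of the idea (not WaveNet's actual implementation), a dilated causal 1-D convolution can be written in plain NumPy; the function name and the explicit loops are illustrative only:

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation=1):
    """Causal 1-D convolution: the output at time t depends only on
    inputs at times <= t, with taps spaced `dilation` steps apart."""
    k = len(kernel)
    # Left-pad with zeros so the output keeps the input length and stays causal.
    pad = dilation * (k - 1)
    x_padded = np.concatenate([np.zeros(pad), x])
    out = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            # Tap i looks back dilation * (k - 1 - i) steps from time t.
            out[t] += kernel[i] * x_padded[pad + t - dilation * (k - 1 - i)]
    return out
```

Stacking such layers with dilations 1, 2, 4, 8 and kernel size 2 yields a receptive field of 16 samples from only four layers, which is why dilation is attractive for long sequences.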
At Qure, we regularly work on segmentation and object detection problems, and we were therefore interested in reviewing the current state of the art. In this post, I review the literature on semantic segmentation. Although the results are not directly applicable to medical images, I review these papers because research on natural images is much more mature than that on medical images. The post is organized as follows: I first explain the semantic segmentation problem, give an overview of the approaches, and summarize a few interesting papers. In a later post, I'll explain why medical images are different from natural images and examine how the approaches from this review fare on a dataset representative of medical images.
Vanilla convolutional neural networks are known to provide superior performance not only in image recognition tasks but also in natural language processing and time series analysis. One of the strengths of convolutional layers is the ability to learn features about spatial relations in the input domain using various parameterized convolutional kernels. However, in time series analysis, learning such spatial relations is not necessarily required or effective. In such cases, kernels that model temporal dependencies or that have broader spatial resolution, as provided by dilated kernels, are recommended for more efficient training. However, the dilation rate has to be fixed a priori, which limits the flexibility of the kernels. We propose generalized dilation networks, which generalize standard dilations in two respects. First, we derive an end-to-end learnable architecture for dilation layers in which the dilation rate itself can be learned. Second, we break up the strict dilation structure by developing kernels whose taps are placed independently in the input space.
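One plausible way to make a dilation rate continuous, and therefore learnable by gradient descent, is to read inputs at fractional positions via linear interpolation. The sketch below is an illustrative assumption about how such a layer could work, not the paper's actual architecture; both function names are hypothetical:

```python
import numpy as np

def fractional_tap(x, t, offset):
    """Read x at the fractional position t - offset using linear
    interpolation, so `offset` can be a continuous parameter."""
    pos = t - offset
    lo = int(np.floor(pos))
    frac = pos - lo
    def at(i):  # zero-padding outside the signal
        return x[i] if 0 <= i < len(x) else 0.0
    return (1 - frac) * at(lo) + frac * at(lo + 1)

def conv_with_learnable_dilation(x, kernel, rate):
    """1-D convolution whose taps sit at continuous offsets i * rate;
    gradients w.r.t. `rate` would flow through the interpolation weights."""
    k = len(kernel)
    return np.array([
        sum(kernel[i] * fractional_tap(x, t, i * rate) for i in range(k))
        for t in range(len(x))
    ])
```

With an integer `rate` this reduces to an ordinary dilated convolution; with a non-integer `rate` the interpolation blends neighboring samples, which is what makes the rate differentiable.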
Image denoising is a classical problem in low-level computer vision. Model-based optimization methods and deep learning approaches have been the two main strategies for solving it. Model-based optimization methods are flexible enough to handle different inverse problems but are usually time-consuming. In contrast, deep learning methods have fast testing speed, but the performance of plain CNNs is still inferior. To address this issue, we propose a novel deep residual learning model that combines dilated residual convolutions with multi-scale convolution groups. Because of the complex patterns and structures inside an image, the multi-scale convolution group is used to learn those patterns and enlarge the receptive field. In addition, residual connections and batch normalization are used to speed up the training process while maintaining denoising performance. To decrease gridding artifacts, we integrate the hybrid dilated convolution design into our model. In short, this paper aims to train a lightweight and effective denoiser based on multi-scale convolution groups. Experimental results demonstrate that the enhanced denoiser not only achieves promising denoising results but is also a strong competitor in practical applications.
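The gridding problem that the hybrid dilated convolution design addresses can be illustrated with a small 1-D coverage computation; `coverage` is a hypothetical helper, not code from the paper:

```python
def coverage(rates, k=3):
    """Set of 1-D input offsets reachable by stacking dilated convolutions
    with kernel size k and the given dilation rates, a coarse proxy for
    which input positions actually contribute to one output position."""
    offsets = {0}
    for r in rates:
        taps = [r * (i - k // 2) for i in range(k)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets

# Stacking three rate-2 layers only ever touches even offsets (gridding):
grid = coverage([2, 2, 2])
# Hybrid rates [1, 2, 3] reach every offset inside the receptive field:
hdc = coverage([1, 2, 3])
```

With rates `[2, 2, 2]`, every reachable offset is even, so half the receptive field is never sampled; the hybrid schedule `[1, 2, 3]` covers all offsets from -6 to 6, which is the intuition behind choosing co-prime-like rate sequences.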
In the world of Deep Computer Vision, there are several types of convolutional layers that differ from the original convolutional layer discussed in the previous Deep CV tutorial. These layers are used in many popular advanced convolutional neural network implementations found on the Deep Learning research side of Computer Vision. Each of these layers works by a different mechanism than the original convolutional layer, and this gives each type of layer its own specialized function. Before getting into these advanced convolutional layers, let's first have a quick recap of how the original convolutional layer works. In the original convolutional layer, we have an input of shape (W*H*C), where W and H are the width and height of each feature map and C is the number of channels, i.e. the total number of feature maps.
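To make the shape bookkeeping concrete, a small helper (illustrative, not from the tutorial) computes the output shape of a standard 2-D convolution from the (W*H*C) input shape using the usual formula:

```python
def conv2d_output_shape(w, h, c_in, c_out, k, stride=1, padding=0):
    """Output shape (W', H', C') of a standard 2-D convolution with
    c_out square k x k filters, each spanning all c_in input channels."""
    w_out = (w + 2 * padding - k) // stride + 1
    h_out = (h + 2 * padding - k) // stride + 1
    # The channel count of the output equals the number of filters.
    return (w_out, h_out, c_out)

# A 32x32x3 input through 16 filters of size 3x3 with 'same' padding
# keeps the spatial size and changes only the channel count:
shape = conv2d_output_shape(32, 32, 3, 16, k=3, padding=1)  # (32, 32, 16)
```

Note that `c_in` does not appear in the output shape: each filter sums over all input channels, producing one output feature map per filter.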