Neural Networks: Overviews

Recent Advances for a Better Understanding of Deep Learning Part I


This call for a better understanding of deep learning was the core of Ali Rahimi's Test-of-Time Award presentation at NIPS in December 2017. By comparing deep learning with alchemy, Ali's goal was not to dismiss the entire field, but "to open a conversation". This goal has definitely been achieved, and people are still debating whether our current practice of deep learning should be considered alchemy, engineering or science. Seven months later, the machine learning community gathered again, this time in Stockholm for the International Conference on Machine Learning (ICML). With more than 5,000 participants and 629 papers published, it was one of the most important events regarding fundamental machine learning research.

AI, Machine Learning and Data Science Roundup: August 2018


This is an eclectic collection of interesting blog posts, software announcements and data applications I've noted over the past month or so. ONNX Model Zoo is now available, providing a library of pre-trained state-of-the-art models in deep learning in the ONNX format. In the 2018 IEEE Spectrum Top Programming Language rankings, Python takes the top spot and R ranks #7. Julia 1.0 has been released, marking the stabilization of the scientific computing language and promising forward compatibility. Google announces Cloud AutoML, a beta service to train vision, text categorization, or language translation models from provided data.

Parallel Statistical and Machine Learning Methods for Estimation of Physical Load Machine Learning

Several statistical and machine learning methods are proposed to estimate the type and intensity of physical load and accumulated fatigue. They are based on the statistical analysis of accumulated and moving-window data subsets with construction of a kurtosis-skewness diagram. This approach was applied to the data gathered by a wearable heart monitor for various types and levels of physical activities, and for people with various physical conditions. The different levels of physical activities, loads, and fitness can be distinguished from the kurtosis-skewness diagram, and their evolution can be monitored. Several metrics for estimation of the instant effect and accumulated effect (physical fatigue) of physical loads were proposed. The data and results presented allow these methods to be extended to the modeling and characterization of complex human activity patterns, for example, to estimate the actual and accumulated physical load and fatigue, model potentially dangerous developments, and give cautions and advice in real time.
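As a rough illustration of the kurtosis-skewness diagram described above, the sketch below computes one (skewness, kurtosis) point per moving window and per accumulated prefix of a series. The heart-rate values are synthetic, and the function names, window length, and step size are assumptions made for this example rather than details taken from the paper.

```python
import numpy as np


def skewness(x):
    """Third standardized moment (0 for a symmetric sample)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / x.std() ** 3)


def kurtosis(x):
    """Fourth standardized moment (about 3 for normally distributed data)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 4) / x.std() ** 4)


def moving_window_points(series, window, step=1):
    """One (skewness, kurtosis) point per sliding window of fixed length."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - window + 1, step)
    return np.array([(skewness(series[s:s + window]),
                      kurtosis(series[s:s + window])) for s in starts])


def accumulated_points(series, min_len, step=1):
    """One (skewness, kurtosis) point per growing prefix of the series."""
    series = np.asarray(series, dtype=float)
    ends = range(min_len, len(series) + 1, step)
    return np.array([(skewness(series[:e]), kurtosis(series[:e])) for e in ends])


# Synthetic stand-in for heart-monitor readings (assumption: not real data).
rng = np.random.default_rng(0)
heart_rate = 70.0 + 5.0 * rng.standard_normal(300)

window_points = moving_window_points(heart_rate, window=60, step=10)
prefix_points = accumulated_points(heart_rate, min_len=60, step=10)
print(window_points.shape, prefix_points.shape)
```

Plotting the first column against the second would give the diagram itself. Both helpers assume each window contains at least two distinct values, since a constant window has zero standard deviation.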

A Review of Learning with Deep Generative Models from perspective of graphical modeling Machine Learning

This document aims to provide a review on learning with deep generative models (DGMs), which is a highly active area in machine learning and, more generally, artificial intelligence. This review is not meant to be a tutorial, but when necessary, we provide self-contained derivations for completeness. This review has two features. First, though there are different perspectives to classify DGMs, we choose to organize this review from the perspective of graphical modeling, because the learning methods for directed DGMs and undirected DGMs are fundamentally different. Second, we differentiate model definitions from model learning algorithms, since different learning algorithms can be applied to solve the learning problem on the same model, and an algorithm can be applied to learn different models. We thus separate model definition and model learning, with more emphasis on reviewing, differentiating and connecting different learning algorithms. We also discuss promising future research directions. This review is by no means comprehensive as the field is evolving rapidly. The authors apologize in advance for any missed papers and inaccuracies in descriptions. Corrections and comments are highly welcome.

Kernel Flows: from learning kernels from data into the abyss Machine Learning

Learning can be seen as approximating an unknown function by interpolating the training data. Kriging offers a solution to this problem based on the prior specification of a kernel. We explore a numerical approximation approach to kernel selection/construction based on the simple premise that a kernel must be good if the number of interpolation points can be halved without significant loss in accuracy (measured using the intrinsic RKHS norm $\|\cdot\|$ associated with the kernel). We first test and motivate this idea on a simple problem of recovering the Green's function of an elliptic PDE (with inhomogeneous coefficients) from the sparse observation of one of its solutions. Next we consider the problem of learning non-parametric families of deep kernels of the form $K_1(F_n(x),F_n(x'))$ with $F_{n+1}=(I_d+\epsilon G_{n+1})\circ F_n$ and $G_{n+1} \in \operatorname{Span}\{K_1(F_n(x_i),\cdot)\}$. With the proposed approach, constructing the kernel becomes equivalent to integrating a stochastic data-driven dynamical system, which allows for the training of very deep (bottomless) networks and the exploration of their properties. These networks learn by constructing flow maps in the kernel and input spaces via incremental data-dependent deformations/perturbations (appearing as the cooperative counterpart of adversarial examples) and, at profound depths, they (1) can achieve accurate classification from only one data point per class, (2) appear to learn archetypes of each class, and (3) expand distances between points that are in different classes and contract distances between points in the same class. For kernels parameterized by the weights of a Convolutional Neural Network, minimizing approximation errors incurred by halving random subsets of interpolation points appears to outperform training (the same CNN architecture) with relative entropy and dropout.
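The halving premise above can be written as a small numerical check. The sketch below interpolates data with a kernel on all points and on a random half, then reports the relative squared RKHS-norm gap between the two interpolants, which reduces to one minus a ratio of quadratic forms because the half-point interpolant is the RKHS projection of the full one. The Gaussian kernel, the `lengthscale` parameter, the jitter term, and all function names are assumptions made for illustration; the abstract does not specify them, and the sketch omits the step of adjusting the kernel to reduce this quantity.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale):
    """Gaussian kernel matrix between point sets a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * lengthscale ** 2))


def rkhs_norm_sq(x, y, lengthscale, jitter=1e-6):
    """Squared RKHS norm y^T K^{-1} y of the kernel interpolant of (x, y).

    The jitter on the diagonal is a numerical-stability assumption.
    """
    k = gaussian_kernel(x, x, lengthscale) + jitter * np.eye(len(x))
    return float(y @ np.linalg.solve(k, y))


def halving_loss(x, y, lengthscale, rng):
    """Relative squared RKHS-norm gap between the full interpolant and the
    interpolant built from a random half of the points (0 = no loss)."""
    half = rng.choice(len(x), size=len(x) // 2, replace=False)
    full_norm = rkhs_norm_sq(x, y, lengthscale)
    half_norm = rkhs_norm_sq(x[half], y[half], lengthscale)
    return 1.0 - half_norm / full_norm


# Toy data (assumption): samples of a smooth 1-D function.
x = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2.0 * np.pi * x[:, 0])

rng = np.random.default_rng(0)
for lengthscale in (0.05, 0.2, 1.0):
    print(lengthscale, halving_loss(x, y, lengthscale, rng))
```

A kernel for which this value stays small across many random halvings is, under the premise above, a good kernel for the data.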

A Survey on Methods and Theories of Quantized Neural Networks Machine Learning

Deep neural networks are the state-of-the-art methods for many real-world tasks, such as computer vision, natural language processing and speech recognition. For all their popularity, deep neural networks are also criticized for consuming a lot of memory and draining the battery life of devices during training and inference. This makes it hard to deploy these models on mobile or embedded devices, which have tight resource constraints. Quantization is recognized as one of the most effective approaches to satisfy the extreme memory requirements that deep neural network models demand. Instead of adopting a 32-bit floating point format to represent weights, quantized representations store weights using more compact formats such as integers or even binary numbers. Despite a possible degradation in predictive performance, quantization provides a potential solution to greatly reduce the model size and the energy consumption. In this survey, we give a thorough review of different aspects of quantized neural networks. Current challenges and trends of quantized neural networks are also discussed.
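To make the storage trade-off concrete, here is a minimal sketch of two simple quantization schemes: symmetric per-tensor 8-bit integers, and 1-bit signs with a single scaling factor. These are generic textbook variants chosen for illustration, not necessarily the schemes the survey emphasizes, and the function names are hypothetical.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization: weights ~ scale * q, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map stored integers back to approximate float32 weights."""
    return q.astype(np.float32) * np.float32(scale)


def binarize(weights):
    """1-bit variant: keep only the sign, plus one scaling factor per tensor."""
    alpha = float(np.mean(np.abs(weights)))
    signs = np.where(weights >= 0, 1, -1).astype(np.int8)
    return signs, alpha


rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
signs, alpha = binarize(w)

print("float32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)
print("max abs error (int8):", float(np.max(np.abs(w - w_hat))))
print("mean abs error (1-bit):", float(np.mean(np.abs(w - alpha * signs))))
```

The int8 copy uses a quarter of the memory of the float32 original, at the cost of a rounding error of at most about half a quantization step per weight. The sign array here is still stored one byte per weight; a real 1-bit format would pack eight signs per byte.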

Grassmannian Learning: Embedding Geometry Awareness in Shallow and Deep Learning Machine Learning

Modern machine learning algorithms have been adopted in a range of signal-processing applications spanning computer vision, natural language processing, and artificial intelligence. Many relevant problems involve subspace-structured features, orthogonality-constrained or low-rank-constrained objective functions, or subspace distances. These mathematical characteristics are expressed naturally using the Grassmann manifold. Unfortunately, this fact is not yet exploited in many traditional learning algorithms. In the last few years, there has been growing interest in studying the Grassmann manifold to tackle new learning problems. Such attempts have been encouraged by substantial performance improvements in both classic learning and learning using deep neural networks. We term the former shallow and the latter deep Grassmannian learning. The aim of this paper is to introduce the emerging area of Grassmannian learning by surveying common mathematical problems and primary solution approaches, and by giving an overview of various applications. We hope to inspire practitioners in different fields to adopt the powerful tool of Grassmannian learning in their research.
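The subspace distances mentioned above are commonly computed from principal angles. The following sketch, with hypothetical function names, treats a point on the Grassmann manifold as the column span of a full-rank matrix, obtains the principal angles from the singular values of the product of two orthonormal bases, and takes the Euclidean norm of the angles as a geodesic distance. It is a standard construction shown for illustration, not code from the paper.

```python
import numpy as np


def orthonormal_basis(a):
    """Orthonormal basis for the column span of a (assumes full column rank)."""
    q, _ = np.linalg.qr(a)
    return q


def principal_angles(a, b):
    """Principal angles between the column spans of a and b, in radians."""
    qa, qb = orthonormal_basis(a), orthonormal_basis(b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    # Clip to guard against round-off pushing cosines slightly outside [0, 1].
    return np.arccos(np.clip(cosines, 0.0, 1.0))


def geodesic_distance(a, b):
    """Grassmann geodesic distance: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(a, b)))


rng = np.random.default_rng(0)
a = rng.standard_normal((5, 2))

# The same subspace expressed in a different basis should be at distance ~0.
mixing = np.array([[2.0, 1.0], [0.0, 3.0]])
print(geodesic_distance(a, a @ mixing))

# Two orthogonal lines in R^3 are a quarter turn apart.
e1 = np.array([[1.0], [0.0], [0.0]])
e2 = np.array([[0.0], [1.0], [0.0]])
print(geodesic_distance(e1, e2))
```

The distance depends only on the subspaces, not on the particular bases used to describe them, which is the property that makes the Grassmann manifold a natural setting for subspace-structured features.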

CT Super-resolution GAN Constrained by the Identical, Residual, and Cycle Learning Ensemble (GAN-CIRCLE) Machine Learning

Computed tomography (CT) is a popular medical imaging modality for screening, diagnosis, and image-guided therapy. However, CT has its limitations, especially the ionizing radiation dose involved. In practice, it is highly desirable to have ultrahigh-quality CT imaging of fine structural details at a minimized radiation dosage. In this paper, we propose a semi-supervised deep learning approach to recover high-resolution (HR) CT images from low-resolution (LR) counterparts. Specifically, with the generative adversarial network (GAN) as the basic component, we enforce cycle-consistency in terms of the Wasserstein distance to establish a nonlinear end-to-end mapping from noisy LR input images to denoised HR outputs. In this deep imaging process, we incorporate deep convolutional neural networks (CNNs), residual learning, and network-in-network techniques for feature extraction and restoration. In contrast to the current trend of increasing network depth and complexity to boost CT imaging performance, which limits real-world applications by imposing considerable computational and memory overheads, we apply a parallel 1x1 CNN to reduce the dimensionality of the output of the hidden layer. Furthermore, we optimize the number of layers and the number of filters for each CNN layer. Quantitative and qualitative evaluations demonstrate that our proposed model is accurate, efficient and robust for super-resolution (SR) image restoration from noisy LR input images. In particular, we validate our composite SR networks on two large-scale CT datasets, and obtain very encouraging results compared to other state-of-the-art methods.
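The dimensionality reduction attributed to the 1x1 convolution above can be seen in a few lines: a 1x1 convolution applies the same linear map across channels at every pixel, so it shrinks the channel count without touching the spatial size. The sketch below is a generic NumPy illustration; the channel counts and the `conv1x1` name are assumptions, and it does not reproduce the GAN-CIRCLE architecture.

```python
import numpy as np


def conv1x1(features, weights, bias=None):
    """Apply a 1x1 convolution.

    features: array of shape (c_in, height, width)
    weights:  array of shape (c_out, c_in), one linear map shared by all pixels
    bias:     optional array of shape (c_out,)
    returns:  array of shape (c_out, height, width)
    """
    out = np.einsum("oc,chw->ohw", weights, features)
    if bias is not None:
        out = out + bias[:, None, None]
    return out


rng = np.random.default_rng(0)
hidden = rng.standard_normal((64, 16, 16))   # 64 feature maps (assumed size)
weights = rng.standard_normal((8, 64))       # reduce 64 channels to 8

reduced = conv1x1(hidden, weights)
print(hidden.shape, "->", reduced.shape)

# Each output pixel is just weights @ (the channel vector at that pixel).
print(np.allclose(reduced[:, 3, 5], weights @ hidden[:, 3, 5]))
```

Layers that follow the reduction operate on 8 channels instead of 64, which is where the savings in computation and memory come from.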

AI has the power to change the way our world operates: Q&A (Includes interview)


The potential of artificial intelligence is the subject of a new book titled "Embracing the Power of AI". In the book, five AI experts have come together to take a deep dive into topics like the state of AI today, how organizations can implement it, and why AI will continue to impact our daily lives. One extract from the book explains how AI is: Just now reaching its moment of maturity--its tipping point--largely due to recent advancements in machine learning and deep learning, fueled by the enormous amount of data humankind is producing on a daily basis. These developments have emerged as potent drivers of innovation across a number of industries and are powering the rapid acceleration of artificial intelligence investments. As this new version of AI arrives, bringing with it applications that will affect the way that billions of people live and work, it's important to think beyond the technology and start a new conversation.

Paying Attention to Attention: Highlighting Influential Samples in Sequential Analysis Machine Learning

In (Yang et al. 2016), a hierarchical attention network (HAN) is created for document classification. The attention layer can be used to visualize text influential in classifying the document, thereby explaining the model's prediction. We successfully applied HAN to a sequential analysis task in the form of real-time monitoring of turn-taking in conversations. However, we discovered instances where the attention weights were uniform at the stopping point (indicating all turns were equally influential to the classifier), preventing meaningful visualization for real-time human review or classifier improvement. We observed that attention weights for turns fluctuated as the conversations progressed, indicating turns had varying influence based on conversation state. Leveraging this observation, we develop a method to create more informative real-time visuals (as confirmed by human reviewers) in cases of uniform attention weights, using the changes in turn importance as a conversation progresses over time.
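One simple way to picture the idea of using changes in turn importance is sketched below. It assumes the classifier is re-run after every new turn, so that `history[t]` holds the attention weights over turns 0..t, and it scores each turn by the largest change in its weight between consecutive steps. The uniformity test, the scoring rule, the tolerance, and all names are hypothetical choices for this illustration; the abstract does not describe the authors' actual method in this detail.

```python
import numpy as np


def is_uniform(weights, tol=1e-3):
    """True when all attention weights are (nearly) equal."""
    weights = np.asarray(weights, dtype=float)
    return bool(np.max(weights) - np.min(weights) < tol)


def turn_importance_changes(history):
    """Largest absolute change in each turn's attention weight over time.

    history[t] is the attention vector after turn t arrives (length t + 1).
    """
    change = np.zeros(len(history[-1]))
    for prev, curr in zip(history[:-1], history[1:]):
        prev = np.asarray(prev, dtype=float)
        curr = np.asarray(curr, dtype=float)
        shared = len(prev)  # turns present at both steps
        step_change = np.abs(curr[:shared] - prev)
        change[:shared] = np.maximum(change[:shared], step_change)
    return change


# Toy attention history (assumption): weights end up uniform at the stopping point.
history = [
    [1.0],
    [0.7, 0.3],
    [0.2, 0.6, 0.2],
    [0.25, 0.25, 0.25, 0.25],
]

final = history[-1]
if is_uniform(final):
    # Final weights carry no ranking, so fall back to how much each turn moved.
    scores = turn_importance_changes(history)
else:
    scores = np.asarray(final)
print(scores)
```

In this toy history the final weights say nothing about which turn mattered, while the change scores still single out the early turns whose weights shifted most as the conversation developed.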