AITopics

Symmetric Positive Definite (SPD) matrix learning methods have become popular in many image and video processing tasks, thanks to their ability to learn appropriate statistical representations while respecting the Riemannian geometry of the underlying SPD manifolds. In this paper we build a Riemannian network architecture to open up a new direction of non-linear SPD matrix learning in a deep model. In particular, we devise bilinear mapping layers to transform input SPD matrices into more desirable SPD matrices, exploit eigenvalue rectification layers to apply a non-linear activation function to the new SPD matrices, and design an eigenvalue logarithm layer to perform Riemannian computing on the resulting SPD matrices for regular output layers. For training the proposed deep network, we exploit a new backpropagation with a variant of stochastic gradient descent on Stiefel manifolds to update the structured connection weights and the involved SPD matrix data. We show through experiments that the proposed SPD matrix network can be trained simply and outperforms existing SPD matrix learning and state-of-the-art methods in three typical visual classification tasks.
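The three layer types named in this abstract can be sketched in pure Python for a 2x2 output. The concrete formulas are assumptions, since the abstract does not state them: the bilinear mapping is taken to be W X W^T with a row-orthonormal W (a point on a Stiefel manifold), eigenvalue rectification is taken to clamp eigenvalues from below at a small threshold eps, and the eigenvalue logarithm is taken to apply log to the eigenvalues. All function names and the example matrices are hypothetical.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def eigh2(s):
    """Eigenvalues and eigenvectors (as columns) of a symmetric 2x2 matrix."""
    a, b, c = s[0][0], s[0][1], s[1][1]
    mean, radius = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # rotation angle that diagonalizes s
    u = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    return [mean + radius, mean - radius], u

def map_eigenvalues(s, fn):
    """Return U diag(fn(lambda)) U^T for a symmetric 2x2 matrix s."""
    lams, u = eigh2(s)
    d = [[fn(lams[0]), 0.0], [0.0, fn(lams[1])]]
    return matmul(matmul(u, d), transpose(u))

def bimap(w, x):
    """Assumed bilinear mapping layer: W X W^T."""
    return matmul(matmul(w, x), transpose(w))

def reeig(x, eps=1e-4):
    """Assumed eigenvalue rectification: clamp eigenvalues below eps up to eps."""
    return map_eigenvalues(x, lambda lam: max(lam, eps))

def logeig(x):
    """Assumed eigenvalue logarithm: matrix logarithm of an SPD matrix."""
    return map_eigenvalues(x, math.log)

# Hypothetical 3x3 SPD input and a 2x3 weight with orthonormal rows.
x = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
w = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]
y = reeig(bimap(w, x))  # still SPD, now 2x2
z = logeig(y)           # symmetric matrix that a regular output layer could consume
```

Because W has full row rank, W X W^T stays positive definite, which is why the logarithm in the last step is well defined. The closed-form 2x2 eigendecomposition only keeps the sketch dependency-free; larger matrices would need a general symmetric eigensolver.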

artificial intelligence, machine learning, matrix

Thirty-First AAAI Conference on Artificial Intelligence

Country: Europe > Switzerland (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

He, Ran (Institute of Automation, Chinese Academy of Sciences) | Wu, Xiang (Institute of Automation, Chinese Academy of Sciences) | Sun, Zhenan (Institute of Automation, Chinese Academy of Sciences) | Tan, Tieniu (Institute of Automation, Chinese Academy of Sciences)

Learning Invariant Deep Representation for NIR-VIS Face Recognition

Visual versus near infrared (VIS-NIR) face recognition is still a challenging heterogeneous task due to the large appearance difference between the VIS and NIR modalities. This paper presents a deep convolutional network approach that uses only one network to map both NIR and VIS images to a compact Euclidean space. The low-level layers of this network are trained only on large-scale VIS data. Each convolutional layer is implemented by the simplest case of the maxout operator. The high-level layer is divided into two orthogonal subspaces that contain modality-invariant identity information and modality-variant spectrum information, respectively. Our joint formulation leads to an alternating minimization approach for deep representation at training time and an efficient computation for heterogeneous data at testing time. Experimental evaluations show that our method achieves a 94% verification rate at FAR=0.1% on the challenging CASIA NIR-VIS 2.0 face recognition dataset. Compared with state-of-the-art methods, it reduces the error rate by 58% with only a compact 64-D representation.

artificial intelligence, machine learning, recognition

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Han, Tian (University of California, Los Angeles) | Lu, Yang (University of California, Los Angeles) | Zhu, Song-Chun (University of California, Los Angeles) | Wu, Ying Nian (University of California, Los Angeles)

Alternating Back-Propagation for Generator Network

This paper proposes an alternating back-propagation algorithm for learning the generator network model. The model is a non-linear generalization of factor analysis. In this model, the mapping from the continuous latent factors to the observed signal is parametrized by a convolutional neural network. The alternating back-propagation algorithm iterates the following two steps: (1) Inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent. (2) Learning back-propagation, which updates the parameters given the inferred latent factors by gradient descent. The gradient computations in both steps are powered by back-propagation, and they share most of their code. We show that the alternating back-propagation algorithm can learn realistic generator models of natural images, video sequences, and sounds. Moreover, it can also be used to learn from incomplete or indirect training data.
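The two alternating steps can be illustrated on a toy model. This is a sketch under several assumptions that are not in the abstract: the generator is linear (x = w * z with a scalar latent factor z) rather than a convolutional network, the noise variance is 1, the prior on z is standard normal, and the inferential step uses plain gradient descent, which the abstract names as an alternative to Langevin dynamics. The function names, step sizes, and data are hypothetical.

```python
def recon_loss(xs, w, zs):
    """Mean squared reconstruction error, 0.5 * ||x - w * z||^2 averaged over examples."""
    return sum(0.5 * sum((x_k - w_k * z) ** 2 for x_k, w_k in zip(x, w))
               for x, z in zip(xs, zs)) / len(xs)

def alternating_backprop(xs, w, n_outer=100, n_infer=5, step_z=0.05, step_w=0.05):
    zs = [0.0] * len(xs)  # latent factors, warm-started across outer iterations
    losses = [recon_loss(xs, w, zs)]
    for _outer in range(n_outer):
        # (1) Inferential step: descend on each z with the generator weights fixed.
        for i, x in enumerate(xs):
            z = zs[i]
            for _inner in range(n_infer):
                resid = [x_k - w_k * z for x_k, w_k in zip(x, w)]
                # Gradient of 0.5 * ||x - w z||^2 + 0.5 * z^2 with respect to z.
                grad_z = -sum(w_k * r for w_k, r in zip(w, resid)) + z
                z -= step_z * grad_z
            zs[i] = z
        # (2) Learning step: descend on w with the inferred latent factors fixed.
        grad_w = [0.0] * len(w)
        for x, z in zip(xs, zs):
            for k in range(len(w)):
                grad_w[k] += -(x[k] - w[k] * z) * z / len(xs)
        w = [w_k - step_w * g for w_k, g in zip(w, grad_w)]
        losses.append(recon_loss(xs, w, zs))
    return w, zs, losses

# Hypothetical data generated from one latent factor and weights (2.0, -1.0, 0.5).
xs = [[2.0 * t, -1.0 * t, 0.5 * t] for t in (-1.5, -0.5, 0.5, 1.5)]
w_hat, z_hat, losses = alternating_backprop(xs, [0.1, 0.1, 0.1])
```

Both steps differentiate the same residual x - w * z, once with respect to z and once with respect to w, which mirrors the remark that the two gradient computations share most of their code. The learned w and z are only identified up to sign and scale in this toy setting.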

algorithm, artificial intelligence, machine learning

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Ghosh, Aritra (Microsoft, Bangalore) | Kumar, Himanshu (Indian Institute of Science, Bangalore) | Sastry, P. S. (Indian Institute of Science, Bangalore)

Robust Loss Functions under Label Noise for Deep Neural Networks

In many applications of classifier learning, training data suffers from label noise. Deep networks are learned using huge training data where the problem of noisy labels is particularly relevant. The current techniques proposed for learning deep networks under label noise focus on modifying the network architecture and on algorithms for estimating true labels from noisy labels. An alternate approach would be to look for loss functions that are inherently noise-tolerant. For binary classification there exist theoretical results on loss functions that are robust to label noise. In this paper, we provide some sufficient conditions on a loss function so that risk minimization under that loss function would be inherently tolerant to label noise for multiclass classification problems. These results generalize the existing results on noise-tolerant loss functions for binary classification. We study some of the widely used loss functions in deep networks and show that the loss function based on mean absolute value of error is inherently robust to label noise. Thus standard back propagation is enough to learn the true classifier even under label noise. Through experiments, we illustrate the robustness of risk minimization with such loss functions for learning neural networks.
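The robustness claim for the mean-absolute-error loss can be checked numerically against one candidate condition. The condition used here is an assumption, since the abstract does not state its sufficient conditions: a loss is treated as noise-tolerant when its sum over all possible labels is the same constant for every prediction. The function names and score vectors are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mae_loss(probs, label):
    """L1 distance between the one-hot target for `label` and the predicted probabilities."""
    return sum(abs((1.0 if j == label else 0.0) - p) for j, p in enumerate(probs))

def cce_loss(probs, label):
    """Categorical cross-entropy for the given label."""
    return -math.log(probs[label])

def loss_sum_over_labels(loss, probs):
    """Sum of the loss over every possible label for one fixed prediction."""
    return sum(loss(probs, j) for j in range(len(probs)))

probs_a = softmax([2.0, 0.5, -1.0])  # a confident prediction
probs_b = softmax([0.1, 0.0, -0.1])  # a nearly uniform prediction

mae_sums = [loss_sum_over_labels(mae_loss, p) for p in (probs_a, probs_b)]
cce_sums = [loss_sum_over_labels(cce_loss, p) for p in (probs_a, probs_b)]
```

For k classes the MAE sum is 2k - 2 for any probability vector (here 4), because each label contributes 2 - 2p. The cross-entropy sum depends on the prediction, so it does not meet this assumed condition.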

artificial intelligence, machine learning, noise

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Deep MIML Network

Feng, Ji (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)

In many real-world applications, the objects of interest have multiple labels and can be represented as a bag of instances. Multi-instance Multi-label (MIML) learning provides a framework for handling such tasks and has exhibited excellent performance in various domains. In a MIML setting, the feature representation of instances usually has a big impact on the final performance; inspired by recent deep learning studies, in this paper we propose the DeepMIML network, which exploits deep neural network formation to generate instance representations for MIML. The sub-concept learning component of the DeepMIML structure preserves the instance-label relation discovery ability of MIML algorithms; that is, it can automatically locate the key input patterns that trigger the labels. The effectiveness of the DeepMIML network is validated by experiments on various domains of data.

artificial intelligence, machine learning, sub-concept layer

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Chen, Zhourong (The Hong Kong University of Science and Technology) | Zhang, Nevin L. (The Hong Kong University of Science and Technology) | Yeung, Dit-Yan (The Hong Kong University of Science and Technology) | Chen, Peixian (The Hong Kong University of Science and Technology)

Sparse Boltzmann Machines with Structure Learning as Applied to Text Analysis

We are interested in exploring the possibility and benefits of structure learning for deep models. As the first step, this paper investigates the matter for Restricted Boltzmann Machines (RBMs). We conduct the study with Replicated Softmax, a variant of RBMs for unsupervised text analysis. We present a method for learning what we call Sparse Boltzmann Machines, where each hidden unit is connected to a subset of the visible units instead of all of them. Empirical results show that the method yields models with significantly improved model fit and interpretability as compared with RBMs where each hidden unit is connected to all visible units.

artificial intelligence, machine learning, replicated softmax

Thirty-First AAAI Conference on Artificial Intelligence

Country:

North America > United States (0.28)
Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.81)

Unsupervised Domain Adaptation with a Relaxed Covariate Shift Assumption

Adel, Tameem (University of Manchester) | Zhao, Han (Carnegie Mellon University) | Wong, Alexander (University of Waterloo)

The training and test domains are commonly referred to in the domain adaptation literature as the source and target domains, respectively. Domain diversity can emerge as a result of the scarcity of available labeled data from the target domain. It can as well be innate in the problem itself due to, for example, an ongoing change occurring to the source domain like in cases where the original source domain keeps changing over time. Domain adaptation aims at finding solutions for this kind of problem, where the training (source) data are [...] distributions can be different (Storkey and Sugiyama 2006; Ben-David and Urner 2012; 2014). Covariate shift is a valid assumption in some problems, but it can as well be quite unrealistic for many other domain adaptation tasks where the conditional label distributions are not (or, more precisely, not guaranteed to be) identical. The simplification resulting from assuming identical labeling distributions facilitates the quest for a tractable learning algorithm, albeit possibly at the cost of reducing the expressiveness power of the representation, and consequently the accuracy of the resulting hypothesis.

artificial intelligence, assumption, machine learning

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction

Zhang, Junbo (Microsoft Research) | Zheng, Yu (Microsoft Research) | Qi, Dekang (Southwest Jiaotong University)

Forecasting the flow of crowds is of great importance to traffic management and public safety, and very challenging as it is affected by many complex factors, such as inter-region traffic, events, and weather. We propose a deep-learning-based approach, called ST-ResNet, to collectively forecast the inflow and outflow of crowds in each and every region of a city. We design an end-to-end structure of ST-ResNet based on unique properties of spatio-temporal data. More specifically, we employ the residual neural network framework to model the temporal closeness, period, and trend properties of crowd traffic. For each property, we design a branch of residual convolutional units, each of which models the spatial properties of crowd traffic. ST-ResNet learns to dynamically aggregate the output of the three residual neural networks based on data, assigning different weights to different branches and regions. The aggregation is further combined with external factors, such as weather and day of the week, to predict the final traffic of crowds in each and every region. Experiments on two types of crowd flows in Beijing and New York City (NYC) demonstrate that the proposed ST-ResNet outperforms six well-known methods.
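The aggregation step described in this abstract can be sketched as an element-wise weighted sum over a city grid. The exact form is an assumption, since the abstract does not give it: each branch output is multiplied cell by cell with its own learnable weight grid, the external-factor term is added, and tanh squashes the result. The function name and all numbers are hypothetical.

```python
import math

def fuse_branches(branches, weights, external):
    """Weight each branch per grid cell, sum the branches, add the external term, apply tanh."""
    rows, cols = len(external), len(external[0])
    fused = [[0.0] * cols for _ in range(rows)]
    for x, w in zip(branches, weights):
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += w[i][j] * x[i][j]  # per-region, per-branch weight
    return [[math.tanh(fused[i][j] + external[i][j]) for j in range(cols)]
            for i in range(rows)]

# Hypothetical 2x2 city grid: outputs of the closeness, period, and trend branches.
closeness = [[0.2, 0.4], [0.1, 0.3]]
period = [[0.1, 0.1], [0.2, 0.2]]
trend = [[0.0, 0.3], [0.3, 0.0]]
# Hypothetical learned weights: one grid per branch, so regions can favor different branches.
w_close = [[0.8, 0.5], [0.6, 0.7]]
w_period = [[0.1, 0.4], [0.3, 0.2]]
w_trend = [[0.1, 0.1], [0.1, 0.1]]
external = [[0.05, 0.05], [0.05, 0.05]]  # e.g. an embedding of weather and day of the week

prediction = fuse_branches([closeness, period, trend],
                           [w_close, w_period, w_trend], external)
```

Keeping one weight per region and branch is what lets the model lean on recent flows in one part of the city and on periodic patterns in another.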

artificial intelligence, deep learning, machine learning

Thirty-First AAAI Conference on Artificial Intelligence

Country:

Asia > China > Beijing > Beijing (0.26)
North America > United States > New York (0.24)

Industry:

Transportation > Infrastructure & Services (0.46)
Transportation > Ground > Road (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fine-Grained Recurrent Neural Networks for Automatic Prostate Segmentation in Ultrasound Images

Yang, Xin (The Chinese University of Hong Kong) | Yu, Lequan (The Chinese University of Hong Kong) | Wu, Lingyun (Shenzhen University) | Wang, Yi (The Chinese University of Hong Kong) | Ni, Dong (Shenzhen University) | Qin, Jing (The Hong Kong Polytechnic University) | Heng, Pheng-Ann (The Chinese University of Hong Kong)

Boundary incompleteness poses great challenges for automatic prostate segmentation in ultrasound images. A shape prior can provide strong guidance in estimating the missing boundary, but traditional shape models often suffer from hand-crafted descriptors and local information loss in the fitting procedure. In this paper, we attempt to address those issues with a novel framework. The proposed framework can seamlessly integrate feature extraction and shape prior exploration, and estimate the complete boundary in a sequential manner. Our framework is composed of three key modules. Firstly, we serialize the static 2D prostate ultrasound images into dynamic sequences and then predict prostate shapes by sequentially exploring shape priors. Intuitively, we propose to learn the shape prior with the biologically plausible Recurrent Neural Networks (RNNs). This module proves effective in dealing with the boundary incompleteness. Secondly, to alleviate the bias caused by different serialization manners, we propose a multi-view fusion strategy to merge shape predictions obtained from different perspectives. Thirdly, we further implant the RNN core into a multiscale Auto-Context scheme to successively refine the details of the shape prediction map. With extensive validation on challenging prostate ultrasound images, our framework bridges severe boundary incompleteness and achieves the best performance in prostate boundary delineation when compared with several advanced methods. Additionally, our approach is general and can be extended to other medical image segmentation tasks where boundary incompleteness is one of the main challenges.

artificial intelligence, deep learning, machine learning

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.29)

Industry:

Health & Medicine > Therapeutic Area (0.94)
Health & Medicine > Diagnostic Medicine > Imaging (0.89)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval

With the benefits of low storage cost and fast query speed, cross-modal hashing has received considerable attention recently. However, almost all existing methods for cross-modal hashing cannot obtain powerful hash codes because they directly use hand-crafted features or ignore heterogeneous correlations across different modalities, which greatly degrades retrieval performance. In this paper, we propose a novel deep cross-modal hashing method to generate compact hash codes through an end-to-end deep learning architecture, which can effectively capture the intrinsic relationships between various modalities. Our architecture integrates different types of pairwise constraints to encourage the similarities of the hash codes from an intra-modal view and an inter-modal view, respectively. Moreover, additional decorrelation constraints are introduced to this architecture, thus enhancing the discriminative ability of each hash bit. Extensive experiments show that our proposed method yields state-of-the-art results on two cross-modal retrieval datasets.

artificial intelligence, machine learning, natural language

Thirty-First AAAI Conference on Artificial Intelligence

Country:

North America (0.68)
Asia > China (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)