Goto

Collaborating Authors

 Ding, Guiguang


Auto-Balanced Filter Pruning for Efficient Convolutional Neural Networks

AAAI Conferences

In recent years considerable research efforts have been devoted to compression techniques of convolutional neural networks (CNNs). Many works so far have focused on CNN connection pruning methods which produce sparse parameter tensors in convolutional or fully-connected layers. It has been demonstrated in several studies that even simple methods can effectively eliminate connections of a CNN. However, since these methods make parameter tensors just sparser but no smaller, the compression may not transfer directly to acceleration without support from specially designed hardware. In this paper, we propose an iterative approach named Auto-balanced Filter Pruning, where we pre-train the network in an innovative auto-balanced way to transfer the representational capacity of its convolutional layers to a fraction of the filters, prune the redundant ones, then re-train it to restore the accuracy. In this way, a smaller version of the original network is learned and the floating-point operations (FLOPs) are reduced. By applying this method on several common CNNs, we show that a large portion of the filters can be discarded without obvious accuracy drop, leading to significant reduction of computational burdens. Concretely, we reduce the inference cost of LeNet-5 on MNIST, VGG-16 and ResNet-56 on CIFAR-10 by 95.1%, 79.7% and 60.9%, respectively.


On Trivial Solution and High Correlation Problems in Deep Supervised Hashing

AAAI Conferences

Deep supervised hashing (DSH), which combines binary learning and convolutional neural network, has attracted considerable research interests and achieved promising performance for highly efficient image retrieval. In this paper, we show that the widely used loss functions, pair-wise loss and triplet loss, suffer from the trivial solution problem and usually lead to highly correlated bits in practice, limiting the performance of DSH. One important reason is that it is difficult to incorporate proper constraints into the loss functions under the mini-batch based optimization algorithm. To tackle these problems, we propose to adopt ensemble learning strategy for deep model training. We found out that this simple strategy is capable of effectively decorrelating different bits, making the hashcodes more informative. Moreover, it is very easy to parallelize the training and support incremental model learning, which are very useful for real-world applications but usually ignored by existing DSH approaches. Experiments on benchmarks demonstrate the proposed ensemble based DSH can improve the performance of DSH approaches significant.


Temporal-Difference Learning With Sampling Baseline for Image Captioning

AAAI Conferences

The existing methods for image captioning usually train the language model under the cross entropy loss, which results in the exposure bias and inconsistency of evaluation metric. Recent research has shown these two issues can be well addressed by policy gradient method in reinforcement learning domain attributable to its unique capability of directly optimizing the discrete and non-differentiable evaluation metric. In this paper, we utilize reinforcement learning method to train the image captioning model. Specifically, we train our image captioning model to maximize the overall reward of the sentences by adopting the temporal-difference (TD) learning method, which takes the correlation between temporally successive actions into account. In this way, we assign different values to different words in one sampled sentence by a discounted coefficient when back-propagating the gradient with the REINFORCE algorithm, enabling the correlation between actions to be learned. Besides, instead of estimating a "baseline" to normalize the rewards with another network, we utilize the reward of another Monte-Carlo sample as the "baseline" to avoid high variance. We show that our proposed method can improve the quality of generated captions and outperforms the state-of-the-art methods on the benchmark dataset MS COCO in terms of seven evaluation metrics.


Zero-Shot Learning With Attribute Selection

AAAI Conferences

Zero-shot learning (ZSL) is regarded as an effective way to construct classification models for target classes which have no labeled samples available. The basic framework is to transfer knowledge from (different) auxiliary source classes having sufficient labeled samples with some attributes shared by target and source classes as bridge. Attributes play an important role in ZSL but they have not gained sufficient attention in recent years. Previous works mostly assume attributes are perfect and treat each attribute equally. However, as shown in this paper, different attributes have different properties, such as their class distribution, variance, and entropy, which may have considerable impact on ZSL accuracy if treated equally. Based on this observation, in this paper we propose to use a subset of attributes, instead of the whole set, for building ZSL models. The attribute selection is conducted by considering the information amount and predictability under a novel joint optimization framework. To our knowledge, this is the first work that notices the influence of attributes themselves and proposes to use a refined attribute set for ZSL. Since our approach focuses on selecting good attributes for ZSL, it can be combined to any attribute based ZSL approaches so as to augment their performance. Experiments on four ZSL benchmarks demonstrate that our approach can improve zero-shot classification accuracy and yield state-of-the-art results.


Reference Based LSTM for Image Captioning

AAAI Conferences

Image captioning is an important problem in artificial intelligence, related to both computer vision and natural language processing. There are two main problems in existing methods: in the training phase, it is difficult to find which parts of the captions are more essential to the image; in the caption generation phase, the objects or the scenes are sometimes misrecognized. In this paper, we consider the training images as the references and propose a Reference based Long Short Term Memory (R-LSTM) model, aiming to solve these two problems in one goal. When training the model, we assign different weights to different words, which enables the network to better learn the key information of the captions. When generating a caption, the consensus score is utilized to exploit the reference information of neighbor images, which might fix the misrecognition and make the descriptions more natural-sounding. The proposed R-LSTM model outperforms the state-of-the-art approaches on the benchmark dataset MS COCO and obtains top 2 position on 11 of the 14 metrics on the online test server.


Zero-Shot Recognition via Direct Classifier Learning with Transferred Samples and Pseudo Labels

AAAI Conferences

As an interesting and emerging topic, zero-shot recognition (ZSR) makes it possible to train a recognition model by specifying the category's attributes when there are no labeled exemplars available. The fundamental idea for ZSR is to transfer knowledge from the abundant labeled data in different but related source classes via the class attributes. Conventional ZSR approaches adopt a two-step strategy in test stage, where the samples are projected into the attribute space in the first step, and then the recognition is carried out based on considering the relationship between samples and classes in the attribute space. Due to this intermediate transformation, information loss is unavoidable, thus degrading the performance of the overall system. Rather than following this two-step strategy, in this paper, we propose a novel one-step approach that is able to perform ZSR in the original feature space by using directly trained classifiers. To tackle the problem that no labeled samples of target classes are available, we propose to assign pseudo labels to samples based on the reliability and diversity, which in turn will be used to train the classifiers. Moreover, we adopt a robust SVM that accounts for the unreliability of pseudo labels. Extensive experiments on four datasets demonstrate consistent performance gains of our approach over the state-of-the-art two-step ZSR approaches.


Active Learning with Cross-Class Similarity Transfer

AAAI Conferences

How to save labeling efforts for training supervised classifiers is an important research topic in machine learning community. Active learning (AL) and transfer learning (TL) are two useful tools to achieve this goal, and their combination, i.e., transfer active learning (T-AL) has also attracted considerable research interest. However, existing T-AL approaches consider to transfer knowledge from a source/auxiliary domain which has the same class labels as the target domain, but ignore the relationship among classes. In this paper, we investigate a more practical setting where the classes in source domain are related/similar to but different from the target domain classes. Specifically, we propose a novel cross-class T-AL approach to simultaneously transfer knowledge from source domain and actively annotate the most informative samples in target domain so that we can train satisfactory classifiers with as few labeled samples as possible. In particular, based on the class-class similarity and sample-sample similarity, we adopt a similarity propagation to find the source domain samples that can well capture the characteristics of a target class and then transfer the similar samples as the (pseudo) labeled data for the target class. In turn, the labeled and transferred samples are used to train classifiers and actively select new samples for annotation. Extensive experiments on three datasets demonstrate that the proposed approach outperforms significantly the state-of-the-art related approaches.


Fully Convolutional Neural Networks with Full-Scale-Features for Semantic Segmentation

AAAI Conferences

In this work, we propose a novel method to involve full-scale-features into the fully convolutional neural networks (FCNs) for Semantic Segmentation. Current works on FCN has brought great advances in the task of semantic segmentation, but the receptive field, which represents region areas of input volume connected to any output neuron, limits the available information of output neuron's prediction accuracy. We investigate how to involve the full-scale or full-image features into FCNs to enrich the receptive field. Specially, the full-scale feature network (FFN) extends the full-connected network and makes an end-to-end unified training structure. It has two appealing properties. First, the introduction of full-scale-features is beneficial for prediction. We build a unified extracting network and explore several fusion functions for concatenating features. Amounts of experiments have been carried out to prove that full-scale-features makes fair accuracy raising. Second, FFN is applicable to many variants of FCN which could be regarded as a general strategy to improve the segmentation accuracy. Our proposed method is evaluated on PASCAL VOC 2012, and achieves a state-of-art result.


Active Learning with Cross-Class Knowledge Transfer

AAAI Conferences

When there are insufficient labeled samples for training a supervised model, we can adopt active learning to select the most informative samples for human labeling, or transfer learning to transfer knowledge from related labeled data source. Combining transfer learning with active learning has attracted much research interest in recent years. Most existing works follow the setting where the class labels in source domain are the same as the ones in target domain. In this paper, we focus on a more challenging cross-class setting where the class labels are totally different in two domains but related to each other in an intermediary attribute space, which is barely investigated before. We propose a novel and effective method that utilizes the attribute representation as the seed parameters to generate the classification models for classes. And we propose a joint learning framework that takes into account the knowledge from the related classes in source domain, and the information in the target domain. Besides, it is simple to perform uncertainty sampling, a fundamental technique for active learning, based on the framework. We conduct experiments on three benchmark datasets and the results demonstrate the efficacy of the proposed method.


Multi-Domain Active Learning for Recommendation

AAAI Conferences

Recently, active learning has been applied to recommendation to deal with data sparsity on a single domain. In this paper, we propose an active learning strategy for recommendation to alleviate the data sparsity in a multi-domain scenario. Specifically, our proposed active learning strategy simultaneously consider both specific and independent knowledge over all domains. We use the expected entropy to measure the generalization error of the domain-specific knowledge and propose a variance-based strategy to measure the generalization error of the domain-independent knowledge. The proposed active learning strategy use a unified function to effectively combine these two measurements. We compare our strategy with five state-of-the-art baselines on five different multi-domain recommendation tasks, which are constituted by three real-world data sets. The experimental results show that our strategy performs significantly better than all the baselines and reduces human labeling efforts by at least 5.6%, 8.3%, 11.8%, 12.5% and 15.4% on the five tasks, respectively.