AITopics

2105.01883

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Mar-24-2021

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Ding, Xiaohan, Zhang, Xiangyu, Han, Jungong, Ding, Guiguang

We propose a universal building block of Convolutional Neural Network (ConvNet) to improve the performance without any inference-time costs. The block is named Diverse Branch Block (DBB), which enhances the representational capacity of a single convolution by combining diverse branches of different scales and complexities to enrich the feature space, including sequences of convolutions, multi-scale convolutions, and average pooling. After training, a DBB can be equivalently converted into a single conv layer for deployment. Unlike the advancements of novel ConvNet architectures, DBB complicates the training-time microstructure while maintaining the macro architecture, so that it can be used as a drop-in replacement for regular conv layers of any architecture. In this way, the model can be trained to reach a higher level of performance and then transformed into the original inference-time structure for inference. DBB improves ConvNets on image classification (up to 1.9% higher top-1 accuracy on ImageNet), object detection and semantic segmentation. The PyTorch code and models are released at https://github.com/DingXiaoH/DiverseBranchBlock.
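The conversion into a single conv layer rests on the linearity of convolution: parallel branches whose kernels are brought to the same shape can be summed into one kernel. The following is a minimal 1-D sketch of that idea only, not the authors' implementation. The helper names (`conv1d`, `avg_pool1d`) and all numeric values are made up for illustration, and batch normalization and sequences of convolutions are left out.

```python
def conv1d(x, kernel):
    """'Same' 1-D cross-correlation with zero padding; kernel length must be odd."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def avg_pool1d(x, size=3):
    """Stride-1 average pooling with zero padding (padded zeros count toward the mean)."""
    pad = size // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[i:i + size]) / size for i in range(len(x))]


x = [1.0, 2.0, -1.0, 0.5, 3.0]   # toy single-channel input
k3 = [0.2, -0.5, 0.1]            # 3-tap branch (hypothetical weights)
k1 = [0.7]                       # 1-tap branch (hypothetical weight)

# Training-time view: three parallel branches whose outputs are summed.
multi_branch = [a + b + c
                for a, b, c in zip(conv1d(x, k3), conv1d(x, k1), avg_pool1d(x, 3))]

# Deployment view: pad the 1-tap kernel to 3 taps, express average pooling
# as a constant kernel, and add everything into one kernel.
k1_padded = [0.0, k1[0], 0.0]
k_avg = [1.0 / 3.0] * 3
merged_kernel = [a + b + c for a, b, c in zip(k3, k1_padded, k_avg)]
single_conv = conv1d(x, merged_kernel)

max_diff = max(abs(a - b) for a, b in zip(multi_branch, single_conv))
```

The two outputs agree up to floating-point rounding, which is why the extra branches add no inference-time cost in this toy setting.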

conv, deep learning, neural network, (17 more...)

2103.13425

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

arXiv.org Artificial Intelligence, Mar-19-2021

Computational Emotion Analysis From Images: Recent Advances and Future Directions

Zhao, Sicheng, Huang, Quanwei, Tang, Youbao, Yao, Xingxu, Yang, Jufeng, Ding, Guiguang, Schuller, Björn W.

Understanding the information contained in the increasing repository of data is of vital importance to behavior sciences [34], which aim to predict human decision making and enable wide applications, such as mental health evaluation [14], business recommendation [33], opinion mining [54], and entertainment assistance [78]. Analyzing media data on an affective (emotional) level belongs to affective computing, which is defined as "the computing that relates to, arises from, or influences emotions" [38]. The importance of emotions has been emphasized for decades since Minsky introduced the relationship between intelligence and emotion [31]. One famous claim is "The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without emotions." Based on the types of media data, the research on affective computing can be classified into different categories, such as text [13, 72], image [75], speech [45], music [64], facial expression [24], video [56, 79], physiological signals [2], and multi-modal data [52, 41, 80]. The adage "a picture is worth a thousand words" indicates that images can convey rich semantics.

deep learning, emotion, neural network, (19 more...)

2103.10798

Genre: Research Report (1.00)

Industry:

Information Technology (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.68)
(3 more...)

arXiv.org Artificial Intelligence, Jan-10-2021

RepVGG: Making VGG-style ConvNets Great Again

Ding, Xiaohan, Zhang, Xiangyu, Ma, Ningning, Han, Jungong, Ding, Guiguang, Sun, Jian

We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time for a plain model, to the best of our knowledge. On NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 or 101% faster than ResNet-101 with higher accuracy and show favorable accuracy-speed trade-off compared to the state-of-the-art models like EfficientNet and RegNet. The code and trained models are available at https://github.com/megvii-model/RepVGG.
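Structural re-parameterization of this kind can be sketched in one dimension. The example assumes a training-time block with a 3-tap branch, a 1-tap branch, and an identity branch, each followed by batch normalization in inference mode; the abstract itself only says "multi-branch topology", so this branch layout is an assumption of the sketch. All function names and numbers are hypothetical, and the code is an illustration rather than the released RepVGG implementation.

```python
import math


def conv1d(x, kernel, bias=0.0):
    """'Same' 1-D cross-correlation with zero padding plus a scalar bias."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [bias + sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using fixed running statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [(v - mean) * scale + beta for v in y]


def fold_bn(kernel, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the preceding bias-free conv: returns (kernel, bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [k * scale for k in kernel], beta - mean * scale


x = [1.0, -2.0, 0.5, 3.0, -1.5]
# Each branch is (kernel, BN parameters); all values are made up.
branches = [
    ([0.2, -0.5, 0.1], dict(gamma=1.1, beta=0.3, mean=0.2, var=0.9)),   # 3-tap conv
    ([0.0, 0.7, 0.0], dict(gamma=0.8, beta=-0.1, mean=0.05, var=1.4)),  # 1-tap conv, zero-padded
    ([0.0, 1.0, 0.0], dict(gamma=1.0, beta=0.2, mean=0.1, var=2.0)),    # identity as a kernel
]

# Training-time forward pass: sum of the three conv+BN branches.
train_out = [0.0] * len(x)
for kernel, bn in branches:
    y = batchnorm(conv1d(x, kernel), **bn)
    train_out = [a + b for a, b in zip(train_out, y)]

# Re-parameterization: fold each BN, then add kernels and biases.
merged_kernel, merged_bias = [0.0, 0.0, 0.0], 0.0
for kernel, bn in branches:
    k, b = fold_bn(kernel, **bn)
    merged_kernel = [a + c for a, c in zip(merged_kernel, k)]
    merged_bias += b

deploy_out = conv1d(x, merged_kernel, merged_bias)
```

After merging, `deploy_out` matches `train_out` up to rounding, so the deployed layer is a single plain convolution with a bias.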

architecture, deep learning, neural network, (16 more...)

2101.03697

Country:

Europe (0.67)
North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Artificial Intelligence, Nov-24-2020

Emotional Semantics-Preserved and Feature-Aligned CycleGAN for Visual Emotion Adaptation

Zhao, Sicheng, Chen, Xuanbai, Yue, Xiangyu, Lin, Chuang, Xu, Pengfei, Krishna, Ravi, Yang, Jufeng, Ding, Guiguang, Sangiovanni-Vincentelli, Alberto L., Keutzer, Kurt

Thanks to large-scale labeled training data, deep neural networks (DNNs) have obtained remarkable success in many vision and multimedia tasks. However, because of the presence of domain shift, the knowledge learned by well-trained DNNs does not generalize well to new domains or datasets that have few labels. Unsupervised domain adaptation (UDA) studies the problem of transferring models trained on one labeled source domain to another unlabeled target domain. In this paper, we focus on UDA in visual emotion analysis for both emotion distribution learning and dominant emotion classification. Specifically, we design a novel end-to-end cycle-consistent adversarial model, termed CycleEmotionGAN++. First, we generate an adapted domain to align the source and target domains at the pixel level by improving CycleGAN with a multi-scale structured cycle-consistency loss. During the image translation, we propose a dynamic emotional semantic consistency loss to preserve the emotion labels of the source images. Second, we train a transferable task classifier on the adapted domain with feature-level alignment between the adapted and target domains. We conduct extensive UDA experiments on the Flickr-LDL & Twitter-LDL datasets for distribution learning and ArtPhoto & FI datasets for emotion classification. The results demonstrate the significant improvements yielded by the proposed CycleEmotionGAN++ as compared to state-of-the-art UDA approaches.

adaptation, deep learning, neural network, (24 more...)

2011.1247

Country:

Asia > China (0.28)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.48)
Information Technology > Services (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine Learning, Sep-1-2020

Lossless CNN Channel Pruning via Gradient Resetting and Convolutional Re-parameterization

Ding, Xiaohan, Hao, Tianxiang, Liu, Ji, Han, Jungong, Guo, Yuchen, Ding, Guiguang

Channel pruning (a.k.a. filter pruning) aims to slim down a convolutional neural network (CNN) by reducing the width (i.e., numbers of output channels) of convolutional layers. However, as CNN's representational capacity depends on the width, doing so tends to degrade the performance. A traditional learning-based channel pruning paradigm applies a penalty on parameters to improve the robustness to pruning, but such a penalty may degrade the performance even before pruning. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn for efficiency. By training the re-parameterized model using regular SGD on the former but a novel update rule with penalty gradients on the latter, we achieve structured sparsity, enabling us to equivalently convert the re-parameterized model into the original architecture with narrower layers. With our method, we can slim down a standard ResNet-50 with 76.15% top-1 accuracy on ImageNet to a narrower one with only 43.9% FLOPs and no accuracy drop. Code and models are released at https://github.com/DingXiaoH/ResRep.

deep learning, neural network, pruning, (18 more...)

2007.0326

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Machine Learning, Oct-25-2019

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

Ding, Xiaohan, Ding, Guiguang, Zhou, Xiangxin, Guo, Yuchen, Han, Jungong, Liu, Ji

Deep Neural Networks (DNNs) are powerful but computationally expensive and memory intensive, which impedes their practical usage on resource-constrained front-end devices. DNN pruning is an approach for deep model compression, which aims at eliminating some parameters with tolerable performance degradation. In this paper, we propose a novel momentum-SGD-based optimization method to reduce the network complexity by on-the-fly pruning. Concretely, given a global compression ratio, we categorize all the parameters into two parts at each training iteration, which are updated using different rules. In this way, we gradually zero out the redundant parameters, as we update them using only the ordinary weight decay but no gradients derived from the objective function. As a departure from prior methods that require heavy human effort to tune the layer-wise sparsity ratios, prune by solving complicated non-differentiable problems, or finetune the model after pruning, our method is characterized by 1) global compression that automatically finds the appropriate per-layer sparsity ratios; 2) end-to-end training; 3) no need for a time-consuming re-training process after pruning; and 4) superior capability to find better winning tickets which have won the initialization lottery.
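The two-part update can be sketched on a flat list of parameters. The abstract does not say how parameters are split, so the importance score used here, |w · g|, is an assumption of this sketch, as are the function name `gsm_step`, the hyperparameter values, and the toy numbers. It illustrates the idea and is not the authors' code.

```python
def gsm_step(w, grad, z, keep, lr=0.1, momentum=0.9, weight_decay=1e-2):
    """One momentum-SGD step in which only the `keep` highest-scoring
    parameters receive the objective gradient; the rest see weight decay only.

    w, grad, z are equal-length lists (parameters, gradients, momentum buffer);
    w and z are updated in place. Returns the set of indices that got gradients.
    """
    # Assumed importance score: magnitude of weight times gradient.
    scores = [abs(wi * gi) for wi, gi in zip(w, grad)]
    ranked = sorted(range(len(w)), key=lambda i: scores[i], reverse=True)
    active = set(ranked[:keep])

    for i in range(len(w)):
        g = grad[i] if i in active else 0.0          # mask the objective gradient
        z[i] = momentum * z[i] + weight_decay * w[i] + g
        w[i] -= lr * z[i]
    return active


# Toy example: 4 parameters, keep the 2 most important ones per step.
w = [1.0, 0.1, -2.0, 0.05]
grad = [0.5, 0.5, 0.5, 0.5]
z = [0.0] * len(w)
active = gsm_step(w, grad, z, keep=2)
```

Parameters outside `active` shrink only through weight decay in this step; repeated over many iterations, persistently unimportant parameters drift toward zero, which is the on-the-fly pruning effect described above.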

deep learning, neural network, pruning, (18 more...)

1909.12778

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Sep-11-2019

PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression

Zhao, Sicheng, Jia, Zizhou, Chen, Hui, Li, Leida, Ding, Guiguang, Keutzer, Kurt

Existing methods for visual emotion analysis mainly focus on coarse-grained emotion classification, i.e. assigning an image a dominant discrete emotion category. However, these methods cannot adequately reflect the complexity and subtlety of emotions. In this paper, we study the fine-grained regression problem of visual emotions based on convolutional neural networks (CNNs). Specifically, we develop a Polarity-consistent Deep Attention Network (PDANet), a novel network architecture that integrates attention into a CNN with an emotion polarity constraint. First, we propose to incorporate both spatial and channel-wise attentions into a CNN for visual emotion regression, which jointly considers the local spatial connectivity patterns along each channel and the interdependency between different channels. Second, we design a novel regression loss, i.e. polarity-consistent regression (PCR) loss, based on the weakly supervised emotion polarity to guide the attention generation. By optimizing the PCR loss, PDANet can generate a polarity-preserved attention map and thus improve the emotion regression performance. Extensive experiments are conducted on the IAPS, NAPS, and EMOTIC datasets, and the results demonstrate that the proposed PDANet outperforms the state-of-the-art approaches by a large margin for fine-grained visual emotion regression. Our source code is released at: https://github.com/ZizhouJia/PDANet.

deep learning, emotion, neural network, (21 more...)

doi: 10.1145/3343031.3351062

1909.05693

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

arXiv.org Machine Learning, May-12-2019

Approximated Oracle Filter Pruning for Destructive CNN Width Optimization

Ding, Xiaohan, Ding, Guiguang, Guo, Yuchen, Han, Jungong, Yan, Chenggang

It is not easy to design and run Convolutional Neural Networks (CNNs) because 1) finding the optimal number of filters (i.e., the width) at each layer is tricky, given an architecture; and 2) the computational intensity of CNNs impedes deployment on computationally limited devices. Oracle Pruning is designed to remove the unimportant filters from a well-trained CNN; it estimates the filters' importance by ablating them in turn and evaluating the model. It thus delivers high accuracy but suffers from intolerable time complexity, and it requires the resulting width to be given rather than finding it automatically. To address these problems, we propose Approximated Oracle Filter Pruning (AOFP), which keeps searching for the least important filters in a binary search manner, makes pruning attempts by masking out filters randomly, accumulates the resulting errors, and finetunes the model via a multi-path framework. As AOFP enables simultaneous pruning on multiple layers, we can prune an existing very deep CNN with acceptable time cost, negligible accuracy drop, and no heuristic knowledge, or re-design a model which achieves higher accuracy and faster inference.

deep learning, neural network, pruning, (18 more...)

1905.04748

Country:

Asia > China (0.28)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Machine Learning, Apr-8-2019

Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure

Ding, Xiaohan, Ding, Guiguang, Guo, Yuchen, Han, Jungong

Redundancy is widely recognized in Convolutional Neural Networks (CNNs), which makes it possible to remove unimportant filters from convolutional layers so as to slim the network with an acceptable performance drop. Inspired by the linear and combinational properties of convolution, we seek to make some filters increasingly close and eventually identical for network slimming. To this end, we propose Centripetal SGD (C-SGD), a novel optimization method, which can train several filters to collapse into a single point in the parameter hyperspace. When the training is completed, the removal of the identical filters can trim the network with NO performance loss, thus no finetuning is needed. By doing so, we have partly solved an open problem of constrained filter pruning on CNNs with complicated structure, where some layers must be pruned following others. Our experimental results on CIFAR-10 and ImageNet have justified the effectiveness of C-SGD-based filter pruning. Moreover, we have provided empirical evidence for the assumption that the redundancy in deep neural networks helps the convergence of training by showing that a redundant CNN trained using C-SGD outperforms a normally trained counterpart with the equivalent width.
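Why removing identical filters costs nothing can be shown with a toy two-layer network. Identical filters produce identical output channels, so the next layer can add the weights it applies to those channels and then drop the duplicate. The sketch uses plain matrix-vector products (equivalent to 1x1 convolutions at a single position); the function names and all weights are hypothetical, and the training procedure that makes filters identical is not shown.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(W1, W2, x):
    """Two-layer network: W1 holds the filters, W2 is the following layer."""
    return matvec(W2, relu(matvec(W1, x)))


def merge_identical(W1, W2, keep, drop):
    """Remove filter `drop` from W1 (assumed identical to filter `keep`) and
    add its input-channel weights in W2 onto those of `keep`."""
    new_W1 = [row for i, row in enumerate(W1) if i != drop]
    new_W2 = []
    for row in W2:
        row = list(row)
        row[keep] += row[drop]   # both channels carry the same activation
        del row[drop]
        new_W2.append(row)
    return new_W1, new_W2


# Filters 0 and 2 of the first layer are identical (made-up weights).
W1 = [[0.5, -1.0], [0.3, 0.8], [0.5, -1.0]]
W2 = [[1.2, -0.7, 0.4], [0.1, 0.9, -0.3]]
x = [2.0, 0.5]

before = forward(W1, W2, x)
W1_pruned, W2_pruned = merge_identical(W1, W2, keep=0, drop=2)
after = forward(W1_pruned, W2_pruned, x)
```

The pruned network has one filter fewer in the first layer yet produces the same output up to rounding, which matches the claim that no finetuning is needed once filters have collapsed to a single point.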

deep learning, neural network, pruning, (20 more...)

1904.03837

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)