Collaborating Authors: He, Wei


Efficient Attention Network: Accelerate Attention by Searching Where to Plug

arXiv.org Artificial Intelligence

Recently, many plug-and-play self-attention modules have been proposed to enhance model generalization by exploiting the internal information of deep convolutional neural networks (CNNs). Previous works emphasize the design of attention modules for specific functionality, e.g., lightweight or task-oriented attention. However, they overlook the question of where to plug in the attention module, taking it for granted that a module should be attached to every block of the CNN backbone; as a result, the extra computational cost and parameter count grow with network depth. We therefore propose a framework called Efficient Attention Network (EAN) to improve the efficiency of existing attention modules. In EAN, we leverage the sharing mechanism (Huang et al. 2020) to share a single attention module across the backbone and search where to connect the shared module via reinforcement learning. The result is an attention network with sparse connections between the backbone and the module that (1) maintains accuracy, (2) reduces the extra parameter increment, and (3) accelerates inference. Extensive experiments on widely used benchmarks and popular attention networks show the effectiveness of EAN. Furthermore, we empirically show that EAN transfers to other tasks and captures informative features. The code is available at https://github.com/gbup-group/EAN-efficient-attention-network
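
To make the sharing-plus-sparse-connection idea concrete, here is a minimal PyTorch sketch: a single SE-style attention module is shared by all blocks, and a binary mask (in the paper, found by the reinforcement-learning search) decides where it is plugged in. The module, mask, and backbone wrapper are illustrative assumptions, not the authors' released code.

```python
# Minimal PyTorch sketch of sharing one attention module and plugging it
# in sparsely. `SEAttention`, `plug_mask`, and the backbone wrapper are
# illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """A squeeze-and-excitation style channel-attention module."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze (global pool), then excite
        return x * w[:, :, None, None]    # reweight the feature maps

class SparselyAttendedBackbone(nn.Module):
    def __init__(self, blocks, channels, plug_mask):
        super().__init__()
        # All blocks are assumed to emit `channels` feature maps, so one
        # shared module can serve every plug-in point.
        self.blocks = nn.ModuleList(blocks)
        self.shared_attn = SEAttention(channels)
        self.plug_mask = plug_mask        # binary list, e.g. from an RL search

    def forward(self, x):
        for block, plugged in zip(self.blocks, self.plug_mask):
            x = block(x)
            if plugged:                   # attend only where the search says so
                x = self.shared_attn(x)
        return x

blocks = [nn.Conv2d(64, 64, 3, padding=1) for _ in range(4)]
net = SparselyAttendedBackbone(blocks, channels=64, plug_mask=[0, 1, 0, 1])
print(net(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])
```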


Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

arXiv.org Machine Learning

Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used for training large-scale deep neural networks. Although larger mini-batch sizes improve system scalability by reducing the communication-to-computation ratio, they may hurt the generalization ability of the models. To this end, we build a highly scalable deep learning training system for dense GPU clusters with three main contributions: (1) a mixed-precision training method that significantly improves the training throughput of a single GPU without losing accuracy; (2) an optimization approach for extremely large mini-batch sizes (up to 64k) that can train CNN models on the ImageNet dataset without losing accuracy; and (3) highly optimized all-reduce algorithms that achieve up to 3x and 11x speedups on AlexNet and ResNet-50, respectively, over NCCL-based training on a cluster with 1024 Tesla P40 GPUs. For training ResNet-50 with 90 epochs, the state-of-the-art GPU-based system with 1024 Tesla P100 GPUs took 15 minutes to reach 74.9% top-1 test accuracy, and a KNL-based system with 2048 Intel KNLs took 20 minutes to reach 75.4%. Our training system reaches 75.8% top-1 test accuracy in only 6.6 minutes using 2048 Tesla P40 GPUs. When training AlexNet with 95 epochs, our system reaches 58.7% top-1 test accuracy within 4 minutes, which also outperforms all other existing systems.
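
Contribution (1) rests on training largely in FP16 while scaling the loss so that small gradients do not underflow. The paper's own system predates PyTorch's AMP utilities, but the same idea can be sketched with them as a modern stand-in:

```python
# Sketch of the loss-scaling idea behind mixed-precision training, using
# PyTorch AMP as a modern stand-in (the paper's system predates this API).
# Requires a CUDA device.
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(1024, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scaler = GradScaler()   # scales the loss so FP16 gradients do not underflow

def train_step(x, y):
    opt.zero_grad()
    with autocast():    # forward pass runs largely in FP16
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()   # backprop through the scaled loss
    scaler.step(opt)                # unscale grads, apply the FP32 update
    scaler.update()                 # grow/shrink the scale factor dynamically
    return loss.item()

x = torch.randn(32, 1024, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")
print(train_step(x, y))
```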


One-Step Spectral Clustering via Dynamically Learning Affinity Matrix and Subspace

AAAI Conferences

This paper proposes a one-step spectral clustering method that learns an intrinsic affinity matrix (i.e., the clustering result) from the low-dimensional space (i.e., the intrinsic subspace) of the original data. Specifically, the intrinsic affinity matrix is learnt by: 1) aligning it with the initial affinity matrix learnt from the original data; 2) adjusting the transformation matrix, which maps the original feature space into its intrinsic subspace by simultaneously conducting feature selection and subspace learning; and 3) enforcing the clustering-result constraint, i.e., that the graph constructed from the intrinsic affinity matrix has exactly c connected components, where c is the number of clusters. In this way, two affinity matrices and a transformation matrix are iteratively updated until each reaches its optimum, so that the two affinity matrices are consistent and the intrinsic subspace is learnt via the transformation matrix. Experimental results on both synthetic and benchmark datasets verify that the proposed method produces more effective clustering results than previous clustering methods.
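
The constraint in 3) rests on a standard spectral fact: a graph has exactly c connected components if and only if its Laplacian has eigenvalue 0 with multiplicity c (equivalently, rank(L) = n - c). A small NumPy sketch of that check follows; the block-diagonal affinity matrix is an assumption for demonstration only.

```python
# Illustrative NumPy check of the connectivity constraint: a graph has
# exactly c connected components iff its Laplacian has eigenvalue 0 with
# multiplicity c (equivalently rank(L) = n - c).
import numpy as np

def num_components(affinity, tol=1e-8):
    A = (affinity + affinity.T) / 2    # symmetrize the affinity matrix
    L = np.diag(A.sum(axis=1)) - A     # unnormalized graph Laplacian
    return int(np.sum(np.linalg.eigvalsh(L) < tol))  # zero-eigenvalue count

# Block-diagonal affinity over {0,1} and {2,3}: exactly 2 components.
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 1.0
A[2, 3] = A[3, 2] = 1.0
assert num_components(A) == 2
```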


Improved Neural Machine Translation with SMT Features

AAAI Conferences

Neural machine translation (NMT) conducts end-to-end translation with a source-language encoder and a target-language decoder, achieving promising translation performance. However, as a newly emerged approach, the method has some limitations. An NMT system usually has to restrict its vocabulary to a certain size to avoid time-consuming training and decoding, which causes a serious out-of-vocabulary problem. Furthermore, the decoder lacks a mechanism to guarantee that all source words are translated, and it usually favors short translations, resulting in fluent but inadequate output. To address these problems, we incorporate statistical machine translation (SMT) features, such as a translation model and an n-gram language model, into the NMT model under the log-linear framework. Our experiments show that the proposed method significantly improves the translation quality of a state-of-the-art NMT system on Chinese-to-English translation tasks, producing a gain of up to 2.33 BLEU points on NIST open test sets.
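
Under the log-linear framework, each candidate translation is scored by a weighted sum of log feature values, so the NMT score and SMT features compete on the same footing. A toy Python illustration follows; the feature values and weights are placeholders, not the paper's trained parameters.

```python
# Toy illustration of log-linear rescoring: each candidate translation is
# scored by a weighted sum of log feature values. Feature values and
# weights here are placeholders, not the paper's trained model.

def loglinear_score(features, weights):
    """score(e | f) = sum_k w_k * h_k(e, f), with h_k the log feature scores."""
    return sum(weights[k] * h for k, h in features.items())

features = {
    "nmt_logprob": -4.2,     # log P_NMT(e | f) from the encoder-decoder
    "tm_logprob": -3.1,      # SMT translation-model log probability
    "lm_logprob": -5.0,      # n-gram language-model log probability
    "word_count": 7.0,       # length feature, to counter short translations
}
weights = {"nmt_logprob": 1.0, "tm_logprob": 0.5,
           "lm_logprob": 0.3, "word_count": 0.1}

print(loglinear_score(features, weights))  # higher = better candidate
```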