Collaborating Authors: Yang, Haizhao


Efficient Attention Network: Accelerate Attention by Searching Where to Plug

arXiv.org Artificial Intelligence

Recently, many plug-and-play self-attention modules have been proposed to enhance model generalization by exploiting the internal information of deep convolutional neural networks (CNNs). Previous works emphasize the design of attention modules for specific functionality, e.g., lightweight or task-oriented attention. However, they overlook the importance of where to plug in the attention module, since they take it for granted that the modules should be connected individually to every block of the CNN backbone, leading to computational cost and parameter counts that grow with network depth. Thus, we propose a framework called Efficient Attention Network (EAN) to improve the efficiency of existing attention modules. In EAN, we leverage the sharing mechanism (Huang et al. 2020) to share the attention module within the backbone and search where to connect the shared attention module via reinforcement learning. Finally, we obtain an attention network with sparse connections between the backbone and modules, which (1) maintains accuracy, (2) reduces the extra parameter increment, and (3) accelerates inference. Extensive experiments on widely used benchmarks and popular attention networks show the effectiveness of EAN. Furthermore, we empirically illustrate that EAN can transfer to other tasks and capture informative features. The code is available at https://github.com/gbup-group/EAN-efficient-attention-network
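
Since the abstract names two mechanisms, a shared module plus a searched set of plug-in points, a minimal sketch may help make them concrete. The module design, names, and mask below are illustrative assumptions, not the authors' implementation (see the linked repository for that):

```python
# Illustrative sketch: a single shared attention module plugged into only a
# subset of backbone blocks, selected by a binary connection mask such as one
# found by a reinforcement-learning search. All names are hypothetical.
import torch
import torch.nn as nn

class SharedAttention(nn.Module):
    """A toy channel-attention module shared across all plugged positions."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))            # global average pool -> weights
        return x * w.unsqueeze(-1).unsqueeze(-1)   # channel-wise recalibration

def forward_backbone(x, blocks, shared_attn, mask):
    # mask[i] == 1 means the shared module is plugged after block i.
    for block, plugged in zip(blocks, mask):
        x = block(x)
        if plugged:
            x = shared_attn(x)
    return x
```

A sparse mask keeps the extra parameter count constant in depth, since the plugged positions all reuse one module; this is the efficiency argument made above.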


Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth

arXiv.org Machine Learning

Recently, deep neural networks have seen a large number of successful real-world applications across computer science and engineering, especially for large-scale and high-dimensional learning problems. Understanding the approximation capacity of deep neural networks has become a fundamental research direction for revealing the advantages of deep learning over traditional methods. This paper introduces new theories and network architectures that simultaneously achieve root-exponential convergence and avoid the curse of dimensionality for (Hölder) continuous functions, with an explicit error bound in deep network approximation; these two properties might be foundational laws supporting the application of deep network approximation to large-scale and high-dimensional problems. The approximation results here are quantitative and apply to networks with essentially arbitrary width and depth. These results suggest considering Floor-ReLU networks as a possible alternative to ReLU networks in deep learning.
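
To make the Floor-ReLU architecture class concrete, here is a minimal sketch of evaluating such a network. The weights below are random placeholders; the paper constructs them explicitly to realize the stated error bound:

```python
# Minimal sketch of the Floor-ReLU architecture class: a feed-forward network
# in which each hidden layer applies either floor(x) or max(x, 0).
import numpy as np

def floor_relu_net(x, weights, biases, activations):
    """Evaluate a network whose layers use 'floor' or 'relu' activations."""
    h = x
    for W, b, act in zip(weights, biases, activations):
        h = W @ h + b
        if act == "floor":
            h = np.floor(h)
        elif act == "relu":
            h = np.maximum(h, 0.0)
        # any other label leaves the layer affine (used for the output layer)
    return h

rng = np.random.default_rng(0)
dims = [2, 8, 8, 1]    # input dim 2, two hidden layers of width 8, scalar output
weights = [rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.standard_normal(m) for m in dims[1:]]
print(floor_relu_net(np.array([0.3, 0.7]), weights, biases, ["floor", "relu", "affine"]))
```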


Neural Network Approximation: Three Hidden Layers Are Enough

arXiv.org Machine Learning

In particular, leveraging the power of advanced yet simple activation functions, we introduce new theories and network architectures with only three hidden layers that simultaneously achieve exponential convergence and avoid the curse of dimensionality for (Hölder) continuous functions, with an explicit approximation bound. The theories established in this paper provide new insights into why deeper neural networks are better than one-hidden-layer neural networks for large-scale and high-dimensional problems. The approximation theories here are constructive (i.e., with explicit formulas specifying network parameters) and quantitative (i.e., valid for essentially arbitrary width and/or depth without lower-bound constraints), with explicit error bounds for three-hidden-layer networks of arbitrary width. Constructive approximation with quantitative results and explicit error bounds provides an important guide for choosing network sizes in deep learning.


Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise

arXiv.org Machine Learning

Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of an input image via the statistics of a batch of images, and this batch information can be viewed as batch noise that BN injects into the features of each instance. We offer the point of view that a self-attention mechanism can help regulate the batch noise by enhancing instance-specific information. Based on this view, we propose combining BN with a self-attention mechanism to adjust the batch noise, yielding an attention-based version of BN called Instance Enhancement Batch Normalization (IEBN), which recalibrates channel information by a simple linear transformation. IEBN consistently outperforms BN with only a light parameter increment across various visual tasks, network structures, and benchmark datasets. Moreover, even under injected synthetic noise, IEBN can still stabilize network training with good generalization. The code of IEBN is available at https://github.com/gbup-group/IEBN
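
One plausible reading of the recalibration step, as a hedged sketch: after standard BN, each channel is rescaled by a sigmoid gate computed from an instance-specific channel statistic through a per-channel linear transform. The statistic, parameterization, and initialization below are assumptions for illustration; the reference implementation is in the linked repository.

```python
# Hedged sketch of the IEBN idea, not the paper's code.
import torch
import torch.nn as nn

class IEBNSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        # per-channel linear transform of the instance statistic (illustrative init)
        self.scale = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.shift = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)            # instance-specific channel statistic
        gate = torch.sigmoid(self.scale * s + self.shift)
        return self.bn(x) * gate                        # channel-wise recalibration
```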


DIANet: Dense-and-Implicit Attention Network

arXiv.org Artificial Intelligence

Attention-based deep neural networks (DNNs), which emphasize the informative features in a local receptive field of an input image, have successfully boosted the performance of deep learning in various challenging problems. In this paper, we propose a Dense-and-Implicit-Attention (DIA) unit that can be applied universally to different network architectures and enhance their generalization capacity by repeatedly fusing information throughout different network layers. The communication of information between layers is carried out by a modified Long Short-Term Memory (LSTM) module inside the DIA unit that runs in parallel with the DNN. The shared DIA unit links multi-scale features from different depth levels of the network implicitly and densely. Experiments on benchmark datasets show that the DIA unit is capable of emphasizing channel-wise feature interrelations and leads to significant improvements in image classification accuracy. We further show empirically that the DIA unit acts as a nonlocal normalization tool that enhances Batch Normalization. The code is released at https://github.com/gbup-group/DIANet.
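
As a hedged sketch of the mechanism described above: one LSTM cell shared by all depths runs in parallel with the backbone, consuming a pooled feature vector at each block, carrying its state to the next depth, and emitting channel weights. The gating choice and names are illustrative, and the sketch assumes every block preserves the channel count; see the linked repository for the release.

```python
# Hedged sketch of a shared DIA-style unit, not the paper's code.
import torch
import torch.nn as nn

class DIASketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.cell = nn.LSTMCell(channels, channels)    # one cell shared across all depths

    def forward(self, x, state):
        pooled = x.mean(dim=(2, 3))                    # global average pool per channel
        h, c = self.cell(pooled, state)                # state links the depth levels
        w = torch.sigmoid(h).unsqueeze(-1).unsqueeze(-1)
        return x * w, (h, c)

def forward_with_dia(x, blocks, dia, channels):
    # assumes every block keeps the channel count fixed
    state = (x.new_zeros(x.shape[0], channels), x.new_zeros(x.shape[0], channels))
    for block in blocks:
        x = block(x)
        x, state = dia(x, state)    # same DIA unit reused at every depth
    return x
```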


Generative Imaging and Image Processing via Generative Encoder

arXiv.org Machine Learning

This paper introduces a novel generative encoder (GE) model for generative imaging and image processing, with applications in compressed sensing and imaging, image compression, denoising, inpainting, deblurring, and super-resolution. The GE model consists of a pre-training phase and a solving phase. In the pre-training phase, we separately train two deep neural networks: a generative adversarial network (GAN) with a generator $\G$ that captures the data distribution of a given image set, and an auto-encoder (AE) network with an encoder $\EN$ that compresses images following the distribution estimated by the GAN. In the solving phase, given a noisy image $x=\mathcal{P}(x^*)$, where $x^*$ is the target unknown image and $\mathcal{P}$ is an operator adding additive, multiplicative, or convolutional noise, or equivalently given such an image $x$ in the compressed domain, i.e., given $m=\EN(x)$, we solve the optimization problem \[ z^*=\underset{z}{\mathrm{argmin}} \|\EN(\G(z))-m\|_2^2+\lambda\|z\|_2^2 \] to recover the image $x^*$ in a generative way via $\hat{x}:=\G(z^*)\approx x^*$, where $\lambda>0$ is a hyperparameter. The GE model unifies the generative capacity of GANs and the stability of AEs in the optimization framework above, instead of stacking GANs and AEs into a single network or combining their loss functions as in the existing literature. Numerical experiments show that the proposed model outperforms several state-of-the-art algorithms.
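
The solving phase above translates directly into a short optimization loop over the latent code $z$. A minimal sketch, assuming PyTorch and pretrained networks G and EN; the optimizer, step count, and learning rate are illustrative choices, not the paper's settings:

```python
# Gradient-descent transcription of the GE solving phase:
# minimize ||EN(G(z)) - m||^2 + lam * ||z||^2 over z, then return G(z*).
import torch

def ge_solve(G, EN, m, latent_dim, lam=0.1, steps=500, lr=1e-2):
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (EN(G(z)) - m).pow(2).sum() + lam * z.pow(2).sum()
        loss.backward()
        opt.step()
    return G(z).detach()    # \hat{x} := G(z^*) ≈ x^*
```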


CASS: Cross Adversarial Source Separation via Autoencoder

arXiv.org Machine Learning

This paper introduces a cross adversarial source separation (CASS) framework via autoencoder, a new model that separates an input signal consisting of a mixture of multiple components into individual components defined via adversarial learning and autoencoder fitting. CASS unifies popular generative networks such as auto-encoders (AEs) and generative adversarial networks (GANs) in a single framework. The basic building block that filters the input signal and reconstructs the $i$-th target component is a pair of deep neural networks $\mathcal{EN}_i$ and $\mathcal{DE}_i$, serving as an encoder for dimension reduction and a decoder for component reconstruction, respectively. The decoder $\mathcal{DE}_i$, as a generator, is enhanced by a discriminator network $\mathcal{D}_i$ that favors the signal structures of the $i$-th component in the $i$-th given dataset, providing guidance through adversarial learning. In contrast with existing practice in AEs, which trains each auto-encoder independently, or in GANs, which share the same generator, we introduce cross adversarial training that emphasizes the adversarial relation between arbitrary network pairs $(\mathcal{DE}_i,\mathcal{D}_j)$, achieving state-of-the-art performance especially when target components share similar data structures.
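
A hedged sketch of the decoder-side objective implied above, with the discriminators held fixed: every component $\mathcal{DE}_i(\mathcal{EN}_i(x))$ is scored by every discriminator $\mathcal{D}_j$, not only its own. The additive mixture, the binary-cross-entropy form, and the unit loss weights are assumptions for illustration, not the paper's exact losses.

```python
# Hedged sketch of cross adversarial training for the decoders.
import torch
import torch.nn.functional as F

def separation_loss(x, encoders, decoders, discriminators):
    # Separated components; the mixture is assumed additive for this sketch.
    comps = [DE(EN(x)) for EN, DE in zip(encoders, decoders)]
    loss = (x - sum(comps)).pow(2).mean()          # autoencoder fitting of the mixture
    for i, c in enumerate(comps):
        for j, D in enumerate(discriminators):     # cross pairs (DE_i, D_j), all i, j
            d = D(c)                               # D_j assumed to output in [0, 1]
            target = torch.full_like(d, 1.0 if i == j else 0.0)
            loss = loss + F.binary_cross_entropy(d, target)
    return loss
```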


SelectNet: Learning to Sample from the Wild for Imbalanced Data Training

arXiv.org Machine Learning

Supervised learning from training data with imbalanced class sizes, a commonly encountered scenario in real applications such as anomaly/fraud detection, has long been considered a significant challenge in machine learning. Motivated by recent progress in curriculum and self-paced learning, we propose to adopt a semi-supervised learning paradigm by training a deep neural network, referred to as SelectNet, to selectively add unlabelled data together with their predicted labels to the training dataset. Unlike existing techniques designed to tackle class-imbalanced training data, such as resampling, cost-sensitive learning, and margin-based learning, SelectNet provides an end-to-end approach to learning from important unlabelled data "in the wild" that most likely belong to the under-sampled classes in the training data, thus gradually mitigating the imbalance in the data used for training the classifier. We demonstrate the efficacy of SelectNet through extensive numerical experiments on standard datasets in computer vision.
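
Schematically, one round of the loop described above might look as follows. Note that the real selection criterion is learned end-to-end by SelectNet, which the placeholder `selector` below does not capture; all names here are hypothetical.

```python
# Schematic of one selection round: score the unlabelled pool, then add the
# top-scoring samples with their predicted labels to the training set.
import torch

def selection_round(classifier, selector, labelled, unlabelled, k=100):
    xs = torch.stack(unlabelled)
    with torch.no_grad():
        pseudo = classifier(xs).argmax(dim=1)   # predicted labels
        scores = selector(xs).squeeze(-1)       # learned "add me" scores (placeholder)
    top = set(scores.topk(min(k, len(unlabelled))).indices.tolist())
    labelled += [(unlabelled[i], pseudo[i].item()) for i in top]
    remaining = [u for i, u in enumerate(unlabelled) if i not in top]
    return labelled, remaining
```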


Nonlinear Approximation via Compositions

arXiv.org Machine Learning

We study the approximation efficiency of function compositions in nonlinear approximation, especially when the compositions are implemented using multi-layer feed-forward neural networks (FNNs) with ReLU activation functions. The central questions are: what advantages do function compositions offer in generating dictionaries, and what is the optimal implementation of function compositions via ReLU FNNs, especially on modern computing architectures? We answer these questions by studying the $N$-term approximation rate, i.e., the decrease in error versus the number of computational nodes (neurons) in the approximant, together with parallel efficiency for the first time. First, for an arbitrary function $f$ on $[0,1]$, regardless of its smoothness or even its continuity, if $f$ can be approximated via nonlinear approximation using one-hidden-layer ReLU FNNs with an approximation rate $O(N^{-\eta})$, we quantitatively show that dictionaries of function compositions via deep ReLU FNNs improve the approximation rate to $O(N^{-2\eta})$. Second, for Hölder continuous functions of order $\alpha$ with a uniform Lipschitz constant $\omega$ on a $d$-dimensional cube, we show that $N$-term approximation via ReLU FNNs with two or three function compositions can achieve an approximation rate $O(N^{-2\alpha/d})$. The rate can be improved to $O(L^{-2\alpha/d})$ by composing $L$ times, if $N$ is fixed and sufficiently large; but further compositions cannot achieve the rate $O(N^{-\alpha L/d})$. Finally, considering the computational efficiency per training iteration in parallel computing, FNNs with $O(1)$ hidden layers are an optimal choice for approximating Hölder continuous functions if sufficient computing resources are available.
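
For quick reference, the rate statements above can be collected in one place (restating only what the abstract claims; no new bounds are introduced): \[ \|f-\phi_N\|_\infty = O(N^{-\eta}) \text{ with one hidden layer} \;\Longrightarrow\; O(N^{-2\eta}) \text{ with compositions via deep ReLU FNNs}, \] and, for $f$ Hölder continuous of order $\alpha$ on $[0,1]^d$, \[ O(N^{-2\alpha/d}) \text{ with two or three compositions}, \qquad O(L^{-2\alpha/d}) \text{ with } L \text{ compositions and fixed, sufficiently large } N. \]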


Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization

arXiv.org Machine Learning

Overfitting frequently occurs in deep learning. In this paper, we propose a novel regularization method called Drop-Activation to reduce overfitting and improve generalization. The key idea is to \emph{drop} nonlinear activation functions by randomly setting them to identity functions during training. During testing, we use a deterministic network with a new activation function that encodes the average effect of randomly dropping activations. Experimental results on CIFAR-10, CIFAR-100, SVHN, and EMNIST show that Drop-Activation generally improves the performance of popular neural network architectures. Furthermore, unlike dropout, Drop-Activation as a regularizer can be used in harmony with standard training and regularization techniques such as Batch Normalization and AutoAug. Our theoretical analyses support the regularization effect of Drop-Activation as implicit parameter reduction and its compatibility with Batch Normalization.
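
The train/test behaviour described above admits a compact sketch for the ReLU case, assuming each activation is kept independently with probability $p$; taking the expectation of the random map gives the deterministic test-time activation $p\max(x,0)+(1-p)x$. The per-element dropping and retain probability below are illustrative assumptions.

```python
# Minimal sketch of Drop-Activation for ReLU, not the paper's code.
import torch
import torch.nn as nn

class DropActivation(nn.Module):
    def __init__(self, p=0.95):
        super().__init__()
        self.p = p  # probability of keeping the ReLU nonlinearity

    def forward(self, x):
        if self.training:
            keep = (torch.rand_like(x) < self.p).float()   # drop per activation
            return keep * torch.relu(x) + (1 - keep) * x
        # test time: deterministic average of the two behaviours
        return self.p * torch.relu(x) + (1 - self.p) * x
```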