Liu, Zhuang
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space
Chavan, Arnav, Shen, Zhiqiang, Liu, Zhuang, Liu, Zechun, Cheng, Kwang-Ting, Xing, Eric
This paper explores the feasibility of finding an optimal sub-model within a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework that searches for such a sub-structure end-to-end across multiple dimensions of the original model, including the input tokens, MHSA and MLP modules, with state-of-the-art performance. Our method is based on a learnable and unified $\ell_1$ sparsity constraint with pre-defined factors to reflect the global importance in the continuous searching space of the different dimensions. The searching process is highly efficient through a single-shot training scheme. For instance, on DeiT-S, ViT-Slim only takes ~43 GPU hours for the searching process, and the searched structure is flexible, with diverse dimensionalities in different modules. A budget threshold is then applied according to the accuracy-FLOPs trade-off required on the target devices, and a re-training process is performed to obtain the final models. Extensive experiments show that ViT-Slim can compress up to 40% of the parameters and 40% of the FLOPs of various vision transformers while increasing accuracy by ~0.6% on ImageNet. We also demonstrate the advantage of our searched models on several downstream datasets. Our source code will be publicly available.
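A minimal sketch of that searching mechanism on a single dimension (the MLP hidden width), assuming a PyTorch-style implementation; the class name SlimMLP, the penalty weight, and the toy loss are illustrative, not the paper's code:

import torch
import torch.nn as nn

class SlimMLP(nn.Module):
    # Transformer MLP with a learnable soft mask over its hidden dimension.
    # The mask is trained jointly with the weights under an l1 penalty; after
    # searching, hidden units whose mask falls below the budget threshold are
    # dropped and the slimmed model is re-trained.
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)
        self.mask = nn.Parameter(torch.ones(hidden_dim))  # one importance factor per hidden unit

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)) * self.mask)

    def sparsity_loss(self):
        return self.mask.abs().sum()  # l1 constraint pushing unimportant units toward zero

mlp = SlimMLP(dim=384, hidden_dim=1536)
x = torch.randn(8, 197, 384)                                # batch of DeiT-S-sized token sequences
loss = mlp(x).pow(2).mean() + 1e-4 * mlp.sparsity_loss()    # task-loss stand-in + sparsity penalty
loss.backward()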
Convolutional Networks with Dense Connectivity
Huang, Gao, Liu, Zhuang, Pleiss, Geoff, van der Maaten, Laurens, Weinberger, Kilian Q.
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections, one between each layer and its subsequent layer, our network has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, encourage feature reuse, and substantially improve parameter efficiency. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state of the art on most of them, while requiring fewer parameters and less computation to achieve high performance.

Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [1], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [2] consisted of 5 layers, VGG featured 19 [3], and thanks to skip/shortcut connections, Highway Networks [4] and Residual Networks (ResNets) [5] have surpassed the 100-layer barrier. As CNNs become increasingly deep, a new research problem emerges: information about the input or gradient that passes through many layers can vanish and "wash out" by the time it reaches the end (or beginning) of the network. Many recent publications address this problem. For example, Rectified Linear Units (ReLU) [6] avoid gradient saturation, and batch normalization [7] reduces covariate shift across layers by re-scaling the outputs of the previous layer. ResNets [5] and Highway Networks [4] bypass signal from one layer to the next via identity connections. Stochastic depth [8] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow.
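A minimal sketch of dense connectivity in PyTorch; the BN-ReLU-Conv layer layout follows the paper's description, but the class name and hyperparameter values here are illustrative:

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # Each layer receives the concatenation of all preceding feature-maps
    # and contributes growth_rate new feature-maps of its own.
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            ch = in_channels + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth_rate, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # concatenate along channels
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=64, growth_rate=32, num_layers=4)
out = block(torch.randn(2, 64, 32, 32))   # -> shape (2, 64 + 4 * 32, 32, 32)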
Regularization Matters in Policy Optimization
Liu, Zhuang, Li, Xuanlin, Kang, Bingyi, Darrell, Trevor
Deep Reinforcement Learning (Deep RL) has been receiving increasing attention thanks to its encouraging performance on a variety of control tasks. Yet conventional regularization techniques for training neural networks (e.g., $L_2$ regularization, dropout) have been largely ignored in RL methods, possibly because agents are typically trained and evaluated in the same environment. In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. Interestingly, we find that conventional regularization techniques on the policy networks can often bring large improvements in task performance, and the improvement is typically more significant when the task is more difficult. We also compare with the widely used entropy regularization and find that $L_2$ regularization is generally better. Our findings are further confirmed to be robust against the choice of training hyperparameters. We also study the effects of regularizing different components and find that regularizing only the policy network is typically enough. We hope our study provides guidance for future practices in regularizing policy optimization algorithms.
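A minimal sketch of the practice studied here, applying an $L_2$ penalty to the policy network's parameters on top of an otherwise standard policy loss; the network sizes, the stand-in loss, and the coefficient 1e-4 are placeholders, not the paper's setup:

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2                     # placeholder dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))

def l2_penalty(module):
    # sum of squared parameters, applied to the policy network only
    return sum(p.pow(2).sum() for p in module.parameters())

obs = torch.randn(32, obs_dim)
policy_loss = -policy(obs).mean()           # stand-in for the usual surrogate objective
loss = policy_loss + 1e-4 * l2_penalty(policy)
loss.backward()

With plain SGD, a similar effect can also be obtained by setting the optimizer's weight_decay argument for the policy parameters alone.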
Knowledge Distillation from Few Samples
Li, Tianhong, Li, Jianguo, Liu, Zhuang, Zhang, Changshui
Current knowledge distillation methods require the full training data to distill knowledge from a large "teacher" network to a compact "student" network by matching certain statistics between the two, such as softmax outputs and feature responses. This is not only time-consuming but also inconsistent with human cognition, in which children can learn knowledge from adults with few examples. This paper proposes a novel and simple method for knowledge distillation from few samples. Under the assumption that both "teacher" and "student" have the same feature-map sizes at each corresponding block, we add a 1x1 conv-layer at the end of each block in the student-net and align the block-level outputs between "teacher" and "student" by estimating the parameters of the added layer with limited samples. We prove that the added layer can be absorbed/merged into the previous conv-layer to form a new conv-layer with the same parameter size and computation cost as the previous one. Experiments verify that the proposed method is very efficient and effective at distilling knowledge from teacher-nets to student-nets constructed in different ways on various datasets.
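A minimal sketch of the absorption step described above, merging a 1x1 conv-layer into the conv-layer that precedes it so the final student keeps its original size; the function and layer names are illustrative, and the few-sample estimation of the 1x1 layer is omitted:

import torch
import torch.nn as nn

def merge_pointwise(prev_conv, pw_conv):
    # Absorb a 1x1 conv (pw_conv) into the conv-layer that precedes it.
    # Valid when the two layers are composed linearly, i.e. with no
    # nonlinearity in between, as in the block-level alignment above.
    W = prev_conv.weight.data                        # (C_mid, C_in, k, k)
    A = pw_conv.weight.data.squeeze(-1).squeeze(-1)  # (C_out, C_mid)
    merged = nn.Conv2d(prev_conv.in_channels, pw_conv.out_channels,
                       prev_conv.kernel_size, stride=prev_conv.stride,
                       padding=prev_conv.padding, bias=True)
    merged.weight.data = torch.einsum('oc,cikl->oikl', A, W)
    b_prev = prev_conv.bias.data if prev_conv.bias is not None else torch.zeros(W.shape[0])
    b_pw = pw_conv.bias.data if pw_conv.bias is not None else torch.zeros(A.shape[0])
    merged.bias.data = A @ b_prev + b_pw
    return merged

# sanity check: the merged layer matches the original conv followed by the 1x1 conv
conv, pw = nn.Conv2d(16, 32, 3, padding=1), nn.Conv2d(32, 32, 1)
x = torch.randn(1, 16, 8, 8)
print(torch.allclose(merge_pointwise(conv, pw)(x), pw(conv(x)), atol=1e-5))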
Rethinking the Value of Network Pruning
Liu, Zhuang, Sun, Mingjie, Zhou, Tinghui, Huang, Gao, Darrell, Trevor
Network pruning is widely used to reduce the heavy computational cost of deep models. A typical pruning algorithm is a three-stage pipeline: training (a large model), pruning, and fine-tuning. During pruning, redundant weights are pruned according to a certain criterion and important weights are kept to best preserve the accuracy. In this work, we make several surprising observations which contradict common beliefs. For all six state-of-the-art pruning algorithms we examined, fine-tuning a pruned model gives performance only comparable to, or even worse than, training that model from randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent across a wide variety of pruning algorithms, network architectures, datasets, and tasks. Our results have several implications: 1) training a large, over-parameterized model is not necessary to obtain an efficient final model; 2) the learned "important" weights of the large model are not necessarily useful for the small pruned model; 3) the pruned architecture itself, rather than a set of inherited "important" weights, is what leads to the efficiency benefit in the final model, which suggests that some pruning algorithms could be seen as performing network architecture search. Over-parameterization is a widely recognized property of deep neural networks (Denton et al., 2014; Ba & Caruana, 2014), which leads to high computational cost and a high memory footprint.
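A minimal sketch of the comparison underlying these observations, for the case of a predefined target architecture: rather than pruning a large model and fine-tuning, build the smaller architecture directly and train it from random initialization (the widths and the toy network are illustrative):

import torch.nn as nn

def make_conv_net(widths):
    # Small VGG-style stack whose per-layer channel widths are given explicitly.
    layers, in_ch = [], 3
    for w in widths:
        layers += [nn.Conv2d(in_ch, w, 3, padding=1), nn.BatchNorm2d(w), nn.ReLU(inplace=True)]
        in_ch = w
    return nn.Sequential(*layers)

full_widths   = [64, 128, 256, 512]
pruned_widths = [w // 2 for w in full_widths]   # a predefined 50%-channel pruning target

# Conventional pipeline: train make_conv_net(full_widths), prune channels, fine-tune.
# Observation here: simply training the pruned architecture from random
# initialization matches or beats the fine-tuned model.
scratch_model = make_conv_net(pruned_widths)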