Liu, Ning
Lottery Ticket Implies Accuracy Degradation, Is It a Desirable Phenomenon?
Liu, Ning, Yuan, Geng, Che, Zhengping, Shen, Xuan, Ma, Xiaolong, Jin, Qing, Ren, Jian, Tang, Jian, Liu, Sijia, Wang, Yanzhi
In deep model compression, the recently proposed "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin, 2018) pointed out that there could exist a winning ticket (i.e., a properly pruned sub-network together with the original weight initialization) that can achieve performance competitive with the original dense network. However, it is not easy to observe such a winning property in many scenarios, for example when a relatively large learning rate is used, even though such a rate benefits training of the original dense model. In this work, we investigate the underlying condition and rationale behind the winning property, and find that it is largely attributable to the correlation between the initialized weights and the final trained weights when the learning rate is not sufficiently large. Thus, the existence of the winning property is correlated with insufficient DNN pretraining and is unlikely to occur for a well-trained DNN. To overcome this limitation, we propose a "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training under the same pruning algorithm and the same total number of training epochs. Extensive experiments on multiple deep models (VGG, ResNet, MobileNet-v2) and datasets have been conducted to justify our proposals.
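The contrast at the heart of this abstract can be illustrated with a minimal sketch (not the paper's code): under simple magnitude pruning, a lottery-ticket run rewinds the surviving weights to their initialization before retraining, while the prune-and-fine-tune baseline continues from the trained values. The array shapes, sparsity level, and helper name `magnitude_mask` are assumptions for illustration only.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Keep the largest-magnitude fraction (1 - sparsity) of weights."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    threshold = np.partition(flat, k)[k] if k > 0 else -np.inf
    return (np.abs(weights) >= threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
w_init = rng.normal(size=(256, 256)).astype(np.float32)        # stand-in for initialized weights
w_trained = w_init + 0.1 * rng.normal(size=w_init.shape).astype(np.float32)  # stand-in for trained weights

mask = magnitude_mask(w_trained, sparsity=0.9)

# Lottery-ticket style: prune the trained weights, then REWIND the survivors
# to the original initialization and retrain the sparse sub-network.
w_lottery_start = w_init * mask

# Prune & fine-tune: prune the trained weights and CONTINUE training from the
# trained values (no rewinding), which the abstract argues is the stronger choice
# for a well-trained model.
w_finetune_start = w_trained * mask
```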
Learn to Navigate Maplessly with Varied LiDAR Configurations: A Support Point Based Approach
Zhang, Wei, Liu, Ning, Zhang, Yunfeng
Deep reinforcement learning (DRL) demonstrates great potential in the mapless navigation domain. However, such a navigation model is normally restricted to a fixed configuration of the range sensor because its input format is fixed. In this paper, we propose a DRL model that can process range data obtained from different range sensors with different installation positions. Our model first extracts goal-directed features from each obstacle point. Subsequently, it chooses global obstacle features from all point-feature candidates and uses these features for the final decision. As only a few points are used to support the final decision, we refer to these points as support points and to our approach as support-point based navigation (SPN). Our model can handle data from different LiDAR setups and demonstrates good performance in simulation and real-world experiments. It can also be used to guide the installation of range sensors to enhance robot navigation performance.
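A hedged sketch of the support-point idea described above, assuming a PointNet-style per-point network followed by max pooling; the layer sizes, random weights, and function names are illustrative and not taken from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def support_point_features(points, goal, w1, w2):
    """Per-point MLP on (obstacle point, goal) inputs, then element-wise max
    pooling; the points that win each pooled channel act as 'support points'."""
    n = points.shape[0]
    # Goal-directed input: each obstacle point concatenated with the goal.
    x = np.concatenate([points, np.tile(goal, (n, 1))], axis=1)   # (n, 4)
    h = relu(x @ w1)                                              # (n, hidden)
    f = relu(h @ w2)                                              # (n, feat)
    pooled = f.max(axis=0)                      # global obstacle feature
    support_idx = np.unique(f.argmax(axis=0))   # indices of the support points
    return pooled, support_idx

rng = np.random.default_rng(0)
points = rng.uniform(-5, 5, size=(180, 2))      # e.g. points from a 2-D LiDAR scan
goal = np.array([3.0, 1.0])
w1 = rng.normal(scale=0.5, size=(4, 32))
w2 = rng.normal(scale=0.5, size=(32, 16))
pooled, support_idx = support_point_features(points, goal, w1, w2)
```

Because the pooled feature depends only on the winning points and not on how many points were fed in, the same network can, in principle, accept scans from LiDARs with different resolutions or mounting positions.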
AutoSlim: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates
Liu, Ning, Ma, Xiaolong, Xu, Zhiyuan, Wang, Yanzhi, Tang, Jian, Ye, Jieping
Structured weight pruning is a representative model compression technique for DNNs that reduces storage and computation requirements and accelerates inference. An automatic hyperparameter determination process is necessary due to the large number of flexible hyperparameters. This work proposes AutoSlim, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporating the combination of structured pruning schemes into the automatic process; (ii) adopting state-of-the-art ADMM-based structured weight pruning as the core algorithm and proposing an innovative additional purification step for further weight reduction without accuracy loss; and (iii) developing an effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique, which is fundamentally incompatible with the target pruning problem. Extensive experiments on CIFAR-10 and ImageNet datasets demonstrate that AutoSlim achieves ultra-high pruning rates in the number of weights and FLOPs that could not be achieved before. As an example, AutoSlim outperforms prior work on automatic model compression by up to 33$\times$ in pruning rate under the same accuracy. We release all models of this work at an anonymous link: http://bit.ly/2VZ63dS.
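As an illustration of the structured pruning building block only (not the ADMM solver, purification step, or guided heuristic search), here is a minimal sketch that removes whole convolution filters by L2 norm; the tensor layout, keep ratio, and function name are assumptions.

```python
import numpy as np

def prune_filters_by_norm(conv_w, keep_ratio):
    """Structured pruning: rank the output filters of a conv layer (O, I, kH, kW)
    by L2 norm and zero out the weakest ones entirely."""
    norms = np.linalg.norm(conv_w.reshape(conv_w.shape[0], -1), axis=1)
    n_keep = max(1, int(round(keep_ratio * conv_w.shape[0])))
    keep = np.argsort(norms)[-n_keep:]
    mask = np.zeros(conv_w.shape[0], dtype=bool)
    mask[keep] = True
    return conv_w * mask[:, None, None, None], mask

rng = np.random.default_rng(0)
conv_w = rng.normal(size=(64, 32, 3, 3))             # one convolutional layer
pruned_w, kept = prune_filters_by_norm(conv_w, keep_ratio=0.25)
print(f"kept {kept.sum()} of {conv_w.shape[0]} filters")
```

In a framework like the one described, the per-layer keep ratios and the choice of pruning scheme would be the hyperparameters driven by the automatic search rather than fixed by hand.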
VIBNN: Hardware Acceleration of Bayesian Neural Networks
Cai, Ruizhe, Ren, Ao, Liu, Ning, Ding, Caiwen, Wang, Luhao, Qian, Xuehai, Pedram, Massoud, Wang, Yanzhi
Bayesian Neural Networks (BNNs) have been proposed to address the problem of model uncertainty in training and inference. By introducing weights associated with conditional probability distributions, BNNs are capable of resolving the overfitting issue commonly seen in conventional neural networks and allow for small-data training through the variational inference process. Frequent use of Gaussian random variables in this process requires a properly optimized Gaussian Random Number Generator (GRNG). The high hardware cost of conventional GRNGs makes the hardware implementation of BNNs challenging. In this paper, we propose VIBNN, an FPGA-based hardware accelerator design for variational inference on BNNs. We explore the design space for the massive number of Gaussian variable sampling tasks in BNNs. Specifically, we introduce two high-performance Gaussian (pseudo) random number generators: the RAM-based Linear Feedback Gaussian Random Number Generator (RLF-GRNG), which is inspired by the properties of the binomial distribution and linear feedback logic; and the Bayesian Neural Network-oriented Wallace Gaussian Random Number Generator. To achieve high scalability and efficient memory access, we propose a deeply pipelined accelerator architecture with fast execution and good hardware utilization. Experimental results demonstrate that the proposed VIBNN implementations on an FPGA can achieve a throughput of 321,543.4 images/s and an energy efficiency of up to 52,694.8 images/J while maintaining accuracy similar to their software counterpart.
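The RLF-GRNG is described as drawing on properties of the binomial distribution. A minimal software sketch of that principle is shown below: summing random bits gives a binomial variable whose centered, scaled value approximates a standard normal. The bit count and function name are illustrative assumptions, and this is not the paper's hardware design.

```python
import numpy as np

def clt_gaussian(n_samples, n_bits=128, rng=None):
    """Approximate standard normals by summing n_bits Bernoulli(0.5) bits:
    the binomial sum has mean n/2 and variance n/4, so center and scale it."""
    rng = rng or np.random.default_rng()
    bits = rng.integers(0, 2, size=(n_samples, n_bits))
    s = bits.sum(axis=1)
    return (s - n_bits / 2) / np.sqrt(n_bits / 4)

samples = clt_gaussian(100_000)
print(samples.mean(), samples.std())   # both should be close to 0 and 1
```

In hardware, the random bits would come from linear feedback logic rather than a software PRNG, which is what makes this style of generator cheap compared with conventional GRNG designs.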
Deep Reinforcement Learning for Dynamic Treatment Regimes on Medical Registry Data
Liu, Ning, Liu, Ying, Logan, Brent, Xu, Zhiyuan, Tang, Jian, Wang, Yanzhi
This paper presents the first deep reinforcement learning (DRL) framework to estimate optimal Dynamic Treatment Regimes from observational medical data. This framework is more flexible and adaptive to high-dimensional action and state spaces than existing reinforcement learning methods, allowing it to model the real-life complexity of heterogeneous disease progression and treatment choices, with the goal of providing doctors and patients with data-driven, personalized decision recommendations. The proposed DRL framework comprises (i) a supervised learning step to predict the most probable expert actions, and (ii) a deep reinforcement learning step to estimate the long-term value function of Dynamic Treatment Regimes. Both steps depend on deep neural networks. As a key motivational example, we have implemented the proposed framework on a data set from the Center for International Bone Marrow Transplant Research (CIBMTR) registry database, focusing on the sequence of prevention and treatment for acute and chronic graft-versus-host disease after transplantation. In the experimental results, we demonstrate promising accuracy in predicting human experts' decisions, as well as a high expected reward of the DRL-based dynamic treatment regimes.
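A toy, tabular sketch of the two-step structure described above, with a frequency-based imitation step standing in for the supervised network and tabular Q-learning standing in for the deep value estimator; the state/action sizes and synthetic transitions are assumptions, not CIBMTR data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 4, 0.95

# Toy observational data: (state, expert_action, reward, next_state) tuples.
data = [(rng.integers(n_states), rng.integers(n_actions),
         rng.normal(), rng.integers(n_states)) for _ in range(5000)]

# Step (i): supervised imitation -- empirical distribution of expert actions per state.
counts = np.zeros((n_states, n_actions))
for s, a, _, _ in data:
    counts[s, a] += 1
expert_policy = counts.argmax(axis=1)

# Step (ii): off-policy value estimation with Q-learning over the logged transitions.
Q = np.zeros((n_states, n_actions))
alpha = 0.1
for _ in range(20):
    for s, a, r, s2 in data:
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

recommended = Q.argmax(axis=1)   # per-state treatment recommendation
```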
FFT-Based Deep Learning Deployment in Embedded Systems
Lin, Sheng, Liu, Ning, Nazemi, Mahdi, Li, Hongjia, Ding, Caiwen, Wang, Yanzhi, Pedram, Massoud
Deep learning has demonstrated its power in many application domains, especially in image and speech recognition. As the backbone of deep learning, deep neural networks (DNNs) consist of multiple layers of various types with hundreds to thousands of neurons. Embedded platforms are now becoming essential for deep learning deployment due to their portability, versatility, and energy efficiency. The large model size of DNNs, while providing excellent accuracy, also burdens embedded platforms with intensive computation and storage. Researchers have investigated reducing DNN model size with negligible accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN training and inference model suitable for embedded platforms, with reduced asymptotic complexity in both computation and storage, distinguishing our approach from existing ones. We develop training and inference algorithms based on FFT as the computing kernel and deploy the FFT-based inference model on embedded platforms, achieving extraordinary processing speed.
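The computing kernel rests on the convolution theorem. The following minimal sketch (assuming a 1-D circular convolution and NumPy's FFT rather than the paper's embedded implementation) shows the O(n log n) FFT path matching a direct O(n^2) reference.

```python
import numpy as np

def fft_conv1d(x, w):
    """Circular convolution via FFT: O(n log n) instead of O(n^2) multiply-adds."""
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w, n)))

rng = np.random.default_rng(0)
n = 256
x = rng.normal(size=n)
w = rng.normal(size=n)

# Reference: direct circular convolution, quadratic in n.
direct = np.array([sum(w[k] * x[(i - k) % n] for k in range(n)) for i in range(n)])
assert np.allclose(fft_conv1d(x, w), direct, atol=1e-6)
```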
CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices
Ding, Caiwen, Liao, Siyu, Wang, Yanzhi, Li, Zhe, Liu, Ning, Zhuo, Youwei, Wang, Chao, Qian, Xuehai, Bai, Yu, Yuan, Geng, Ma, Xiaolong, Zhang, Yipeng, Tang, Jian, Qiu, Qinru, Lin, Xue, Yuan, Bo
Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning; 2) the increased training complexity; and 3) the lack of a rigorous guarantee of compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n^2) to O(n log n) and the storage complexity from O(n^2) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: it can converge to the same effectiveness as DNNs without compression. The CirCNN architecture is a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC, and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
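The FFT-based fast multiplication for block-circulant weights can be sketched as follows. This is an illustrative NumPy example, not the CirCNN hardware; the block size and shapes are assumptions. Each circulant block is stored by a single defining vector (O(n) storage), and each block product goes through FFTs (O(n log n) compute), while the result matches an explicit dense block-circulant matrix.

```python
import numpy as np

def circulant_matvec(c, x):
    """y = C x for a circulant matrix C whose first column is c, via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, b):
    """blocks[i][j] holds the defining vector of the (i, j) circulant block of size b."""
    p, q = len(blocks), len(blocks[0])        # block rows, block columns
    x_parts = x.reshape(q, b)
    y = np.zeros(p * b)
    for i in range(p):
        for j in range(q):
            y[i * b:(i + 1) * b] += circulant_matvec(blocks[i][j], x_parts[j])
    return y

def circulant(c):
    """Explicit circulant matrix, used only to verify the FFT path."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
b, p, q = 8, 3, 4
blocks = [[rng.normal(size=b) for _ in range(q)] for _ in range(p)]
x = rng.normal(size=q * b)

dense = np.block([[circulant(blocks[i][j]) for j in range(q)] for i in range(p)])
assert np.allclose(block_circulant_matvec(blocks, x, b), dense @ x)
```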
A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning
Liu, Ning, Li, Zhe, Xu, Zhiyuan, Xu, Jielong, Lin, Sheng, Qiu, Qinru, Tang, Jian, Wang, Yanzhi
Automatic decision-making approaches, such as reinforcement learning (RL), have been applied to (partially) solve the resource allocation problem adaptively in cloud computing systems. However, a complete cloud resource allocation framework exhibits high dimensions in its state and action spaces, which limits the usefulness of traditional RL techniques. In addition, high power consumption has become one of the critical concerns in the design and control of cloud computing systems, as it degrades system reliability and increases cooling cost. An effective dynamic power management (DPM) policy should minimize power consumption while keeping performance degradation within an acceptable level. Thus, a joint virtual machine (VM) resource allocation and power management framework is critical to the overall cloud computing system. Moreover, a novel solution framework is necessary to address the even higher dimensions in state and action spaces. In this paper, we propose a novel hierarchical framework for solving the overall resource allocation and power management problem in cloud computing systems. The proposed hierarchical framework comprises a global tier for VM resource allocation to the servers and a local tier for distributed power management of local servers. The emerging deep reinforcement learning (DRL) technique, which can deal with complicated control problems with a large state space, is adopted to solve the global tier problem. Furthermore, an autoencoder and a novel weight-sharing structure are adopted to handle the high-dimensional state space and accelerate convergence. The local tier of distributed server power management comprises an LSTM-based workload predictor and a model-free RL-based power manager, operating in a distributed manner.
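A toy sketch of the two-tier control loop described above, with a greedy least-loaded placement standing in for the global DRL allocator and a moving-average forecast plus a threshold standing in for the LSTM predictor and RL power manager; all thresholds and workload numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_servers, horizon = 4, 50
load = np.zeros(n_servers)                 # current utilization per server
history = [[] for _ in range(n_servers)]
power_log = []

for t in range(horizon):
    # Global tier (stand-in for the DRL allocator):
    # place the arriving VM on the least-loaded server.
    vm_demand = rng.uniform(0.05, 0.2)
    load[int(np.argmin(load))] += vm_demand

    # Local tier (stand-in for the LSTM workload predictor + RL power manager):
    # each server forecasts near-term load from recent history and sleeps
    # when the forecast stays below a threshold.
    states = []
    for s in range(n_servers):
        history[s].append(load[s])
        forecast = float(np.mean(history[s][-5:]))
        states.append("sleep" if forecast < 0.1 else "active")
    power_log.append(states)

    load = np.maximum(load - 0.08, 0.0)    # jobs complete over time

print(power_log[-1], load.round(2))
```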
Collaborative Users’ Brand Preference Mining across Multiple Domains from Implicit Feedbacks
Tang, Jian (Peking University) | Yan, Jun (Microsoft Research Asia) | Ji, Lei (Microsoft Research Asia) | Zhang, Ming (Peking University) | Guo, Shaodan (Huazhong University of Science and Technology) | Liu, Ning (Microsoft Research Asia) | Wang, Xianfang (Microsoft Adcenter Audience Intelligence) | Chen, Zheng (Microsoft Research Asia)
Advanced e-applications require comprehensive knowledge about their users' preferences in order to provide accurate personalized services. In this paper, we propose to learn users' preferences for product brands from their implicit feedback, such as searching and browsing behaviors in Web browsing log data. The user brand preference learning problem is challenging since (1) users' implicit feedback is extremely sparse across product domains; and (2) we can only observe positive feedback from users' behaviors. To address these challenges, we propose a latent factor model to collaboratively mine users' brand preferences across multiple domains simultaneously. By collective learning, the learning processes in all the domains are mutually enhanced, and hence the problem of data scarcity in each single domain can be effectively addressed. We learn our model with an adaptation of the Bayesian personalized ranking (BPR) optimization criterion, which is a general learning framework for collaborative filtering from implicit feedback. Experiments with both synthetic and real-world datasets show that our proposed model significantly outperforms the baselines.
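A minimal sketch of BPR-style learning for a latent factor model with implicit feedback, assuming a simple SGD update over sampled (user, observed item, unobserved item) triples; the dimensions, learning rate, and regularization are illustrative, and this single-domain toy does not include the paper's cross-domain collective learning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    """One BPR-Opt SGD step: user u prefers observed item i over unobserved item j."""
    x_uij = U[u] @ (V[i] - V[j])
    g = sigmoid(-x_uij)                    # gradient of ln(sigmoid(x)) w.r.t. x
    du = V[i] - V[j]
    u_vec = U[u].copy()
    U[u] += lr * (g * du - reg * U[u])
    V[i] += lr * (g * u_vec - reg * V[i])
    V[j] += lr * (-g * u_vec - reg * V[j])

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 8
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
# Positive-only observations: a few implicitly "liked" items per user.
observed = {u: set(rng.integers(0, n_items, size=5).tolist()) for u in range(n_users)}

for _ in range(20_000):
    u = rng.integers(n_users)
    i = rng.choice(list(observed[u]))      # observed (positive) item
    j = rng.integers(n_items)              # sample an unobserved item
    while j in observed[u]:
        j = rng.integers(n_items)
    bpr_step(U, V, u, i, j)
```

Ranking items for a user then reduces to sorting by the score `U[u] @ V.T`, which is what makes the pairwise BPR criterion a natural fit for positive-only feedback.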