He, Zhezhi
CLLMs: Consistency Large Language Models
Kou, Siqi, Hu, Lanxiang, He, Zhezhi, Deng, Zhijie, Zhang, Hao
Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference, as they break the sequential nature of the LLM decoding process and transform it into parallelizable computation. In practice, however, Jacobi decoding achieves little speedup over traditional autoregressive (AR) decoding, primarily because it seldom accurately predicts more than one token in a single fixed-point iteration step. To address this, we develop a new approach aimed at realizing fast convergence from any state on a Jacobi trajectory to the fixed point. This is accomplished by refining the target LLM to consistently predict the fixed point given any state as input. Extensive experiments demonstrate the effectiveness of our method, showing 2.4$\times$ to 3.4$\times$ improvements in generation speed while preserving generation quality across both domain-specific and open-domain benchmarks.
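As a rough illustration of the Jacobi fixed-point iteration discussed above, the sketch below refines an n-token block in parallel until it stops changing. It is a minimal Python/PyTorch sketch, not the CLLMs implementation; `model(ids)` returning per-position logits of shape [1, L, vocab] is an assumed interface.

    import torch

    @torch.no_grad()
    def jacobi_decode_block(model, prefix_ids, block_len, pad_id, max_iters=32):
        # Minimal Jacobi (fixed-point) decoding sketch for one n-token block.
        # `model(ids)` is assumed to return logits of shape [1, len, vocab];
        # this interface is illustrative, not the CLLMs codebase.
        guess = torch.full((1, block_len), pad_id, dtype=torch.long)  # trivial initial guess
        for _ in range(max_iters):
            ids = torch.cat([prefix_ids, guess], dim=1)
            logits = model(ids)                                       # [1, L, vocab]
            # Greedy prediction for every block position in parallel:
            # position i is predicted from prefix + current guess[:i].
            new_guess = logits[:, prefix_ids.size(1) - 1:-1, :].argmax(dim=-1)
            if torch.equal(new_guess, guess):                         # reached the fixed point
                return new_guess
            guess = new_guess
        return guess

The consistency training described in the abstract fine-tunes the model so that intermediate states on this trajectory already map to the fixed point, letting a loop like the one above exit after far fewer iterations.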
SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
You, Kang, Xu, Zekai, Nie, Chen, Deng, Zhijie, Guo, Qinghai, Wang, Xiang, He, Zhezhi
Spiking neural networks (SNNs) have attracted great attention due to their high efficiency and accuracy. Current ANN-to-SNN conversion methods can obtain SNNs with ANN-level accuracy at ultra-low latency (8 time-steps) for CNN architectures on computer vision (CV) tasks. However, while Transformer-based networks have achieved prevailing precision on both CV and natural language processing (NLP) tasks, Transformer-based SNNs still suffer lower accuracy than their ANN counterparts. In this work, we introduce a novel ANN-to-SNN conversion method called SpikeZIP-TF, in which the ANN and the converted SNN are exactly equivalent, thus incurring no accuracy degradation. SpikeZIP-TF achieves 83.82% accuracy on a CV dataset (ImageNet) and 93.79% accuracy on an NLP dataset (SST-2), which are higher than SOTA Transformer-based SNNs. The code is available on GitHub: https://github.com/Intelligent-Computing-Research-Group/SpikeZIP_transformer
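To make the conversion idea concrete, here is a minimal sketch of the generic equivalence between a quantized/clipped activation in the source ANN and a rate-coded integrate-and-fire neuron run for T time steps. It illustrates the conversion principle only; it is not the exact SpikeZIP-TF neuron or codebase.

    import numpy as np

    def quantized_relu(x, theta, T):
        # Quantized/clipped ReLU used by the source ANN (illustrative).
        return np.clip(np.floor(x * T / theta), 0, T) * theta / T

    def integrate_and_fire(x, theta, T):
        # Integrate-and-fire neuron driven by a constant input current `x`
        # for T time steps; returns the rate-coded output (spike count * theta / T).
        # This is the generic conversion equivalence, not the SpikeZIP-TF neuron.
        v, spikes = 0.0, 0
        for _ in range(T):
            v += x                 # accumulate membrane potential
            if v >= theta:         # fire, then reset by subtraction
                spikes += 1
                v -= theta
        return spikes * theta / T

    # The spike-rate output tracks the quantized activation:
    for x in [0.05, 0.3, 0.7, 1.2]:
        print(quantized_relu(x, theta=1.0, T=8), integrate_and_fire(x, theta=1.0, T=8))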
Model Extraction Attacks on Split Federated Learning
Li, Jingtao, Rakin, Adnan Siraj, Chen, Xing, Yang, Li, He, Zhezhi, Fan, Deliang, Chakrabarti, Chaitali
Federated Learning (FL) is a popular collaborative learning scheme involving multiple clients and a server. FL focuses on protecting clients' data but turns out to be highly vulnerable to Intellectual Property (IP) threats. Since FL periodically collects and distributes the model parameters, a free-rider can download the latest model and thus steal model IP. Split Federated Learning (SFL), a recent variant of FL that supports training with resource-constrained clients, splits the model into two parts, giving one part to the clients (client-side model) and the remaining part to the server (server-side model). SFL thus prevents model leakage by design. Moreover, by blocking prediction queries, it can be made resistant to advanced IP threats such as traditional Model Extraction (ME) attacks. While SFL provides better IP protection than FL, it is still vulnerable. In this paper, we expose the vulnerability of SFL and show how malicious clients can launch ME attacks by querying gradient information from the server side. We propose five variants of the ME attack, which differ in how the gradients are used as well as in their data assumptions. We show that, in practical settings, the proposed ME attacks work exceptionally well against SFL. For instance, when the server-side model has five layers, our proposed ME attack can achieve over 90% accuracy with less than 2% accuracy degradation with VGG-11 on CIFAR-10.
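The following is a schematic sketch of one gradient-based extraction step of the kind described above: the malicious client reuses the gradients that the SFL server returns for its smashed activations to train a surrogate server-side model by gradient matching. The `server_query` interface and the objective are illustrative assumptions, not the paper's exact attack variants.

    import torch
    import torch.nn as nn

    def gradient_matching_step(server_query, surrogate, opt, smashed, labels,
                               loss_fn=nn.CrossEntropyLoss()):
        # One schematic extraction step against SFL.
        # `server_query(smashed, labels)` is assumed to return the gradient of the
        # server's loss w.r.t. the smashed activations, as the SFL protocol does
        # during back-propagation; this interface is an assumption for illustration.
        target_grad = server_query(smashed, labels)            # grad from the real server
        smashed_sur = smashed.detach().requires_grad_(True)
        loss = loss_fn(surrogate(smashed_sur), labels)          # surrogate server-side model
        grad_sur, = torch.autograd.grad(loss, smashed_sur, create_graph=True)
        match = ((grad_sur - target_grad) ** 2).mean()          # match the returned gradients
        opt.zero_grad()
        match.backward()
        opt.step()
        return match.item()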
N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores
Gong, Yu, Xu, Zhihan, He, Zhezhi, Zhang, Weifeng, Tu, Xiaobing, Liang, Xiaoyao, Jiang, Li
Accelerating neural network inference with FPGAs has emerged as a popular option, since the reconfigurability and high-performance computing capability of FPGAs intrinsically satisfy the computation demands of fast-evolving neural algorithms. However, popular neural accelerators on FPGAs (e.g., Xilinx DPU) mainly utilize DSP resources to construct their processing units, while the rich LUT resources are not well exploited. In this work, via a software-hardware co-design approach, we develop an FPGA-based heterogeneous computing system for neural network acceleration. From the hardware perspective, the proposed accelerator consists of DSP- and LUT-based GEneral Matrix-Multiplication (GEMM) computing cores, which together form the computing system in a heterogeneous fashion. The DSP- and LUT-based GEMM cores operate under a unified Instruction Set Architecture (ISA) and unified buffers. Along the data flow of the inference path, the computation of each convolution/fully-connected layer is split into two portions, handled by the DSP- and LUT-based GEMM cores asynchronously. From the software perspective, we mathematically and systematically model the latency and resource utilization of the proposed heterogeneous accelerator under varying system design configurations. By leveraging reinforcement learning, we construct a framework for end-to-end selection and optimization of the design specification of the target heterogeneous accelerator, including the workload split strategy, the mixed-precision quantization scheme, and the resource allocation between the DSP- and LUT-cores. By virtue of the proposed design framework and heterogeneous computing system, our design outperforms the state-of-the-art Mix&Match design, reducing latency by 1.12-1.32x while achieving higher inference accuracy. N3H-Core is open-sourced at: https://github.com/elliothe/N3H_Core.
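As a toy illustration of the workload split, the sketch below partitions one GEMM's output channels into a "DSP-core" portion and a "LUT-core" portion. In N3H-Core the two portions run asynchronously on different hardware (possibly at different precisions) and the split point comes from the RL-based search; here both halves run on the CPU and `dsp_ratio` is a hand-set stand-in.

    import numpy as np

    def split_gemm(x, w, dsp_ratio):
        # Illustrative output-channel split of one GEMM between two cores.
        # `dsp_ratio` stands in for the split point that the framework's
        # RL-based search would choose.
        n_out = w.shape[1]
        k = int(round(n_out * dsp_ratio))      # columns assigned to the "DSP core"
        y_dsp = x @ w[:, :k]                   # "DSP-core" portion
        y_lut = x @ w[:, k:]                   # "LUT-core" portion
        return np.concatenate([y_dsp, y_lut], axis=1)

    x = np.random.randn(4, 64)
    w = np.random.randn(64, 128)
    assert np.allclose(split_gemm(x, w, dsp_ratio=0.75), x @ w)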
MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning
Lin, Sen, Yang, Li, He, Zhezhi, Fan, Deliang, Zhang, Junshan
While deep learning has achieved phenomenal successes in many AI applications, its enormous model size and intensive computation requirements pose a formidable challenge to deployment on resource-limited nodes. There has recently been increasing interest in computationally efficient learning methods, e.g., quantization, pruning, and channel gating. However, most existing techniques cannot adapt to different tasks quickly. In this work, we advocate a holistic approach that jointly trains the backbone network and the channel gating, which enables dynamic selection of a subset of filters for more efficient local computation given the data input. In particular, we develop a federated meta-learning approach to jointly learn good meta-initializations for both the backbone networks and the gating modules, by making use of the model similarity across learning tasks on different nodes. In this way, the learnt meta-gating module effectively captures the important filters of a good meta-backbone network, based on which a task-specific conditional channel-gated network can be quickly adapted, i.e., through one-step gradient descent, from the meta-initializations in a two-stage procedure using new samples of that task. The convergence of the proposed federated meta-learning algorithm is established under mild conditions. Experimental results corroborate the effectiveness of our method in comparison to related work.
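A minimal sketch of the backbone-plus-gating decomposition and the one-step adaptation mentioned above is given below. The toy module, classification head, and learning rate are illustrative assumptions, not the paper's architecture or its two-stage federated procedure.

    import torch
    import torch.nn as nn

    class GatedConv(nn.Module):
        # Toy conv block with a per-channel gate, illustrating the
        # backbone + channel-gating decomposition only.
        def __init__(self, c_in, c_out):
            super().__init__()
            self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)    # backbone
            self.gate = nn.Parameter(torch.zeros(c_out))        # gating logits

        def forward(self, x):
            g = torch.sigmoid(self.gate).view(1, -1, 1, 1)      # soft channel gate
            return g * torch.relu(self.conv(x))

    def one_step_adapt(meta_model, x, y, lr=0.1):
        # One-step gradient adaptation of backbone and gate from the
        # meta-initialization (sketch); global-average-pool head is a toy choice.
        logits = meta_model(x).mean(dim=(2, 3))
        loss = nn.functional.cross_entropy(logits, y)
        grads = torch.autograd.grad(loss, list(meta_model.parameters()))
        return {name: p - lr * g
                for (name, p), g in zip(meta_model.named_parameters(), grads)}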
KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning
Yang, Li, He, Zhezhi, Zhang, Junshan, Fan, Deliang
Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new tasks, a phenomenon known as \textit{catastrophic forgetting}. While recent continual learning methods are capable of alleviating catastrophic forgetting on toy-sized datasets, several issues remain when applying them to real-world problems. Recently, fast mask-based learning methods (e.g., Piggyback \cite{mallya2018piggyback}) have been proposed to address these issues by learning only a binary element-wise mask in a fast manner, while keeping the backbone model fixed. However, the binary mask has limited modeling capacity for new tasks. A more recent work \cite{hung2019compacting} proposes a compress-grow-based method (CPG) to achieve better accuracy for new tasks by partially training the backbone model, but at an order-of-magnitude higher training cost, which makes it infeasible for state-of-the-art edge/mobile learning. The primary goal of this work is to simultaneously achieve fast and high-accuracy multi-task adaptation in the continual learning setting. Thus motivated, we propose a new training method called \textit{kernel-wise Soft Mask} (KSM), which learns a kernel-wise hybrid binary and real-valued soft mask for each task while using the same backbone model. Such a soft mask can be viewed as the superposition of a binary mask and a properly scaled real-valued tensor, which offers richer representation capability without requiring low-level kernel support, thereby keeping hardware overhead low. We validate KSM on multiple benchmark datasets against recent state-of-the-art methods (e.g., Piggyback, PackNet, CPG), showing clear improvements in both accuracy and training cost.
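To illustrate the kernel-wise soft mask at a code level, the sketch below applies a per-kernel mask, formed as a thresholded binary component plus a scaled real-valued component, to a frozen convolution weight. The threshold, scale, and the omission of a straight-through estimator for the binary part are simplifications, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class KernelSoftMask(nn.Module):
        # Sketch of a kernel-wise soft mask over a frozen 3x3 conv weight.
        def __init__(self, weight, threshold=0.5, scale=0.1):
            super().__init__()
            self.register_buffer("weight", weight)                   # frozen backbone kernels
            c_out, c_in = weight.shape[0], weight.shape[1]
            self.score = nn.Parameter(torch.ones(c_out, c_in, 1, 1)) # one score per 2D kernel
            self.threshold, self.scale = threshold, scale

        def masked_weight(self):
            binary = (self.score > self.threshold).float()           # binary component
            soft = binary + self.scale * self.score                  # + scaled real component
            # A straight-through estimator would be needed to train the binary
            # part; it is omitted here for brevity.
            return self.weight * soft

        def forward(self, x):
            return nn.functional.conv2d(x, self.masked_weight(), padding=1)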
T-BFA: Targeted Bit-Flip Adversarial Weight Attack
Rakin, Adnan Siraj, He, Zhezhi, Li, Jingtao, Yao, Fan, Chakrabarti, Chaitali, Fan, Deliang
Traditional Deep Neural Network (DNN) security is mostly related to the well-known adversarial input example attack. Recently, another dimension of adversarial attack, namely, attack on DNN weight parameters, has been shown to be very powerful. As a representative example, the Bit-Flip based adversarial weight Attack (BFA) injects an extremely small amount of fault into the weight parameters to hijack the DNN's function. Prior works on BFA focus on un-targeted attacks that misclassify all inputs into a random output class by flipping a very small number of weight bits stored in computer memory. This paper proposes the first targeted BFA-based (T-BFA) adversarial weight attack on DNN models, which can intentionally mislead selected inputs to a target output class. The objective is achieved by identifying the weight bits that are highly associated with the classification of a targeted output through a novel class-dependent weight bit ranking algorithm. T-BFA performance has been successfully demonstrated on multiple network architectures for the image classification task. For example, by merely flipping 27 out of 88 million weight bits, T-BFA can misclassify all the images from the 'Ibex' class into the 'Proboscis Monkey' class (i.e., a 100% attack success rate) on the ImageNet dataset, while maintaining 59.35% validation accuracy on ResNet-18. Moreover, we successfully demonstrate our T-BFA attack in a real computer prototype system running DNN computation.
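A first-order sketch of the class-dependent bit ranking idea is shown below: weights in one layer are scored by the gradient of a targeted loss that pushes a set of source-class inputs toward the attacker's target class. The real T-BFA operates on 8-bit quantized weights and refines this with a progressive bit search; the scoring here is only a crude proxy.

    import torch
    import torch.nn as nn

    def rank_flip_candidates(model, layer, x_src, target_class, topk=10):
        # Rank weights of one layer by how strongly a flip is expected to push
        # `x_src` toward `target_class` (first-order proxy; schematic only).
        target = torch.full((x_src.size(0),), target_class, dtype=torch.long)
        loss = nn.functional.cross_entropy(model(x_src), target)
        grad, = torch.autograd.grad(loss, layer.weight)
        # A weight whose gradient has large magnitude w.r.t. the targeted loss is
        # a promising flip candidate; T-BFA further accounts for the exact value
        # change each bit flip would cause in the quantized representation.
        score = grad.abs().flatten()
        return score.topk(topk).indices          # candidate weight positions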
Non-structured DNN Weight Pruning Considered Harmful
Wang, Yanzhi, Ye, Shaokai, He, Zhezhi, Ma, Xiaolong, Zhang, Linfeng, Lin, Sheng, Yuan, Geng, Tan, Sia Huat, Li, Zhengang, Fan, Deliang, Qian, Xuehai, Lin, Xue, Ma, Kaisheng
Large deep neural network (DNN) models pose a key challenge to energy efficiency, since off-chip DRAM accesses consume significantly more energy than arithmetic or SRAM operations. This motivates intensive research on model compression, with two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured manner, which offers higher flexibility and pruning rate but incurs index accesses due to irregular weights, or in a structured manner, which preserves the full matrix structure at a lower pruning rate. Weight quantization leverages the redundancy in the number of bits per weight. Compared to pruning, quantization is much more hardware-friendly and has become a "must-do" step for FPGA and ASIC implementations. This paper provides, for the first time, a definitive answer to the question of whether non-structured pruning remains desirable once quantization is accounted for. First, we build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint weight pruning and quantization framework. Second, we develop a methodology for a fair and fundamental comparison of non-structured and structured pruning in terms of both storage and computation efficiency. Our results show that ADMM-NN-S consistently outperforms the prior art: (i) it achieves 348x, 36x, and 8x overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively, with (almost) zero accuracy loss; (ii) we demonstrate for the first time that fully binarized (for all layers) DNNs can be lossless in accuracy in many cases. These results provide a strong baseline and credibility for our study. Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structured pruning is not competitive in terms of either storage or computation efficiency. Thus, we conclude that non-structured pruning should be considered harmful, and we urge the community not to continue pursuing DNN inference acceleration for non-structured sparsity.
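A back-of-the-envelope version of the storage-side comparison reads as follows: element-wise (non-structured) sparsity must store an index per surviving weight, while structured (row/filter) pruning keeps dense sub-matrices and needs no indices. The bit widths and the value+index format below are illustrative assumptions, not the paper's exact cost model.

    import numpy as np

    def nonstructured_storage(w, keep_ratio, index_bits=16, value_bits=8):
        # Bits to store an element-wise sparse matrix in a simple
        # value+index format (format is illustrative).
        kept = int(w.size * keep_ratio)
        return kept * (value_bits + index_bits)

    def structured_storage(w, keep_ratio, value_bits=8):
        # Bits to store a row/filter-pruned matrix: surviving rows stay
        # dense, so no per-element indices are needed.
        kept_rows = int(w.shape[0] * keep_ratio)
        return kept_rows * w.shape[1] * value_bits

    w = np.zeros((512, 512))
    print(nonstructured_storage(w, 0.10))   # 10% of weights kept, each with an index
    print(structured_storage(w, 0.20))      # 20% of rows kept, dense and index-free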
Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network using Truncated Gaussian Approximation
He, Zhezhi, Fan, Deliang
In the past years, deep convolutional neural networks have achieved great success in many artificial intelligence applications. However, their enormous model size and massive computation cost have become the main obstacle to deploying such powerful algorithms in low-power, resource-limited mobile systems. As a countermeasure, deep neural networks with ternarized weights (i.e., -1, 0, +1) have been widely explored to greatly reduce the model size and computational cost with limited accuracy degradation. In this work, we propose a novel ternarized neural network training method that simultaneously optimizes both the weights and the quantizer during training, differentiating it from prior works. Instead of fixed and uniform weight ternarization, we are the first to incorporate the thresholds of weight ternarization into a closed-form representation using a truncated Gaussian approximation, enabling simultaneous optimization of weights and quantizer through back-propagation training. With both the first and last layers ternarized, experiments on the ImageNet classification task show that our ternarized ResNet-18/34/50 incurs only 3.9/2.52/2.16% accuracy degradation in comparison to the full-precision counterparts.
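The sketch below shows the generic shape of such a ternarizer: weights are mapped to {-alpha, 0, +alpha} using a threshold tied to the weight statistics, with a straight-through estimator in the backward pass. The fixed 0.7x mean-magnitude threshold is a stand-in; the paper instead derives the threshold and scale in closed form from a truncated Gaussian so they can be optimized jointly with the weights.

    import torch

    class Ternarize(torch.autograd.Function):
        # Illustrative ternarizer with a straight-through estimator;
        # not the paper's closed-form truncated-Gaussian formulation.
        @staticmethod
        def forward(ctx, w):
            delta = 0.7 * w.abs().mean()                 # stand-in threshold
            mask = (w.abs() > delta).float()
            alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
            return alpha * torch.sign(w) * mask          # weights in {-alpha, 0, +alpha}

        @staticmethod
        def backward(ctx, grad_out):
            return grad_out                              # straight-through estimator

Usage inside a quantized layer's forward pass would simply be `w_q = Ternarize.apply(w)`.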
Blind Pre-Processing: A Robust Defense Method Against Adversarial Examples
Rakin, Adnan Siraj, He, Zhezhi, Gong, Boqing, Fan, Deliang
Deep learning algorithms and networks are vulnerable to perturbed inputs, known as adversarial examples. Many defense methodologies have been investigated to defend against such adversarial attacks. In this work, we propose a novel methodology to defend against existing powerful attack models. For the first time, we introduce a new attacking scheme for the attacker and set a practical constraint for white-box attacks. Under this proposed attacking scheme, we present the best defense reported to date against some of the recent strong attacks. Our defense consists of a set of nonlinear functions that process the input data, making the model more robust to adversarial attacks; crucially, this processing layer is kept completely hidden from the attacker. Blind pre-processing improves the white-box attack accuracy on MNIST from 94.3\% to 98.7\%. Even as attack strength increases to the point where other defenses completely fail, blind pre-processing remains one of the strongest defenses reported. Another strength of our defense is that it eliminates the need for adversarial training, as it can significantly increase MNIST accuracy without adversarial training as well. Additionally, blind pre-processing can also increase the inference accuracy in the face of a powerful attack on the CIFAR-10 and SVHN datasets without sacrificing much clean-data accuracy.
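A generic example of a hidden nonlinear pre-processing stage is sketched below: a secret, randomly parameterized squashing followed by discretization applied to inputs before the classifier. It follows the abstract only in spirit; the specific nonlinearities, parameters, and secrecy mechanism are illustrative assumptions.

    import numpy as np

    def blind_preprocess(x, seed=1234, levels=8):
        # Hidden nonlinear pre-processing sketch: the seed stands in for
        # parameters kept secret from the attacker (illustrative only).
        rng = np.random.default_rng(seed)
        a, b = rng.uniform(0.8, 1.2), rng.uniform(-0.05, 0.05)
        x = np.tanh(a * (x + b))               # nonlinear squashing
        x = np.round(x * levels) / levels      # discretize to damp small perturbations
        return x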