SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation

Lv, Changze, Li, Tianlong, Xu, Jianhan, Gu, Chenxi, Ling, Zixuan, Zhang, Cenyuan, Zheng, Xiaoqing, Huang, Xuanjing

arXiv.org Artificial Intelligence 

Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way. However, the network architectures of existing SNNs for language tasks are still simplistic and relatively shallow, and deep architectures have not been fully explored, resulting in a significant performance gap compared to mainstream transformer-based networks such as BERT. To this end, we improve a recently proposed spiking Transformer (i.e., Spikformer) so that it can process language tasks, and we propose a two-stage knowledge distillation method for training it. The first stage pre-trains the model by distilling knowledge from BERT using a large collection of unlabelled texts. The second stage fine-tunes it on task-specific instances, again via knowledge distillation, this time from a BERT fine-tuned on the same training examples. Through extensive experimentation, we show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve results comparable to BERT on text classification tasks for both English and Chinese with much less energy consumption.

Modern artificial neural networks (ANNs) have been highly successful in a wide range of natural language processing (NLP) and computer vision (CV) tasks. However, training and deploying state-of-the-art ANN models requires a great deal of computational energy, and energy consumption per model has risen consistently over the past decade. The energy consumed during inference by large language models such as ChatGPT (OpenAI, 2022) and GPT-4 (OpenAI, 2023) is enormous. In recent years, spiking neural networks (SNNs), arguably the third generation of neural networks (Maass, 1997), have attracted a lot of attention due to their high biological plausibility, event-driven property and low energy consumption (Roy et al., 2019). Like biological neurons, SNNs use discrete spikes to process and transmit information.
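To make the idea of discrete spikes concrete, here is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron simulated over discrete time steps. The text above does not specify a neuron model, so the LIF dynamics, the function name `lif_spike_train`, and the parameters `threshold`, `decay` and `v_reset` are all illustrative assumptions rather than the authors' implementation.

```python
def lif_spike_train(inputs, threshold=1.0, decay=0.5, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron (illustrative assumption).

    inputs: sequence of input currents, one per time step.
    Returns a list of 0/1 values, where 1 marks a time step with a spike.
    """
    v = v_reset  # membrane potential
    spikes = []
    for x in inputs:
        # Leak the previous potential, then integrate the new input.
        v = decay * v + x
        if v >= threshold:
            # Emit a discrete spike and reset the membrane potential.
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(lif_spike_train([0.6, 0.6, 0.6, 0.6]))
```

The neuron emits nothing but binary events, and it stays silent whenever its input is too weak to reach the threshold. This sparsity is one reason often given for the low energy consumption of SNNs.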
