Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding

Zhanglu Yan, Jiayi Mao, Qianhui Liu, Fanfan Li, Gang Pan, Tao Luo, Bowen Zhu, Weng-Fai Wong

arXiv.org Artificial Intelligence 

Time-to-first-spike (TTFS) encoding promises high energy efficiency for spiking neural networks because each neuron fires at most once. In practice, however, this advantage is often unrealized because inference requires evaluating a temporal decay function and then multiplying the result by the synaptic weight. We fabricated a custom indium oxide optoelectronic synapse and show that its natural physical decay directly implements the required temporal function. By treating the device's analog output as the fused product of the synaptic weight and the temporal decay, optoelectronic synaptic TTFS (named Otters) eliminates these expensive digital operations. To apply the Otters paradigm to complex architectures such as the transformer, which are challenging to train directly as SNNs because of spike sparsity, we introduce a novel quantized neural network-to-SNN conversion algorithm. This complete hardware-software co-design achieves state-of-the-art accuracy across seven GLUE benchmark datasets and demonstrates a 1.77× improvement in energy efficiency over previous leading SNNs, based on a comprehensive analysis of compute, data-movement, and memory-access costs using energy measurements from a commercial 22 nm process. Our work thus establishes a new paradigm for energy-efficient SNNs, translating fundamental device physics directly into powerful computational primitives.

Large language models (LLMs) have demonstrated remarkable capabilities, yet their immense computational and energy costs hinder their deployment in resource-constrained environments such as edge devices (Lin et al., 2023; Jegham et al., 2025). This critical challenge has spurred research on more efficient, brain-inspired architectures, with spiking neural networks (SNNs) emerging as a promising candidate (Tang et al., 2025; Xing et al.).
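To make concrete the digital overhead the abstract refers to, the sketch below contrasts a conventional digital TTFS synapse update, which must evaluate the decay function and then multiply by the weight on every spike, with a behavioral model of the fused analog readout. The exponential decay form, the time constant `tau`, and all names (`ttfs_psp_digital`, `otters_readout`) are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def ttfs_psp_digital(w, t_now, t_spike, tau=1.0):
    """Conventional digital TTFS inference: explicitly evaluate the
    temporal decay, then multiply by the synaptic weight. Repeated
    per spike, these two operations are the energy cost that can
    erase TTFS's theoretical efficiency advantage."""
    decay = np.exp(-(t_now - t_spike) / tau)  # temporal decay function
    return w * decay                          # digital multiply

def otters_readout(fused_current, t_now, t_spike):
    """Behavioral stand-in for the optoelectronic synapse: its analog
    output already *is* the fused product (weight x decay), produced
    by device physics, so no exp() evaluation or multiply runs in
    digital logic. `fused_current` models the device's response."""
    return fused_current(t_now - t_spike)

# Example: a device whose photocurrent starts at w and decays with tau.
w, tau = 0.8, 1.0
device = lambda dt: w * np.exp(-dt / tau)  # physics on chip, not arithmetic
assert np.isclose(ttfs_psp_digital(w, 2.0, 0.5, tau),
                  otters_readout(device, 2.0, 0.5))
```

In the actual system the device current would be sensed directly, so neither the exponential evaluation nor the multiplication executes in digital logic; the Python model above only mirrors the input-output behavior.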