ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths
Ruslan Khalitov, Tong Yu, Lei Cheng, Zhirong Yang
Sequential data naturally have different lengths in many domains, and some sequences are very long. As an important modeling tool, neural attention should capture long-range interaction in such sequences. However, most existing neural attention models admit only short sequences, or they have to employ chunking or padding to enforce a constant input length. Here we propose a simple neural network building block called ChordMixer which can model attention for long sequences with variable lengths. Each ChordMixer block consists of a position-wise rotation layer without learnable parameters and an element-wise MLP layer. Repeatedly applying such blocks forms an effective network backbone that mixes the input signals towards the learning targets. We have tested ChordMixer on the synthetic adding problem, long document classification, and DNA sequence-based taxonomy classification. The experimental results show that our method substantially outperforms other neural attention models.

Sequential data appear widely in data science. In many domains, the sequences have a diverse distribution of lengths. Meanwhile, long-range interactions are common in such sequences; for example, interacting DNA elements can be up to 20,000 bases apart (Gasperini et al., 2020). Modeling interactions in such sequences is a fundamental problem in machine learning and poses great challenges to attention approaches based on deep neural networks. Most existing neural attention methods cannot handle long sequences with different lengths: for efficient batch processing, architectures such as Transformer and its variants usually assume a constant input length.
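To make the block structure concrete, below is a minimal sketch of how a ChordMixer-style block could look in PyTorch. It is an illustration under our own assumptions, not the authors' reference implementation: the split of the channel dimension into roughly log2(seq_len) tracks, the circular rotation of track k by 2**k positions, the residual connection, and the names rotate_tracks and ChordMixerBlock are all assumed for the sketch; only the parameter-free rotation followed by an element-wise (position-wise) MLP is taken from the abstract.

# Minimal sketch of a ChordMixer-style block (simplified; not the authors' code).
# Assumed details: channels split into ~log2(seq_len) tracks, track k rotated by
# 2**k positions along the sequence, residual connection around the MLP.
import math
import torch
import torch.nn as nn

def rotate_tracks(x: torch.Tensor) -> torch.Tensor:
    """Parameter-free rotation layer; x has shape (seq_len, d)."""
    seq_len, d = x.shape
    n_tracks = max(1, math.ceil(math.log2(max(seq_len, 2))))
    track_size = d // n_tracks
    chunks = []
    for k in range(n_tracks):
        lo = k * track_size
        hi = d if k == n_tracks - 1 else (k + 1) * track_size
        # circular shift of this channel track along the sequence axis
        chunks.append(torch.roll(x[:, lo:hi], shifts=2 ** k, dims=0))
    return torch.cat(chunks, dim=1)

class ChordMixerBlock(nn.Module):
    def __init__(self, d: int, hidden: int):
        super().__init__()
        # element-wise MLP applied to each position independently
        self.mlp = nn.Sequential(nn.Linear(d, hidden), nn.GELU(), nn.Linear(hidden, d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(rotate_tracks(x))  # residual connection assumed

# Usage: a sequence of any length is processed without padding or chunking.
x = torch.randn(5000, 64)                      # (seq_len, d)
block = ChordMixerBlock(d=64, hidden=128)
y = block(x)                                   # same shape as x

Because the rotation is defined directly from the sequence length and has no learnable parameters, inputs of different lengths can be processed without padding or chunking, which is the property the abstract emphasizes.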
arXiv.org Artificial Intelligence
May-5-2023