Temporal Convolutional Attention-based Network For Sequence Modeling
Hao, Hongyan, Wang, Yan, Xue, Siqiao, Xia, Yudi, Zhao, Jian, Shen, Furao
–arXiv.org Artificial Intelligence
With the development of feed-forward models, the default choice for sequence modeling has gradually shifted away from recurrent networks. Many powerful feed-forward models based on convolutional networks and attention mechanisms have been proposed and show more potential for handling sequence modeling tasks. We ask whether there is an architecture that can not only serve as an approximate substitute for recurrent networks, but also absorb the advantages of feed-forward models. We therefore propose an exploratory architecture, referred to as the Temporal Convolutional Attention-based Network (TCAN), which combines a temporal convolutional network with an attention mechanism. TCAN includes two parts: Temporal Attention (TA), which captures relevant features inside the sequence, and Enhanced Residual (ER), which extracts important information from shallow layers and transfers it to deep layers. We improve the state-of-the-art bpc/perplexity results to 30.28 on word-level PTB, 1.092 on character-level PTB, and 9.20 on WikiText-2.
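To make the combination of attention and temporal convolution more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how TA or ER are computed, so the sketch uses unparameterized causal self-attention (queries, keys, and values are the input itself), a causal dilated convolution with shared scalar taps, and an ordinary residual connection in place of ER. The names `causal_attention`, `causal_conv`, and `block` are hypothetical.

```python
import math


def causal_attention(x):
    """Self-attention in which step t attends only to steps <= t.

    x is a list of T feature vectors (lists of floats). Assumption: no learned
    projections, so queries, keys, and values are all x itself.
    """
    d = len(x[0])
    out = []
    for t in range(len(x)):
        # Scaled dot-product scores against the current and earlier steps only.
        scores = [
            sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible steps.
        peak = max(scores)
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        out.append(
            [sum(w / total * x[s][k] for s, w in enumerate(weights)) for k in range(d)]
        )
    return out


def causal_conv(x, kernel, dilation=1):
    """Causal dilated 1D convolution; kernel[j] weights the input j*dilation steps back.

    Assumption: one scalar tap per offset, shared by all channels. Steps before
    the start of the sequence count as zero.
    """
    d = len(x[0])
    out = []
    for t in range(len(x)):
        row = [0.0] * d
        for j, w in enumerate(kernel):
            s = t - j * dilation
            if s >= 0:
                for k in range(d):
                    row[k] += w * x[s][k]
        out.append(row)
    return out


def block(x, kernel, dilation=1):
    """Attention, then causal convolution, then ReLU plus a plain residual.

    The plain residual is a stand-in; it is not the paper's Enhanced Residual.
    """
    conv = causal_conv(causal_attention(x), kernel, dilation)
    return [
        [max(0.0, c) + xi for c, xi in zip(conv_row, x_row)]
        for conv_row, x_row in zip(conv, x)
    ]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
y = block(x, kernel=[0.5, 0.25])

# Changing only the last step must leave all earlier outputs untouched.
x_changed = x[:3] + [[2.0, 3.0]]
y_changed = block(x_changed, kernel=[0.5, 0.25])
print(y[:3] == y_changed[:3])
```

The property worth noting is causality: because both the attention and the convolution look only at the current and earlier steps, the output at step t cannot depend on later inputs, which is what lets a feed-forward block stand in for a recurrent one in next-token prediction.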
Oct-13-2023
- Country:
  - Asia > China
    - Jiangsu Province > Nanjing (0.05)
  - Europe
    - France (0.05)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
  - North America
    - Canada
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
      - Quebec > Montreal (0.04)
    - United States
      - California
        - Los Angeles County > Long Beach (0.14)
        - San Diego County > San Diego (0.04)
        - Santa Clara County > Sunnyvale (0.04)
      - Kansas > Ness County (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Utah > Salt Lake County
        - Salt Lake City (0.04)
  - Oceania > Australia
- Genre:
  - Research Report (0.64)
- Technology: