TranSFormer: Slow-Fast Transformer for Machine Translation
Li, Bei, Jing, Yi, Tan, Xu, Xing, Zhen, Xiao, Tong, Zhu, Jingbo
arXiv.org Artificial Intelligence
Learning multiscale Transformer models has been shown to be a viable approach to improving machine translation systems. Prior research has primarily treated subwords as the basic units in such systems, and the incorporation of fine-grained character-level features into multiscale Transformers has not yet been explored. In this work, we present a Slow-Fast two-stream learning model, referred to as TranSFormer, which uses a "slow" branch to process subword sequences and a "fast" branch to process longer character sequences. The model is efficient because the fast branch is made very lightweight by reducing its model width, yet it still provides useful fine-grained features for the slow branch. TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
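To make the efficiency argument concrete, the following back-of-the-envelope sketch compares one standard Transformer encoder layer at two widths. It is not taken from the paper: the widths (512 and 64), the feed-forward sizes, and the sequence lengths (30 subwords versus 150 characters) are hypothetical values chosen only for illustration, and the counts ignore biases, layer normalization, and embeddings.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Weight count of one standard Transformer encoder layer.

    Biases and LayerNorm parameters are ignored.
    """
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # two linear maps of the FFN
    return attention + feed_forward


def encoder_layer_macs(seq_len: int, d_model: int, d_ff: int) -> int:
    """Approximate multiply-accumulates for one layer over seq_len tokens."""
    # Every token passes through every weight matrix once.
    projections = seq_len * encoder_layer_params(d_model, d_ff)
    # Q K^T and (attention weights) V each cost seq_len^2 * d_model.
    attention_scores = 2 * seq_len * seq_len * d_model
    return projections + attention_scores


# Hypothetical configurations (assumptions, not the paper's settings).
slow = dict(seq_len=30, d_model=512, d_ff=2048)   # subword branch
fast = dict(seq_len=150, d_model=64, d_ff=256)    # character branch

slow_params = encoder_layer_params(slow["d_model"], slow["d_ff"])
fast_params = encoder_layer_params(fast["d_model"], fast["d_ff"])
slow_macs = encoder_layer_macs(**slow)
fast_macs = encoder_layer_macs(**fast)

print(f"params per layer: slow={slow_params:,} fast={fast_params:,}")
print(f"MACs per layer:   slow={slow_macs:,} fast={fast_macs:,}")
print(f"fast/slow MAC ratio: {fast_macs / slow_macs:.3f}")
```

Under these assumed sizes, the weight count scales with the square of the width, so the narrow branch stays much cheaper per layer even though its input sequence is five times longer. The quadratic attention term grows with sequence length, which is why the width reduction matters for the character-level branch.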
May-26-2023
- Genre:
- Research Report (1.00)