MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation
Langlin Huang, Mengyu Bu, Yang Feng
arXiv.org Artificial Intelligence
Neural Machine Translation (NMT) is a consistently hot research topic, and recent years have seen the growing significance of multilingual language modeling (Zhang et al., 2023). The selection of tokenization and vocabulary is critical to multilingual language models, which plays an important role in vectorization of texts and discretization of predicted hidden states. While some models (Costa-jussà et al., 2022; Dubey et al., 2024) use large vocabularies to ensure word coverage, others (Touvron et al., 2023; Jiang et al., 2023) opt for a byte fallback strategy. This approach allows

MSC (Huang and Feng, 2024) argues that a byte should contribute to multiple neighboring contexts, necessitating a multi-scale contextualization approach. To this end, MSC groups hidden state dimensions and assigns CNNs with different kernel sizes to each group. Although MSC provides an effective framework for modeling multi-scale contextualization and achieved state-of-the-art performance, it suffers from a significant limitation: the scales are manually predefined. This reduces the model's ability to generalize to multilingual scenarios, particularly in massively multilingual machine translation, which may involve over 50 languages.
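The grouped multi-scale contextualization described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name, the fixed kernel sizes, and the even split of hidden dimensions are all assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class MultiScaleContextualization(nn.Module):
    """Sketch of MSC-style contextualization: split hidden-state
    dimensions into groups and run each group through a 1-D CNN
    with a different, manually predefined kernel size."""

    def __init__(self, hidden_dim=512, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        assert hidden_dim % len(kernel_sizes) == 0
        self.group_dim = hidden_dim // len(kernel_sizes)
        # One convolution per dimension group; odd kernels with
        # padding k // 2 preserve the byte-sequence length.
        self.convs = nn.ModuleList(
            nn.Conv1d(self.group_dim, self.group_dim,
                      kernel_size=k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):
        # x: (batch, seq_len, hidden_dim)
        groups = x.split(self.group_dim, dim=-1)
        # Conv1d expects (batch, channels, seq_len), so transpose around it.
        out = [conv(g.transpose(1, 2)).transpose(1, 2)
               for conv, g in zip(self.convs, groups)]
        return torch.cat(out, dim=-1)

x = torch.randn(2, 16, 512)
y = MultiScaleContextualization()(x)
print(tuple(y.shape))  # (2, 16, 512)
```

Note that the kernel sizes here are hard-coded, which is exactly the limitation the paper targets: MoCE instead learns to select contextualization scales adaptively.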
Nov-3-2024