MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation
Langlin Huang, Mengyu Bu, Yang Feng
arXiv.org Artificial Intelligence
Neural Machine Translation (NMT) is a consistently hot research topic, and recent years have seen the growing significance of multilingual language modeling (Zhang et al., 2023). The selection of tokenization and vocabulary is critical to multilingual language models, which plays an important role in vectorization of texts and discretization of predicted hidden states. While some models (Costa-jussà et al., 2022; Dubey et al., 2024) use large vocabularies to ensure word coverage, others (Touvron et al., 2023; Jiang et al., 2023) opt for a byte fallback strategy. This approach allows

MSC (Huang and Feng, 2024) argues that a byte should contribute to multiple neighboring contexts, necessitating a multi-scale contextualization approach. To this end, MSC groups hidden state dimensions and assigns CNNs with different kernel sizes to each group. Although MSC provides an effective framework for modeling multi-scale contextualization and achieved state-of-the-art performance, it suffers from a significant limitation: the scales are manually predefined. This reduces the model's ability to generalize to multilingual scenarios, particularly in massively multilingual machine translation, which may involve over 50 languages.
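The grouped multi-scale contextualization described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name, the fixed kernel sizes, and the even split of hidden dimensions are all assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class MultiScaleContextualization(nn.Module):
    """Sketch of MSC-style contextualization: split hidden-state
    dimensions into groups and run each group through a 1-D CNN
    with a different, manually predefined kernel size."""

    def __init__(self, hidden_dim=512, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        assert hidden_dim % len(kernel_sizes) == 0
        self.group_dim = hidden_dim // len(kernel_sizes)
        # One convolution per dimension group; odd kernels with
        # padding k // 2 preserve the byte-sequence length.
        self.convs = nn.ModuleList(
            nn.Conv1d(self.group_dim, self.group_dim,
                      kernel_size=k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):
        # x: (batch, seq_len, hidden_dim)
        groups = x.split(self.group_dim, dim=-1)
        # Conv1d expects (batch, channels, seq_len), so transpose around it.
        out = [conv(g.transpose(1, 2)).transpose(1, 2)
               for conv, g in zip(self.convs, groups)]
        return torch.cat(out, dim=-1)

x = torch.randn(2, 16, 512)
y = MultiScaleContextualization()(x)
print(tuple(y.shape))  # (2, 16, 512)
```

Note that the kernel sizes here are hard-coded, which is exactly the limitation the paper targets: MoCE instead learns to select contextualization scales adaptively.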
Nov-3-2024