Goto

Collaborating Authors

 Machine Translation


A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has achieved astonishing successes in many domains, especially with the recent breakthroughs in the development of foundational large models. These large models, leveraging their extensive training data, provide versatile solutions for a wide range of downstream tasks. However, as modern datasets become increasingly diverse and complex, the development of large AI models faces two major challenges: (1) the enormous consumption of computational resources and deployment difficulties, and (2) the difficulty in fitting heterogeneous and complex data, which limits the usability of the models. Mixture of Experts (MoE) models has recently attracted much attention in addressing these challenges, by dynamically selecting and activating the most relevant sub-models to process input data. It has been shown that MoEs can significantly improve model performance and efficiency with fewer resources, particularly excelling in handling large-scale, multimodal data. Given the tremendous potential MoE has demonstrated across various domains, it is urgent to provide a comprehensive summary of recent advancements of MoEs in many important fields. Existing surveys on MoE have their limitations, e.g., being outdated or lacking discussion on certain key areas, and we aim to address these gaps. In this paper, we first introduce the basic design of MoE, including gating functions, expert networks, routing mechanisms, training strategies, and system design. We then explore the algorithm design of MoE in important machine learning paradigms such as continual learning, meta-learning, multi-task learning, and reinforcement learning. Additionally, we summarize theoretical studies aimed at understanding MoE and review its applications in computer vision and natural language processing. Finally, we discuss promising future research directions.


Revisiting Noise in Natural Language Processing for Computational Social Science

arXiv.org Artificial Intelligence

Computational Social Science (CSS) is an emerging field driven by the unprecedented availability of human-generated content for researchers. This field, however, presents a unique set of challenges due to the nature of the theories and datasets it explores, including highly subjective tasks and complex, unstructured textual corpora. Among these challenges, one of the less well-studied topics is the pervasive presence of noise. This thesis aims to address this gap in the literature by presenting a series of interconnected case studies that examine different manifestations of noise in CSS. These include character-level errors following the OCR processing of historical records, archaic language, inconsistencies in annotations for subjective and ambiguous tasks, and even noise and biases introduced by large language models during content generation. This thesis challenges the conventional notion that noise in CSS is inherently harmful or useless. Rather, it argues that certain forms of noise can encode meaningful information that is invaluable for advancing CSS research, such as the unique communication styles of individuals or the culture-dependent nature of datasets and tasks. Further, this thesis highlights the importance of nuance in dealing with noise and the considerations CSS researchers must address when encountering it, demonstrating that different types of noise require distinct strategies.


Contextual Cues in Machine Translation: Investigating the Potential of Multi-Source Input Strategies in LLMs and NMT Systems

arXiv.org Artificial Intelligence

We explore the impact of multi-source input strategies on machine translation (MT) quality, comparing GPT-4o, a large language model (LLM), with a traditional multilingual neural machine translation (NMT) system. Using intermediate language translations as contextual cues, we evaluate their effectiveness in enhancing English and Chinese translations into Portuguese. Results suggest that contextual information significantly improves translation quality for domain-specific datasets and potentially for linguistically distant language pairs, with diminishing returns observed in benchmarks with high linguistic variability. Additionally, we demonstrate that shallow fusion, a multi-source approach we apply within the NMT system, shows improved results when using high-resource languages as context for other translation pairs, highlighting the importance of strategic context language selection.


Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

arXiv.org Artificial Intelligence

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on the WMT and our datasets show that results using our method match or surpass a range of baselines in terms of translation quality, but achieve $2.4 \sim 6.5 \times$ inference speedups and a $75\%$ reduction in the memory footprint of the KV cache. It also demonstrates strong generalization across a variety of translation-related tasks.


Sign Language Translation using Frame and Event Stream: Benchmark Dataset and Algorithms

arXiv.org Artificial Intelligence

Accurate sign language understanding serves as a crucial communication channel for individuals with disabilities. Current sign language translation algorithms predominantly rely on RGB frames, which may be limited by fixed frame rates, variable lighting conditions, and motion blur caused by rapid hand movements. Inspired by the recent successful application of event cameras in other fields, we propose to leverage event streams to assist RGB cameras in capturing gesture data, addressing the various challenges mentioned above. Specifically, we first collect a large-scale RGB-Event sign language translation dataset using the DVS346 camera, termed VECSL, which contains 15,676 RGB-Event samples, 15,191 glosses, and covers 2,568 Chinese characters. These samples were gathered across a diverse range of indoor and outdoor environments, capturing multiple viewing angles, varying light intensities, and different camera motions. Due to the absence of benchmark algorithms for comparison in this new task, we retrained and evaluated multiple state-of-the-art SLT algorithms, and believe that this benchmark can effectively support subsequent related research. Additionally, we propose a novel RGB-Event sign language translation framework (i.e., M$^2$-SLT) that incorporates fine-grained micro-sign and coarse-grained macro-sign retrieval, achieving state-of-the-art results on the proposed dataset. Both the source code and dataset will be released on https://github.com/Event-AHU/OpenESL.


Transformers for molecular property prediction: Domain adaptation efficiently improves performance

arXiv.org Artificial Intelligence

Most of the current transformer-based chemical language models are pre-trained on millions to billions of molecules. However, the improvement from such scaling in dataset size is not confidently linked to improved molecular property prediction. The aim of this study is to investigate and overcome some of the limitations of transformer models in predicting molecular properties. Specifically, we examine the impact of pre-training dataset size and diversity on the performance of transformer models and investigate the use of domain adaptation as a technique for improving model performance. First, our findings indicate that increasing pretraining dataset size beyond 400K molecules from the GuacaMol dataset does not result in a significant improvement on four ADME endpoints, namely, solubility, permeability, microsomal stability, and plasma protein binding. Second, our results demonstrate that using domain adaptation by further training the transformer model on a small set of domain-relevant molecules, i.e., a few hundred to a few thousand, using multi-task regression of physicochemical properties was sufficient to significantly improve performance for three out of the four investigated ADME endpoints (P-value < 0.001). Finally, we observe that a model pre-trained on 400K molecules and domain adopted on a few hundred/thousand molecules performs similarly (P-value > 0.05) to more complicated transformer models like MolBERT(pre-trained on 1.3M molecules) and MolFormer (pre-trained on 100M molecules). A comparison to a random forest model trained on basic physicochemical properties showed similar performance to the examined transformer models. We believe that current transformer models can be improved through further systematic analysis of pre-training and downstream data, pre-training objectives, and scaling laws, ultimately leading to better and more helpful models.


Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

arXiv.org Artificial Intelligence

The rise of generative artificial intelligence (AI) as a novel frontier that uniquely merges advanced levels of intelligence with revolutionary user experiences is redefining the AI landscape for future cellular networks. In particular, the transition towards 6G systems has introduced a myriad of challenges inherent to their AI-native network design, requiring innovative solutions to enable real-time network orchestration, intelligent decision-making, and adaptive dynamic configurations. Meanwhile, the envisioned user experiences for 6G are growing increasingly complex, exceeding the capabilities offered by vintage wireless technologies and conventional AI solutions to satisfy their advanced demands. With its disruptive impact evident across diverse fields, generative AI possesses immense potential to tackle these challenges, leveraging its exceptional capabilities to manage complex tasks, operate autonomously, and adapt seamlessly to scenarios beyond its training domain. Remarkably, generative AI provides a transformative opportunity for telecom and cellular networks to bridge this defined gap in 6G systems, thereby shifting towards a new era with cutting-edge AI innovations across the different system and user levels.


Assumed Identities: Quantifying Gender Bias in Machine Translation of Ambiguous Occupational Terms

arXiv.org Artificial Intelligence

Machine Translation (MT) systems frequently encounter ambiguous scenarios where they must assign gender to certain occupations when translating without explicit guidance or contextual cues. While individual translations in such cases may not be inherently biased, systematic patterns-such as the repeated association of certain professions with specific genders-can emerge, reflecting and perpetuating societal stereotypes. This ambiguity challenges traditional instance-level single-answer evaluation approaches, as no single gold standard translation exists. To address this, we propose an approach that evaluates gender bias through aggregated model responses. Specifically, we introduce a methodology to detect gender imbalances between source texts and translations, a benchmarking dataset with ambiguous English inputs, and probability-based metrics to quantify a model's divergence from normative standards or reference distributions.


Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation

arXiv.org Artificial Intelligence

While large language models (LLMs) have been increasingly adopted for machine translation (MT), their performance for specialist domains such as medicine and law remains an open challenge. Prior work has shown that LLMs can be domain-adapted at test-time by retrieving targeted few-shot demonstrations or terminologies for inclusion in the prompt. Meanwhile, for general-purpose LLM MT, recent studies have found some success in generating similarly useful domain knowledge from an LLM itself, prior to translation. Our work studies domain-adapted MT with LLMs through a careful prompting setup, finding that demonstrations consistently outperform terminology, and retrieval consistently outperforms generation. We find that generating demonstrations with weaker models can close the gap with larger model's zero-shot performance. Given the effectiveness of demonstrations, we perform detailed analyses to understand their value. We find that domain-specificity is particularly important, and that the popular multi-domain benchmark is testing adaptation to a particular writing style more so than to a specific domain.


Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation

arXiv.org Artificial Intelligence

The ability of generative large language models (LLMs) to perform in-context learning has given rise to a large body of research into how best to prompt models for various natural language processing tasks. Machine Translation (MT) has been shown to benefit from in-context examples, in particular when they are semantically similar to the sentence to translate. In this paper, we propose a new LLM-based translation paradigm, compositional translation, to replace naive few-shot MT with similarity-based demonstrations. An LLM is used to decompose a sentence into simpler phrases, and then to translate each phrase with the help of retrieved demonstrations. Finally, the LLM is prompted to translate the initial sentence with the help of the self-generated phrase-translation pairs. Our intuition is that this approach should improve translation because these shorter phrases should be intrinsically easier to translate and easier to match with relevant examples. This is especially beneficial in low-resource scenarios, and more generally whenever the selection pool is small or out of domain. We show that compositional translation boosts LLM translation performance on a wide range of popular MT benchmarks, including FLORES 200, NTREX 128 and TICO-19. Code and outputs are available at https://github.com/ArmelRandy/compositional-translation