Sun, Zewei
Motion Planning for Robotics: A Review for Sampling-based Planners
Zhang, Liding, Cai, Kuanqi, Sun, Zewei, Bing, Zhenshan, Wang, Chaoqun, Figueredo, Luis, Haddadin, Sami, Knoll, Alois
Recent advancements in robotics have transformed industries such as manufacturing, logistics, surgery, and planetary exploration. A key challenge is developing efficient motion planning algorithms that allow robots to navigate complex environments while avoiding collisions and optimizing metrics like path length, sweep area, execution time, and energy consumption. Among the available algorithms, sampling-based methods have gained the most traction in both research and industry due to their ability to handle complex environments, explore free space, and offer probabilistic completeness along with other formal guarantees. Despite their widespread application, significant challenges remain. To advance future planning algorithms, it is essential to review the current state-of-the-art solutions and their limitations. In this context, this work aims to shed light on these challenges and assess the development and applicability of sampling-based methods. Furthermore, we provide an in-depth analysis of the design and evaluation of ten of the most popular planners across various scenarios. Our findings highlight the strides made in sampling-based methods while underscoring persistent challenges. This work offers an overview of the important ongoing research in robotic motion planning.
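As a concrete illustration of the sampling-based family this review covers, the following is a minimal 2D RRT sketch. The step size, goal bias, bounds, and point-only collision check are simplifying assumptions for illustration, not details drawn from the paper:

```python
import math
import random

def rrt(start, goal, is_free, step=0.5, goal_tol=0.5,
        max_iters=5000, bounds=(0.0, 10.0), seed=0):
    """Minimal 2D RRT: grow a tree from `start` toward random samples
    until a node lands within `goal_tol` of `goal`."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random point, with a small goal bias to speed up convergence.
        sample = goal if rng.random() < 0.1 else (
            rng.uniform(*bounds), rng.uniform(*bounds))
        # Find the nearest tree node and steer one step toward the sample.
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0:
            continue
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        # Point-only collision check (a real planner would also check the edge).
        if not is_free(new):
            continue
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            # Walk parents back to the root to recover the path.
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None  # probabilistically complete, but may fail within max_iters
```

For example, `rrt((1.0, 1.0), (9.0, 9.0), lambda p: not (4 < p[0] < 6 and p[1] < 7))` plans around a vertical wall by passing above it.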
Only 5\% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
Liu, Zihan, Sun, Zewei, Cheng, Shanbo, Huang, Shujian, Wang, Mingxuan
Document-level Neural Machine Translation (DocNMT) has been proven crucial for handling discourse phenomena by introducing document-level context information. One of the most important directions is to input the whole document directly to the standard Transformer model. In this case, efficiency becomes a critical concern due to the quadratic complexity of the attention module. Existing studies either focus on the encoder part, which cannot be deployed on sequence-to-sequence generation tasks, e.g., Machine Translation (MT), or suffer from a significant performance drop. In this work, we keep the translation performance while gaining a 20\% speed-up by introducing an extra selection layer based on lightweight attention that selects a small portion of tokens to be attended. It takes advantage of the original attention to ensure performance and of dimension reduction to accelerate inference. Experimental results show that our method can achieve up to approximately 95\% sparsity (only 5\% of tokens attended) and save 93\% of the computation cost on the attention module compared with the original Transformer, while maintaining the performance.
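The token-selection idea can be sketched roughly as follows. The scoring function (plain scaled dot products) and the keep ratio here are illustrative stand-ins for the paper's lightweight selection layer, not its actual implementation:

```python
import numpy as np

def selective_attention(q, k, v, keep_ratio=0.05):
    """Attend only to the top-scoring fraction of key tokens.

    A cheap score picks roughly `keep_ratio` of the keys per query;
    full softmax attention then runs over that subset only, so the
    expensive part scales with the kept tokens, not the whole document.
    """
    n_q, d = q.shape
    n_k = k.shape[0]
    n_keep = max(1, int(n_k * keep_ratio))
    out = np.zeros((n_q, d))
    for i in range(n_q):
        scores = q[i] @ k.T / np.sqrt(d)      # lightweight selection scores
        idx = np.argsort(scores)[-n_keep:]    # keep only the top-k keys
        w = np.exp(scores[idx] - scores[idx].max())
        w /= w.sum()                          # softmax over the kept keys only
        out[i] = w @ v[idx]
    return out
```

With `keep_ratio=1.0` this reduces to ordinary dense attention, which makes the sparsity/quality trade-off easy to probe.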
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
Zhu, Yaoming, Sun, Zewei, Cheng, Shanbo, Huang, Luyang, Wu, Liwei, Wang, Mingxuan
Multimodal machine translation (MMT) aims to improve translation quality by incorporating information from other modalities, such as vision. Previous MMT systems mainly focus on better access and use of visual information and tend to validate their methods on image-related datasets. These studies face two challenges. First, they can only utilize triple data (bilingual texts with images), which is scarce; second, current benchmarks are relatively restricted and do not correspond to realistic scenarios. Therefore, this paper correspondingly establishes new methods and new datasets for MMT. First, we propose a framework, 2/3-Triplet, with two new approaches to enhance MMT by utilizing large-scale non-triple data: monolingual image-text data and parallel text-only data. Second, we construct an English-Chinese e-commerce multimodal translation dataset (including training and test sets), named EMMT, whose test set is carefully selected so that some words are ambiguous and would be mistranslated without the help of images. Experiments show that our method is more suitable for real-world scenarios and can significantly improve translation performance by using more non-triple data. In addition, our model also rivals various SOTA models on conventional multimodal translation benchmarks.
BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation
Kang, Liyan, Huang, Luyang, Peng, Ningxin, Zhu, Peihao, Sun, Zewei, Cheng, Shanbo, Wang, Mingxuan, Huang, Degen, Su, Jinsong
... context to understand the world. From the perspective of NMT, it is also much needed to make use of such information to approach human-level translation abilities. To facilitate Multimodal Machine Translation (MMT) research, a number of datasets have been proposed, including image-guided translation datasets (Elliott et al., 2016; Gella et al., 2019; Wang et al., 2022) and video-guided translation datasets (Sanabria et al., 2018; ...). The text inputs are often simple and sufficient for translation tasks (Wu et al., 2021). Take the widely used Multi30K as an example. Multi30K consists of only 30K image captions, while typical text translation systems are often trained with several million sentence pairs. We argue that studying the effects of visual contexts in machine translation requires a large-scale and diverse data set for training and a real-world and complex benchmark for testing.
Controlling Styles in Neural Machine Translation with Activation Prompt
Wang, Yifan, Sun, Zewei, Cheng, Shanbo, Zheng, Weiguo, Wang, Mingxuan
Controlling styles in neural machine translation (NMT) has attracted wide attention, as it is crucial for enhancing user experience. Earlier studies on this topic typically concentrate on regulating the level of formality and achieve some progress in this area. However, they still encounter two major challenges. The first is the difficulty of style evaluation. Style comprises various aspects, such as lexis and syntax, that provide abundant information; nevertheless, only formality has been thoroughly investigated. The second challenge involves excessive dependence on incremental adjustments, particularly when new styles are required. To address both challenges, this paper presents a new benchmark and approach. A multiway stylized machine translation (MSMT) benchmark is introduced, incorporating diverse categories of styles across four linguistic domains. Then, we propose a method named style activation prompt (StyleAP), which retrieves prompts from a stylized monolingual corpus and does not require extra fine-tuning. Experiments show that StyleAP can effectively control the style of translation and achieve remarkable performance.
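The retrieval step that StyleAP relies on can be caricatured with a simple bag-of-words similarity. The `retrieve_style_prompt` helper and the cosine similarity used here are hypothetical stand-ins for whatever retrieval mechanism the paper actually employs; the point is only that a prompt is pulled from a stylized corpus rather than learned by fine-tuning:

```python
import math
from collections import Counter

def retrieve_style_prompt(source, stylized_corpus):
    """Return the stylized sentence most similar to `source`
    (bag-of-words cosine), to serve as an in-context prompt."""
    def vec(s):
        return Counter(s.lower().split())

    def cos(a, b):
        num = sum(a[t] * b[t] for t in set(a) & set(b))
        den = (math.sqrt(sum(x * x for x in a.values()))
               * math.sqrt(sum(x * x for x in b.values())))
        return num / den if den else 0.0

    sv = vec(source)
    return max(stylized_corpus, key=lambda s: cos(sv, vec(s)))
```

The retrieved sentence would then be prepended to the input as a prompt that "activates" the desired style, with no parameter updates.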
Better Datastore, Better Translation: Generating Datastores from Pre-Trained Models for Nearest Neural Machine Translation
Li, Jiahuan, Cheng, Shanbo, Sun, Zewei, Wang, Mingxuan, Huang, Shujian
Nearest Neighbor Machine Translation (kNN-MT) is a simple and effective method of augmenting neural machine translation (NMT) with a token-level nearest neighbor retrieval mechanism. The effectiveness of kNN-MT directly depends on the quality of the retrieved neighbors. However, the original kNN-MT builds datastores based on representations from NMT models, which results in poor retrieval accuracy when the NMT models are not good enough, leading to sub-optimal translation performance. In this paper, we propose PRED, a framework that leverages Pre-trained models for Datastores in kNN-MT. Better representations from pre-trained models allow us to build datastores of better quality. We also design a novel contrastive alignment objective to mitigate the representation gap between the NMT model and pre-trained models, enabling the NMT model to retrieve from better datastores. We conduct extensive experiments on both bilingual and multilingual translation benchmarks, including WMT17 English $\leftrightarrow$ Chinese, WMT14 English $\leftrightarrow$ German, IWSLT14 German $\leftrightarrow$ English, and IWSLT14 multilingual datasets. Empirical results demonstrate the effectiveness of PRED.
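The retrieval-and-interpolation step that kNN-MT (and hence PRED) builds on can be sketched as follows. The squared-L2 distance, temperature, and interpolation weight are illustrative defaults, not the settings used in the paper:

```python
import numpy as np

def knn_mt_probs(query, datastore_keys, datastore_vals, nmt_probs,
                 vocab_size, k=4, temperature=10.0, lam=0.5):
    """Token-level kNN-MT: blend the NMT distribution with a retrieval one.

    datastore_keys: (N, d) context representations stored at training time;
    datastore_vals: (N,) target token ids paired with those contexts.
    Retrieved neighbors vote for their tokens with softmax(-distance / T)
    weights, and the result is interpolated with the NMT probabilities.
    """
    d2 = ((datastore_keys - query) ** 2).sum(axis=1)  # squared L2 distances
    idx = np.argsort(d2)[:k]                          # k nearest neighbors
    w = np.exp(-d2[idx] / temperature)
    w /= w.sum()
    knn_probs = np.zeros(vocab_size)
    for weight, tok in zip(w, datastore_vals[idx]):
        knn_probs[tok] += weight                      # neighbors vote for tokens
    return lam * knn_probs + (1 - lam) * nmt_probs
```

PRED's contribution, in these terms, is to build `datastore_keys` from pre-trained-model representations (aligned to the NMT model) so that the `argsort` step returns more relevant neighbors.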