AITopics | Cheng, Shanbo

Collaborating Authors

Cheng, Shanbo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

G-DIG: Towards Gradient-based Diverse and High-quality Instruction Data Selection for Machine Translation

Pan, Xingyuan, Huang, Luyang, Kang, Liyan, Liu, Zhicheng, Lu, Yu, Cheng, Shanbo

arXiv.org Artificial Intelligence, Jul-7-2024

Large Language Models (LLMs) have demonstrated remarkable abilities in general scenarios, and instruction finetuning enables them to align with humans across various tasks. Nevertheless, the diversity and quality of instruction data remain two main challenges for instruction finetuning. To address them, we propose a novel gradient-based method that automatically selects high-quality and diverse instruction finetuning data for machine translation. Our key innovation is to analyze how individual training examples influence the model during training. Specifically, we select as high-quality those training examples that exert a beneficial influence on the model, measured with the Influence Function against a small high-quality seed dataset. Moreover, to enhance the diversity of the training data, we maximize the variety of their influences on the model by clustering on their gradients and resampling. Extensive experiments on the WMT22 and FLORES translation tasks demonstrate the superiority of our methods, and in-depth analysis further validates their effectiveness and generalization.
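
A minimal sketch of the two selection steps described above. It assumes per-example gradient vectors have already been computed, and it replaces the full Influence Function (which involves an inverse Hessian) with a first-order proxy: the dot product between a candidate's gradient and the mean seed-set gradient. The k-means clustering and round-robin resampling are illustrative choices, not necessarily the paper's exact procedure; `select_examples` and its parameters are hypothetical names.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def influence_scores(cand_grads: Sequence[Vector], seed_grads: Sequence[Vector]) -> List[float]:
    # First-order proxy (assumption: identity Hessian). One SGD step on a
    # candidate changes the seed loss by about -lr * <g_cand, g_seed>,
    # so a positive score marks the candidate as beneficial ("high quality").
    g_seed = mean(seed_grads)
    return [dot(g, g_seed) for g in cand_grads]


def kmeans(points: Sequence[Vector], k: int, iters: int = 10) -> List[int]:
    # Plain k-means with a deterministic init (first k points) for reproducibility.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c, p=p: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def select_examples(cand_grads: Sequence[Vector], seed_grads: Sequence[Vector],
                    k: int, budget: int) -> List[int]:
    """Keep beneficial candidates, cluster their gradients, then resample
    round-robin across clusters so every gradient 'direction' is represented."""
    scores = influence_scores(cand_grads, seed_grads)
    good = [i for i, s in enumerate(scores) if s > 0]
    if not good:
        return []
    labels = kmeans([cand_grads[i] for i in good], min(k, len(good)))
    clusters: Dict[int, List[int]] = {}
    for idx, lab in zip(good, labels):
        clusters.setdefault(lab, []).append(idx)
    # Within each cluster, take the most beneficial examples first.
    queues = [sorted(clusters[c], key=lambda i: -scores[i]) for c in sorted(clusters)]
    picked: List[int] = []
    while len(picked) < budget and any(queues):
        for q in queues:
            if q and len(picked) < budget:
                picked.append(q.pop(0))
    return picked
```

For example, with seed gradients `[[1.0, 0.0], [1.0, 0.0]]` and candidates `[[2.0, 0.1], [1.5, 0.0], [0.5, 3.0], [-1.0, 0.0], [0.6, 2.5]]`, a budget of 2 yields `[0, 4]`: candidate 3 is discarded as harmful, and the budget is split across the two gradient clusters instead of simply taking the two highest scores (`[0, 1]`).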

large language model, machine learning, natural language

arXiv:2405.12915

Country: Asia (0.14)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

Cao, Zhiwei, Cao, Qian, Lu, Yu, Peng, Ningxin, Huang, Luyang, Cheng, Shanbo, Su, Jinsong

arXiv.org Artificial Intelligence, Jun-17-2024

The growing popularity of Large Language Models (LLMs) has sparked interest in context compression for them. However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during compression. Our preliminary study supports this hypothesis, emphasizing the importance of retaining key information to maintain model performance under high compression ratios. We therefore introduce the Query-Guided Compressor (QGC), which leverages queries to guide the context compression process, effectively preserving key information within the compressed context. Additionally, we employ a dynamic compression strategy. We validate the effectiveness of QGC on question answering, using the NaturalQuestions, TriviaQA, and HotpotQA datasets. Experimental results show that QGC consistently performs well even at high compression ratios, while also offering significant benefits in inference cost and throughput.
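
To make the idea of query-guided compression concrete, here is a toy sketch that keeps only the context sentences most related to the query. It is not QGC, which is a trained neural compressor: lexical cosine similarity stands in for learned relevance, and ranking sentences globally (so that more relevant documents keep more of their text) is only a rough analogue of a dynamic compression strategy. `compress_context` and its `ratio` parameter are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def compress_context(query: str, documents: List[str], ratio: float) -> List[str]:
    """Keep roughly 1/ratio of all sentences, chosen by relevance to the query."""
    sentences = [s for doc in documents
                 for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]
    q = tokenize(query)
    scores = [cosine(q, tokenize(s)) for s in sentences]
    budget = max(1, math.ceil(len(sentences) / ratio))
    # Rank globally: documents with more query-relevant sentences keep more text.
    keep = sorted(range(len(sentences)), key=lambda i: -scores[i])[:budget]
    return [sentences[i] for i in sorted(keep)]  # restore original order
```

With a compression ratio of 4 over four sentences, only the single sentence sharing the most words with the query survives.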

large language model, machine learning, natural language

arXiv:2406.02376

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

MT-PATCHER: Selective and Extendable Knowledge Distillation from Large Language Models for Machine Translation

Li, Jiahuan, Cheng, Shanbo, Huang, Shujian, Chen, Jiajun

arXiv.org Artificial Intelligence, Apr-1-2024

Large Language Models (LLMs) have demonstrated strong abilities in machine translation (MT), yet they suffer from high computational cost and latency. Transferring translation knowledge from giant LLMs to medium-sized MT models is therefore a promising research direction. However, traditional knowledge distillation methods do not take the capabilities of the student and teacher models into consideration: they repeatedly teach student models knowledge they have already learned and fail to extend to novel contexts and knowledge. In this paper, we propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive, and proactive manner. Considering the current translation ability of student MT models, we only identify and correct their translation errors, instead of distilling the whole translation from the teacher. Leveraging the strong language abilities of LLMs, we instruct LLM teachers to synthesize diverse contexts and anticipate more potential errors for the student. Experimental results on translating both specific language phenomena and general MT benchmarks demonstrate that finetuning the student MT model on only about 10% of the examples achieves results comparable to the traditional knowledge distillation method, and that synthesized potential errors and diverse contexts further improve translation performance on unseen contexts and words.
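
The selective part of this framework can be sketched as a filter: the teacher only produces training data where the student is wrong, optionally adding synthesized contexts for the same error. In this sketch the `student`, `review`, and `synthesize` callables are hypothetical stand-ins for the student MT model and for prompts to the LLM teacher; the paper's actual prompting and data format may differ.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]


def collect_patches(
    sources: List[str],
    student: Callable[[str], str],
    review: Callable[[str, str], Optional[str]],
    synthesize: Optional[Callable[[str, str], List[Pair]]] = None,
) -> List[Pair]:
    """Return (source, target) finetuning pairs only where the student errs."""
    patches: List[Pair] = []
    for src in sources:
        hyp = student(src)
        correction = review(src, hyp)  # None means the student's output is acceptable
        if correction is None:
            continue  # skip knowledge the student already has
        patches.append((src, correction))
        if synthesize is not None:
            # Teacher-generated new contexts that exercise the same error.
            patches.extend(synthesize(src, correction))
    return patches
```

Because correct student outputs are skipped, the finetuning set shrinks to the student's errors plus any synthesized variants of them.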

large language model, natural language, translation

arXiv:2403.09522

Country:

North America > Canada (0.14)
North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report (1.00)

Industry: Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Speech Translation with Large Language Models: An Industrial Practice

Huang, Zhichao, Ye, Rong, Ko, Tom, Dong, Qianqian, Cheng, Shanbo, Wang, Mingxuan, Li, Hang

arXiv.org Artificial Intelligence, Dec-21-2023

Given the great success of large language models (LLMs) across various tasks, in this paper we introduce LLM-ST, a novel and effective speech translation model built upon a pre-trained LLM. By integrating the LLM with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Furthermore, our findings indicate that Chain-of-Thought (CoT) prompting can be beneficial for LLM-ST.
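
One way to apply Chain-of-Thought prompting to speech translation is to have the model first emit a timestamped transcript and then the translation, so the translation step can condition on the transcript. The sketch below builds such an instruction and parses the two-part answer. The `<speech>` placeholder (where speech-encoder outputs would be inserted), the prompt wording, and the `Transcript:`/`Translation:` output format are all hypothetical, not LLM-ST's actual templates.

```python
from typing import Dict

SPEECH_PLACEHOLDER = "<speech>"  # hypothetical slot for speech-encoder embeddings


def build_cot_prompt(src_lang: str, tgt_lang: str) -> str:
    # Step 1 (transcribe) and step 2 (translate) are requested in one instruction.
    return (
        f"{SPEECH_PLACEHOLDER}\n"
        f"First transcribe the {src_lang} speech with timestamps, "
        f"then translate the transcript into {tgt_lang}.\n"
        "Answer as:\nTranscript: ...\nTranslation: ..."
    )


def parse_cot_output(text: str) -> Dict[str, str]:
    """Split a 'Transcript: ... / Translation: ...' answer into its two parts."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        for key in ("Transcript", "Translation"):
            prefix = key + ":"
            if line.startswith(prefix):
                result[key.lower()] = line[len(prefix):].strip()
    return result
```

Keeping the transcript as an explicit intermediate step also means a single decode yields both outputs the abstract mentions: a timestamped transcription and a translation.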

large language model, machine learning, translation

arXiv:2312.13585

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation

Liu, Zihan, Sun, Zewei, Cheng, Shanbo, Huang, Shujian, Wang, Mingxuan

arXiv.org Artificial Intelligence, Sep-25-2023

Document-level Neural Machine Translation (DocNMT) has proven crucial for handling discourse phenomena by introducing document-level context information. One of the most important directions is to feed the whole document directly into a standard Transformer model. In this case, efficiency becomes a critical concern due to the quadratic complexity of the attention module. Existing studies either focus on the encoder part, which cannot be deployed on sequence-to-sequence generation tasks such as Machine Translation (MT), or suffer from a significant performance drop. In this work, we maintain translation performance while gaining a 20% speedup by introducing an extra selection layer, based on lightweight attention, that selects a small portion of tokens to be attended. It takes advantage of the original attention to ensure performance and of dimension reduction to accelerate inference. Experimental results show that our method achieves up to approximately 95% sparsity (only about 5% of tokens attended) and saves 93% of the computation cost of the attention module compared with the original Transformer, while maintaining performance.
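
The selection idea can be illustrated for a single query vector: score all keys cheaply in a reduced dimension, keep the top-k, and run ordinary scaled dot-product attention over only those keys. In this pure-Python sketch, "dimension reduction" is simplified to keeping a subset of coordinates (`low_dims`), whereas a real selection layer would use learned projections; `sparse_attention` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    top_k: int,
    low_dims: Sequence[int],
) -> Tuple[List[float], List[int]]:
    # 1) Lightweight scoring: dot products on a few coordinates only.
    q_low = [query[d] for d in low_dims]
    cheap = [dot(q_low, [k[d] for d in low_dims]) for k in keys]
    selected = sorted(range(len(keys)), key=lambda i: -cheap[i])[:top_k]
    # 2) Full scaled dot-product attention restricted to the selected keys.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for d, v in enumerate(values[i]):
            out[d] += w * v
    return out, sorted(selected)
```

With `top_k` equal to the number of keys and all dimensions kept, this reduces to standard attention; attending to only 5% of tokens corresponds to setting `top_k` to about 5% of the sequence length.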

artificial intelligence, computational linguistic, natural language

arXiv:2309.14174

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation

Zhu, Yaoming, Sun, Zewei, Cheng, Shanbo, Huang, Luyang, Wu, Liwei, Wang, Mingxuan

arXiv.org Artificial Intelligence, Sep-2-2023

Multimodal machine translation (MMT) aims to improve translation quality by incorporating information from other modalities, such as vision. Previous MMT systems mainly focus on better access to and use of visual information and tend to validate their methods on image-related datasets. These studies face two challenges. First, they can only utilize triplet data (bilingual texts with images), which is scarce; second, current benchmarks are relatively restricted and do not correspond to realistic scenarios. This paper therefore establishes new methods and new datasets for MMT. First, we propose a framework, 2/3-Triplet, with two new approaches that enhance MMT by utilizing large-scale non-triplet data: monolingual image-text data and parallel text-only data. Second, we construct an English-Chinese e-commercial multimodal translation dataset (including training and test sets), named EMMT, whose test set is carefully selected so that some words are ambiguous and would be mistranslated without the help of images. Experiments show that our method is better suited to real-world scenarios and can significantly improve translation performance by using more non-triplet data. In addition, our model rivals various state-of-the-art (SOTA) models on conventional multimodal translation benchmarks.
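
A basic ingredient of using non-triplet data is a training loader that mixes all three data sources. The sketch below samples from triplet, monolingual image-text, and parallel text-only pools with configurable weights. The `Example` fields, the pool names, and the weighted sampling scheme are assumptions for illustration; they are not the 2/3-Triplet approaches themselves, which the abstract does not detail.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Example:
    source: str
    target: Optional[str] = None  # None for monolingual image-text data
    image: Optional[str] = None   # None for parallel text-only data


def kind(ex: Example) -> str:
    if ex.image is not None and ex.target is not None:
        return "triplet"
    if ex.image is not None:
        return "image-text"
    return "text-only"


def sample_mixture(pools: Dict[str, List[Example]], weights: Dict[str, float],
                   n: int, seed: int = 0) -> List[Example]:
    """Draw n examples: each draw first picks a pool by weight, then an item."""
    rng = random.Random(seed)
    names = [name for name, items in pools.items() if items and weights.get(name, 0) > 0]
    probs = [weights[name] for name in names]
    return [rng.choice(pools[rng.choices(names, weights=probs)[0]]) for _ in range(n)]
```

Raising the weights of the two non-triplet pools is one simple lever for letting abundant data dominate training while scarce triplets are still seen.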

artificial intelligence, natural language, translation

arXiv:2212.10313

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Law (0.67)
Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation

Kang, Liyan, Huang, Luyang, Peng, Ningxin, Zhu, Peihao, Sun, Zewei, Cheng, Shanbo, Wang, Mingxuan, Huang, Degen, Su, Jinsong

arXiv.org Artificial Intelligence, Jul-3-2023

… context to understand the world. From the perspective of NMT, it is also much needed to make use of such information to approach human-level translation abilities. To facilitate Multimodal Machine Translation (MMT) research, a number of datasets have been proposed, including image-guided translation datasets (Elliott et al., 2016; Gella et al., 2019; Wang et al., 2022) and video-guided translation datasets (Sanabria et al., 2018; …

The text inputs are often simple and sufficient for translation tasks (Wu et al., 2021). Take the widely used Multi30K as an example. Multi30K consists of only 30K image captions, while typical text translation systems are often trained with several million sentence pairs. We argue that studying the effects of visual contexts in machine translation requires a large-scale and diverse data set for training and a real-world and complex benchmark for testing.

machine learning, natural language, translation

arXiv:2305.18326

Country: Asia > China > Fujian Province (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.46)
Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions

Li, Jiahuan, Zhou, Hao, Huang, Shujian, Cheng, Shanbo, Chen, Jiajun

arXiv.org Artificial Intelligence, Jun-29-2023

Large-scale Pretrained Language Models (LLMs), such as ChatGPT and GPT-4, have shown strong abilities in multilingual translation without being explicitly trained on parallel corpora. It is an interesting question how LLMs acquire the ability to carry out translation instructions for different languages. In this paper, we present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation following given instructions. First, we show that multilingual LLMs have stronger translation abilities than previously demonstrated. For a given language, performance depends on its similarity to English and on the amount of data used in the pretraining phase. Second, we find that LLMs' ability to carry out translation instructions relies on the understanding of translation instructions and on the alignment among different languages. With multilingual finetuning, LLMs can learn to perform the translation task well even for language pairs unseen during the instruction tuning phase.
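
Multilingual finetuning with translation instructions amounts to rendering parallel sentence pairs through an instruction template. The sketch below uses a hypothetical template (the exact wording used for XGLM-7B is not given here) and also lists which translation directions a test set covers that the instruction data did not, i.e. the "unseen" language pairs mentioned above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical instruction template; the wording used in the paper may differ.
TEMPLATE = "Translate the following sentence from {src} to {tgt}.\n{src}: {text}\n{tgt}:"

Record = Tuple[str, str, str, str]  # (src_lang, tgt_lang, src_text, tgt_text)


def to_instruction(rec: Record) -> Dict[str, str]:
    src, tgt, src_text, tgt_text = rec
    return {"prompt": TEMPLATE.format(src=src, tgt=tgt, text=src_text),
            "completion": " " + tgt_text}


def unseen_directions(train: List[Record], test_dirs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Directions evaluated at test time but absent from instruction tuning."""
    seen = {(src, tgt) for src, tgt, _, _ in train}
    return test_dirs - seen
```

Note that directions are ordered pairs: having English-to-German in the training data does not make German-to-English "seen".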

artificial intelligence, natural language, translation

arXiv:2305.15083

Country:

Europe (0.46)
Asia (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Controlling Styles in Neural Machine Translation with Activation Prompt

Wang, Yifan, Sun, Zewei, Cheng, Shanbo, Zheng, Weiguo, Wang, Mingxuan

arXiv.org Artificial Intelligence, May-28-2023

Controlling styles in neural machine translation (NMT) has attracted wide attention, as it is crucial for enhancing user experience. Earlier studies on this topic typically concentrated on regulating the level of formality and made some progress in this area. However, they still face two major challenges. The first is the difficulty of style evaluation: style comprises various aspects, such as lexis and syntax, that carry abundant information, yet only formality has been thoroughly investigated. The second is an excessive dependence on incremental adjustments, particularly when new styles are needed. To address both challenges, this paper presents a new benchmark and a new approach. We introduce a multiway stylized machine translation (MSMT) benchmark that covers diverse categories of styles across four linguistic domains. We then propose a method named style activation prompt (StyleAP), which retrieves prompts from a stylized monolingual corpus and requires no extra fine-tuning. Experiments show that StyleAP can effectively control the style of translation and achieves remarkable performance.
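
StyleAP retrieves prompts from a stylized monolingual corpus instead of fine-tuning. The sketch below shows one possible retrieval step under stated assumptions: the query is a hypothetical plain first-pass translation (`draft`), so that it is in the same language as the corpus; similarity is simple token-overlap (Jaccard) rather than whatever retriever the paper uses; and `build_prompt` just prepends the retrieved sentences.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_style_prompts(draft: str, style_corpus: List[str], k: int = 2) -> List[str]:
    """Return the k stylized sentences most similar (by token overlap) to a draft translation."""
    d = tokens(draft)
    ranked = sorted(style_corpus, key=lambda s: -jaccard(d, tokens(s)))
    return ranked[:k]


def build_prompt(prompts: List[str], source: str) -> str:
    # Prepend the retrieved stylized sentences to "activate" the target style.
    return "\n".join(prompts) + "\n" + source
```

Swapping in a different style corpus changes the activated style without touching model weights, which is the property the abstract emphasizes.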

artificial intelligence, natural language, translation

arXiv:2212.08909

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Visual Information Matters for ASR Error Correction

Kumar, Vanya Bannihatti, Cheng, Shanbo, Peng, Ningxin, Zhang, Yuchen

arXiv.org Artificial Intelligence, May-26-2023

Aiming to improve Automatic Speech Recognition (ASR) outputs with a post-processing step, ASR error correction (EC) techniques have been widely developed because they make efficient use of parallel text data. Previous works mainly focus on using text and/or speech data, which limits performance gains when other modalities, such as visual information, are also critical for EC. The challenges are twofold. First, previous work has not emphasized visual information, so it has rarely been explored. Second, the community lacks a high-quality benchmark in which visual information matters for EC models. This paper therefore provides 1) simple yet effective methods, namely gated fusion and image captions as prompts, to incorporate visual information into EC; and 2) a large-scale benchmark dataset, Visual-ASR-EC, where each item in the training data consists of visual, speech, and text information, and the test data are carefully selected by human annotators to ensure that even humans could make mistakes when visual information is missing. Experimental results show that using captions as prompts effectively exploits the visual information and surpasses state-of-the-art methods by up to 1.2% in Word Error Rate (WER), which also indicates that visual information is critical in the proposed Visual-ASR-EC dataset.
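
Both the gated-fusion idea and the WER metric can be sketched in a few lines. The gate here is element-wise with per-dimension weights (`w_text`, `w_vis`, `bias`, all hypothetical), a simplification of the learned projection a real model would use; the exact fusion in the paper may differ. `wer` is the standard word-level edit distance (substitutions, deletions, insertions) divided by the reference length.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec: Sequence[float], visual_vec: Sequence[float],
                 w_text: Sequence[float], w_vis: Sequence[float],
                 bias: Sequence[float]) -> List[float]:
    """h = g * text + (1 - g) * visual, with the gate g computed from both inputs."""
    fused = []
    for t, v, wt, wv, b in zip(text_vec, visual_vec, w_text, w_vis, bias):
        g = sigmoid(wt * t + wv * v + b)  # gate in (0, 1): how much to trust the text
        fused.append(g * t + (1.0 - g) * v)
    return fused


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (S + D + I) / N via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For instance, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words, giving 2/6.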

information, machine learning, natural language

arXiv:2303.1016

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
