AITopics

2409.19507

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study > Negative Result (0.34)

Industry: Law (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Zetsu, Tatsuya, Arase, Yuki, Kajiwara, Tomoyuki

Edit-Constrained Decoding for Sentence Simplification

arXiv.org Artificial IntelligenceSep-28-2024

We propose edit operation based lexically constrained decoding for sentence simplification. In sentence simplification, lexical paraphrasing is one of the primary procedures for rewriting complex sentences into simpler correspondences. While previous studies have confirmed the efficacy of lexically constrained decoding on this task, their constraints can be loose and may lead to sub-optimal generation. We address this problem by designing constraints that replicate the edit operations conducted in simplification and defining stricter satisfaction conditions. Our experiments indicate that the proposed method consistently outperforms the previous studies on three English simplification corpora commonly used in this task.

artificial intelligence, constraint, natural language, (17 more...)

2409.19247

Country:

Asia > Singapore (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)

arXiv.org Artificial IntelligenceSep-28-2024

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Du, Yexing, Ma, Ziyang, Yang, Yifan, Deng, Keqi, Chen, Xie, Yang, Bo, Xiang, Yang, Liu, Ming, Qin, Bing

Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training framework designed to activate the chain-of-thought (CoT) capabilities of SLMs. We propose CoT-ST, a speech translation model that utilizes multimodal CoT to decompose speech translation into sequential steps of speech recognition and translation. We validated the effectiveness of our method on two datasets: the CoVoST-2 dataset and MuST-C dataset. The experimental results demonstrate that CoT-ST outperforms previous state-of-the-art methods, achieving higher BLEU scores (CoVoST-2 en-ja: 30.5->30.8, en-zh: 45.2->47.7, MuST-C en-zh: 19.6->21.2). This work is open sourced at https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2 .

arxiv preprint arxiv, speech recognition, translation, (12 more...)

2409.1951

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Belgium (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceSep-27-2024

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Aycock, Seth, Stap, David, Wu, Di, Monz, Christof, Sima'an, Khalil

Extremely low-resource (XLR) languages lack substantial corpora for training NLP models, motivating the use of all available resources such as dictionaries and grammar books. Machine Translation from One Book (Tanzer et al., 2024) suggests prompting long-context LLMs with one grammar book enables English-Kalamang translation, an unseen XLR language - a noteworthy case of linguistic knowledge helping an NLP task. We investigate whether the book's grammatical explanations or its parallel examples are most effective for learning XLR translation, finding almost all improvement stems from the parallel examples. Further, we find similar results for Nepali, a seen low-resource language, and achieve performance comparable to an LLM with a grammar book by simply fine-tuning an encoder-decoder translation model. We then investigate where grammar books help by testing two linguistic tasks, grammaticality judgment and gloss prediction, and we explore what kind of grammatical knowledge helps by introducing a typological feature prompt that achieves leading results on these more relevant tasks. We thus emphasise the importance of task-appropriate data for XLR languages: parallel examples for translation, and grammatical data for linguistic tasks. As we find no evidence that long-context LLMs can make effective use of grammatical explanations for XLR translation, we suggest data collection for multilingual XLR tasks such as translation is best focused on parallel data over linguistic description.

large language model, machine learning, natural language, (17 more...)

2409.19151

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(22 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceSep-27-2024

HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models

Zhou, Yu, Wu, Xingyu, Wu, Jibin, Feng, Liang, Tan, Kay Chen

Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development due to its ability to bypass the need for original training data and further training processes. However, most existing model merging approaches focus solely on exploring the parameter space, merging models with identical architectures. Merging within the architecture space, despite its potential, remains in its early stages due to the vast search space and the challenges of layer compatibility. This paper marks a significant advance toward more flexible and comprehensive model merging techniques by modeling the architecture-space merging process as a reinforcement learning task. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies. Moreover, a multi-objective optimization paradigm is introduced to accommodate users' diverse task preferences, learning the Pareto front of optimal models to offer customized merging suggestions. Experimental results across multiple tasks, including text translation, mathematical reasoning, and code generation, validate the effectiveness and superiority of the proposed framework in model merging. The code will be made publicly available after the review process.

machine learning, natural language, reinforcement learning, (18 more...)

2409.18893

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Alastruey, Belen, Gállego, Gerard I., Costa-jussà, Marta R.

Unveiling the Role of Pretraining in Direct Speech Translation

Direct speech-to-text translation systems encounter an important drawback in data scarcity. A common solution consists on pretraining the encoder on automatic speech recognition, hence losing efficiency in the training process. In this study, we compare the training dynamics of a system using a pretrained encoder, the conventional approach, and one trained from scratch. We observe that, throughout the training, the randomly initialized model struggles to incorporate information from the speech inputs for its predictions. Hence, we hypothesize that this issue stems from the difficulty of effectively training an encoder for direct speech translation. While a model trained from scratch needs to learn acoustic and semantic modeling simultaneously, a pretrained one can just focus on the latter. Based on these findings, we propose a subtle change in the decoder cross-attention to integrate source information from earlier steps in training. We show that with this change, the model trained from scratch can achieve comparable performance to the pretrained one, while reducing the training time.

encoder, information, translation, (14 more...)

2409.18044

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Yue, Richard, Ortega, John E., Church, Kenneth Ward

On Translating Technical Terminology: A Translation Workflow for Machine-Translated Acronyms

The typical workflow for a professional translator to translate a document from its source language (SL) to a target language (TL) is not always focused on what many language models in natural language processing (NLP) do - predict the next word in a series of words. While high-resource languages like English and French are reported to achieve near human parity using common metrics for measurement such as BLEU and COMET, we find that an important step is being missed: the translation of technical terms, specifically acronyms. Some state-of-the art machine translation systems like Google Translate which are publicly available can be erroneous when dealing with acronyms - as much as 50% in our findings. This article addresses acronym disambiguation for MT systems by proposing an additional step to the SL-TL (FR-EN) translation workflow where we first offer a new acronym corpus for public consumption and then experiment with a search-based thresholding algorithm that achieves nearly 10% increase when compared to Google Translate and OpusMT.

acronym, translation, verification, (14 more...)

2409.17943

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.05)
Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > California > Santa Clara County > San Jose (0.04)

Genre:

Workflow (1.00)
Research Report > New Finding (0.34)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Yue, Richard, Ortega, John E.

Predicting Anchored Text from Translation Memories for Machine Translation Using Deep Learning Methods

Translation memories (TMs) are the backbone for professional translation tools called computer-aided translation (CAT) tools. In order to perform a translation using a CAT tool, a translator uses the TM to gather translations similar to the desired segment to translate (s'). Many CAT tools offer a fuzzy-match algorithm to locate segments (s) in the TM that are close in distance to s'. After locating two similar segments, the CAT tool will present parallel segments (s, t) that contain one segment in the source language along with its translation in the target language. Additionally, CAT tools contain fuzzy-match repair (FMR) techniques that will automatically use the parallel segments from the TM to create new TM entries containing a modified version of the original with the idea in mind that it will be the translation of s'. Most FMR techniques use machine translation as a way of "repairing" those words that have to be modified. In this article, we show that for a large part of those words which are anchored, we can use other techniques that are based on machine learning approaches such as Word2Vec. BERT, and even ChatGPT. Specifically, we show that for anchored words that follow the continuous bag-of-words (CBOW) paradigm, Word2Vec, BERT, and GPT-4 can be used to achieve similar and, for some cases, better results than neural machine translation for translating anchored words from French to English.

machine translation, translation, word2vec, (15 more...)

2409.17939

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
(12 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Uhlig, Kaden, Wuebker, Joern, Reinauer, Raphael, DeNero, John

Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization

Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose general, foundational models for specific tasks. We show that applying task-alignment to neural machine translation (NMT) addresses an existing task--data mismatch in NMT, leading to improvements across all languages of a multilingual model, even when task-alignment is only applied to a subset of those languages. We do so by introducing Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences, and verify the improvements with both automatic metrics and human evaluation.

computational linguistic, proceedings, translation, (11 more...)

2409.17673

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(21 more...)

Genre: Research Report > New Finding (0.67)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

arXiv.org Artificial IntelligenceSep-25-2024

On Extending Direct Preference Optimization to Accommodate Ties

Chen, Jinghong, Yang, Guangyu, Lin, Weizhe, Mei, Jingbiao, Byrne, Bill

We derive and investigate two DPO variants that explicitly model the possibility of declaring a tie in pair-wise comparisons. We replace the Bradley-Terry model in DPO with two well-known modeling extensions, by Rao and Kupper and by Davidson, that assign probability to ties as alternatives to clear preferences. Our experiments in neural machine translation and summarization show that explicitly labeled ties can be added to the datasets for these DPO variants without the degradation in task performance that is observed when the same tied pairs are presented to DPO. We find empirically that the inclusion of ties leads to stronger regularization with respect to the reference policy as measured by KL divergence, and we see this even for DPO in its original form. These findings motivate and enable the inclusion of tied pairs in preference optimization as opposed to simply discarding them.

direct preference optimization, dpo, probability, (12 more...)

2409.17431

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(12 more...)

Genre: Research Report > New Finding (0.92)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)