AITopics

2410.17532

Country:

Asia > China (0.05)
Asia > Japan > Honshū > Tōhoku (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Europe > Switzerland (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.87)

Industry:

Education (0.92)
Information Technology > Services (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-22-2024

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

Chung, Jiwan, Lim, Seungwon, Jeon, Jaehyun, Lee, Seungbeen, Yu, Youngjae

Humans possess multimodal literacy, which allows them to actively integrate information from various modalities when reasoning. When faced with challenges such as lexical ambiguity in text, we supplement the text with other modalities, such as thumbnail images or textbook illustrations. Can machines achieve a similar multimodal understanding capability? In response, we present Understanding Pun with Image Explanations (UNPIE), a novel benchmark designed to assess the impact of multimodal inputs on resolving lexical ambiguities. Puns serve as the ideal subject for this evaluation due to their intrinsic ambiguity. Our dataset includes 1,000 puns, each accompanied by an image that explains both meanings. Using these annotations, we pose three multimodal challenges that assess different aspects of multimodal literacy: Pun Grounding, Disambiguation, and Reconstruction. The results indicate that various Socratic Models and Visual-Language Models improve over text-only models when given visual context, particularly as task complexity increases.

large language model, machine learning, natural language

2410.01023

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

arXiv.org Artificial Intelligence, Oct-22-2024

Context-Aware LLM Translation System Using Conversation Summarization and Dialogue History

Sung, Mingi, Lee, Seungmin, Kim, Jiwon, Kim, Sejoon

Translating conversational text, particularly in customer support contexts, presents unique challenges due to its informal and unstructured nature. We propose a context-aware LLM translation system that leverages conversation summarization and dialogue history to enhance translation quality for the English-Korean language pair. Our approach incorporates the two most recent dialogues as raw data and a summary of earlier conversations to manage context length effectively. We demonstrate that this method significantly improves translation accuracy, maintaining coherence and consistency across conversations. This system offers a practical solution for customer support translation tasks, addressing the complexities of conversational text.
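The context-management idea above (older turns condensed into a summary, the two most recent turns passed verbatim) can be sketched as prompt assembly. This is a minimal illustration, not the authors' system: the `Turn` type, `split_history`, `build_prompt`, the prompt wording, and the reading of "dialogue" as a single utterance are all assumptions, and the summary itself is assumed to come from a separate summarization step.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One utterance in the conversation (hypothetical structure)."""
    speaker: str
    text: str


def split_history(history, n_recent=2):
    """Split turns into (older turns to summarize, most recent turns kept raw)."""
    if n_recent <= 0:
        return list(history), []
    return history[:-n_recent], history[-n_recent:]


def build_prompt(history, summary, source, src_lang="English", tgt_lang="Korean"):
    """Assemble a context-aware translation prompt (hypothetical wording).

    `summary` is assumed to summarize the older turns returned by
    split_history; producing it is outside the scope of this sketch.
    """
    _, recent = split_history(history)
    lines = []
    if summary:
        lines.append(f"Summary of earlier conversation: {summary}")
    # The most recent turns are included verbatim.
    lines.extend(f"{t.speaker}: {t.text}" for t in recent)
    lines.append("")
    lines.append(
        f"Translate the next message from {src_lang} to {tgt_lang}, "
        "keeping terminology and tone consistent with the conversation above."
    )
    lines.append(f"Message: {source}")
    return "\n".join(lines)
```

Bounding the verbatim window to a fixed number of turns keeps the prompt length roughly constant however long the support conversation grows, which is the context-length concern the abstract raises.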

large language model, natural language, translation

2410.16775

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)
Asia > South Korea > Seoul > Seoul (0.05)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Subword Embedding from Bytes Gains Privacy without Sacrificing Accuracy and Complexity

Zhang, Mengjiao, Xu, Jia

While NLP models significantly impact our lives, there are rising concerns about privacy invasion. Although federated learning enhances privacy, attackers may recover private training data by exploiting model parameters and gradients. Protecting against such embedding attacks therefore remains an open challenge. To address it, we propose Subword Embedding from Bytes (SEB), which encodes subwords into byte sequences using deep neural networks, making input text recovery harder. Importantly, our method requires less memory, with a vocabulary of only 256 bytes, while maintaining efficiency by keeping the same input length. Our solution thus outperforms conventional approaches by preserving privacy without sacrificing efficiency or accuracy. Our experiments show that SEB can effectively prevent embedding-based attacks from recovering original sentences in federated learning. Meanwhile, we verify that SEB obtains comparable or even better results than standard subword embedding methods in machine translation, sentiment analysis, and language modeling, with even lower time and space complexity.
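The input-side bookkeeping behind SEB, where each subword becomes a fixed-length row of byte ids drawn from a 256-entry vocabulary so that the sequence length still equals the number of subwords, can be sketched as follows. This is a toy illustration under stated assumptions: `MAX_BYTES`, the extra padding id, and mean-pooling of byte embeddings are hypothetical stand-ins for the paper's neural byte encoder, and the privacy argument itself is not modeled.

```python
import random

PAD = 256        # extra padding id; real byte ids are 0..255
MAX_BYTES = 8    # hypothetical per-subword byte budget
DIM = 4          # toy embedding size


def subword_to_bytes(subword: str, max_bytes: int = MAX_BYTES) -> list:
    """UTF-8 encode one subword and pad or truncate it to a fixed length."""
    ids = list(subword.encode("utf-8"))[:max_bytes]
    return ids + [PAD] * (max_bytes - len(ids))


def make_table(dim: int = DIM, seed: int = 0) -> list:
    """Byte embedding table: 256 byte rows plus one padding row."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(257)]


def embed_subwords(subwords: list, table: list) -> list:
    """Return one vector per subword: the mean of its non-padding byte embeddings.

    Mean-pooling stands in for the neural byte encoder (an assumption). The
    output has one vector per subword, so the model's input length is unchanged.
    """
    dim = len(table[0])
    out = []
    for sw in subwords:
        ids = [b for b in subword_to_bytes(sw) if b != PAD]
        if not ids:
            out.append([0.0] * dim)
            continue
        out.append([sum(table[b][d] for b in ids) / len(ids) for d in range(dim)])
    return out
```

The embedding table has 257 rows regardless of how many subwords the tokenizer defines, which is where the memory saving described in the abstract comes from.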

artificial intelligence, machine learning, natural language

2410.1641

Country:

North America > United States > New York (0.04)
North America > Mexico (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

Pengpun, Parinthapat, Tiankanon, Krittamate, Chinkamol, Amrest, Kinchagawat, Jiramet, Chairuengjitjaras, Pitchaya, Supholkhan, Pasit, Aussavavirojekul, Pubordee, Boonnag, Chiraphat, Veerakanjana, Kanyakorn, Phimsiri, Hirunkul, Sae-jia, Boonthicha, Sataudom, Nattawach, Ittichaiwong, Piyalitt, Limkonchotiwat, Peerat

On Creating an English-Thai Code-switched Machine Translation in Medical Domain

Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field because they cannot precisely translate medical terminology. Our research prioritizes not merely improving translation accuracy but also keeping medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model on this data, and evaluated its performance against strong baselines, such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance on automatic metrics and was highly favored in human preference evaluations. Our evaluation results also show that medical professionals significantly prefer CS translations that maintain critical English terms accurately, even if this slightly compromises fluency. Our code and test set are publicly available at https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.
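One simple way to check the property the abstract emphasizes, namely whether critical English terms survive verbatim in a code-switched Thai output, is a term-retention rate over a glossary. This is a hypothetical helper for illustration, not the paper's evaluation protocol; the glossary, the example sentences, and the function name `term_retention` are assumptions.

```python
import re


def term_retention(source: str, translation: str, terms: list) -> float:
    """Fraction of glossary terms found in the source that also appear
    verbatim (in English) in the code-switched translation.

    Matching is case-insensitive and requires that the term is not embedded
    inside a longer Latin-script word; Thai script may touch it directly,
    since Thai is often written without spaces.
    """
    def present(term: str, text: str) -> bool:
        pattern = r"(?<![A-Za-z])" + re.escape(term) + r"(?![A-Za-z])"
        return re.search(pattern, text, re.IGNORECASE) is not None

    in_source = [t for t in terms if present(t, source)]
    if not in_source:
        return 1.0  # nothing to preserve
    kept = sum(present(t, translation) for t in in_source)
    return kept / len(in_source)
```

For example, with the glossary `["atrial fibrillation", "warfarin", "heparin"]` and a source mentioning the first two terms, an output that keeps both English terms scores 1.0, while one that renders "atrial fibrillation" in Thai scores 0.5.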

machine learning, natural language, translation

doi: 10.18653/v1/2024.findings-emnlp.351

2410.16221

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
Asia > Singapore (0.05)
North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CA*: Addressing Evaluation Pitfalls in Computation-Aware Latency for Simultaneous Speech Translation

Xu, Xi, Xu, Wenda, Ouyang, Siqi, Li, Lei

Simultaneous speech translation (SimulST) systems must balance translation quality with response time, making latency measurement crucial for evaluating their real-world performance. However, there has been a longstanding belief that current metrics yield unrealistically high latency measurements in unsegmented streaming settings. In this paper, we investigate this phenomenon, revealing its root cause in a fundamental misconception underlying existing latency evaluation approaches. We demonstrate that this issue affects not only streaming but also segment-level latency evaluation across different metrics. Furthermore, we propose a modification to correctly measure computation-aware latency for SimulST systems, addressing the limitations present in existing metrics.
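For reference, the standard Average Lagging (AL) metric that such latency evaluations build on can be sketched as below: AL averages, up to the first token emitted after the full source is available, how far each emission delay exceeds an ideal constant-rate schedule. The same function yields a non-computation-aware score when delays measure source audio consumed, and a computation-aware score when delays measure elapsed wall-clock time. This sketch does not implement the modification the paper proposes; the function name and millisecond units are assumptions.

```python
def average_lagging(delays, source_duration, target_length):
    """Average Lagging (AL) for simultaneous speech translation.

    delays[i]: delay (ms) at which target token i+1 was emitted, either as
        source audio consumed (non-computation-aware) or as elapsed
        wall-clock time including computation (computation-aware).
    source_duration: total source duration |X| in ms.
    target_length: length used for the ideal rate, e.g. reference length.
    """
    if not delays or target_length <= 0:
        raise ValueError("need at least one emitted token and a positive length")
    rate = source_duration / target_length  # ideal ms of source per target token
    # tau: 1-based index of the first token emitted once the whole source was available
    tau = next((i + 1 for i, d in enumerate(delays) if d >= source_duration), len(delays))
    # Token i (0-based) is compared against the ideal delay i * rate.
    return sum(delays[i] - i * rate for i in range(tau)) / tau
```

Adding a constant 500 ms of computation to every delay in an otherwise ideal 4-token, 4-second example raises AL from 1000 ms to 1500 ms, which shows how strongly the choice of delay measurement drives the reported latency.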

artificial intelligence, natural language, translation

2410.16011

Country:

North America > United States > Mississippi (0.05)
Asia > Thailand > Bangkok > Bangkok (0.05)
North America > Canada > Ontario > Toronto (0.05)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Kim, Sejoon, Sung, Mingi, Lee, Jeonghwan, Lim, Hyunkuk, Perez, Jorge Froilan Gimenez

Efficient Terminology Integration for LLM-based Translation in Specialized Domains

Traditional machine translation methods typically involve training models directly on large parallel corpora, with limited emphasis on specialized terminology. However, in specialized fields such as patent, finance, or biomedical domains, terminology is crucial for translation, with many terms that need to be translated following agreed-upon conventions. In this paper, we introduce a methodology that efficiently trains models with a smaller amount of data while preserving the accuracy of terminology translation. We achieve this through a systematic process of term extraction and glossary creation using the Trie Tree algorithm, followed by data reconstruction to teach the LLM how to integrate these specialized terms. This methodology enhances the model's ability to handle specialized terminology and ensures high-quality translations, particularly in fields where term consistency is crucial. Our approach has demonstrated exceptional performance, achieving the highest translation score among participants in the WMT patent task to date, showcasing its effectiveness and broad applicability in specialized translation domains where general methods often fall short.
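The trie-based term extraction step can be sketched as a token-level prefix tree over glossary entries, scanned with greedy longest match so that "neural network" wins over the shorter entry "network". This is a minimal sketch, not the authors' implementation: whitespace tokenization, lowercasing, the `Glossary` class, and the example Korean entries are assumptions, and the subsequent data-reconstruction step is not shown.

```python
class TrieNode:
    __slots__ = ("children", "translation")

    def __init__(self):
        self.children = {}
        self.translation = None  # target-language term if a source term ends here


class Glossary:
    """Token-level trie over source-language glossary terms (hypothetical helper)."""

    def __init__(self, entries):
        self.root = TrieNode()
        for src, tgt in entries.items():
            node = self.root
            for tok in src.lower().split():
                node = node.children.setdefault(tok, TrieNode())
            node.translation = tgt

    def extract(self, sentence):
        """Greedy longest-match scan returning (source term, translation) pairs."""
        tokens = sentence.lower().split()
        found, i = [], 0
        while i < len(tokens):
            node, best = self.root, None
            # Walk the trie as far as the tokens allow, remembering the longest full term.
            for j in range(i, len(tokens)):
                node = node.children.get(tokens[j])
                if node is None:
                    break
                if node.translation is not None:
                    best = (j, node.translation)
            if best:
                end, tgt = best
                found.append((" ".join(tokens[i:end + 1]), tgt))
                i = end + 1  # skip past the matched term
            else:
                i += 1
        return found
```

Each lookup costs time proportional to the length of the longest matching term rather than to the glossary size, which is the usual reason to prefer a trie over checking every glossary entry against every sentence.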

large language model, machine learning, translation

2410.1569

Country:

Asia > Singapore (0.05)
North America > Canada > Ontario > Toronto (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Zaranis, Emmanouil, Guerreiro, Nuno M., Martins, André F. T.

Analyzing Context Contributions in LLM-based Machine Translation

Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT) and demonstrated the ability to leverage in-context learning through few-shot examples. However, the mechanisms by which LLMs use different parts of the input context remain largely unexplored. In this work, we provide a comprehensive analysis of context utilization in MT, studying how LLMs use various context parts, such as few-shot examples and the source text, when generating translations. We highlight several key findings: (1) the source part of few-shot examples appears to contribute more than its corresponding targets, irrespective of translation direction; (2) finetuning LLMs with parallel data alters the contribution patterns of different context parts; and (3) there is a positional bias where earlier few-shot examples have higher contributions to the translated sequence. Finally, we demonstrate that inspecting anomalous context contributions can potentially uncover pathological translations, such as hallucinations. Our findings shed light on the internal workings of LLM-based MT which go beyond those known for standard encoder-decoder MT models.
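Findings such as "the source part of few-shot examples contributes more than its targets" amount to aggregating token-level attribution scores by context part. The sketch below shows only that aggregation step; the attribution scores are assumed to come from some attribution method not shown here, and the part labels (`ex1_src`, `ex1_tgt`, `test_src`) are hypothetical.

```python
from collections import defaultdict


def part_contributions(scores, labels):
    """Share of total attribution mass assigned to each labelled context part.

    scores: non-negative contribution of each input token to the generated
        translation (assumed to come from an attribution method).
    labels: same length as scores, naming each token's context part,
        e.g. "ex1_src", "ex1_tgt", "test_src".
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must align")
    totals = defaultdict(float)
    for s, lab in zip(scores, labels):
        totals[lab] += s
    mass = sum(totals.values())
    # Normalize so the shares sum to 1; leave raw totals if all scores are zero.
    return {lab: t / mass for lab, t in totals.items()} if mass else dict(totals)
```

Comparing `ex1_src` against `ex1_tgt` probes the source-versus-target finding, and comparing shares of earlier against later examples probes the positional bias the abstract reports.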

large language model, machine learning, natural language

2410.16246

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
Europe > Austria > Salzburg > Salzburg (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Sports > Soccer (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Learning from others' mistakes: Finetuning machine translation models with span-level error annotations

Zhang, Lily H., Dadkhahi, Hamid, Finkelstein, Mara, Trabelsi, Firas, Luo, Jiaming, Freitag, Markus

Affiliations: Lily H. Zhang (New York University); Hamid Dadkhahi, Mara Finkelstein, Firas Trabelsi, Jiaming Luo, Markus Freitag (Google)

Despite growing interest in incorporating feedback to improve language models, most efforts focus only on sequence-level annotations. In this work, we explore the potential of utilizing fine-grained span-level annotations from offline datasets to improve model quality. We develop a simple finetuning algorithm, called Training with Annotations (TWA), to directly train machine translation models on such annotated data. TWA utilizes targeted span-level error information while also flexibly learning what to penalize within a span. Moreover, TWA considers the overall trajectory of a sequence when deciding which non-error spans to utilize as positive signals. Experiments on English-German and Chinese-English machine translation show that TWA outperforms baselines such as Supervised Fine-Tuning on sequences filtered for quality and Direct Preference Optimization on pairs constructed from the same data.

Such data, coupled with techniques to learn from it (Christiano et al., 2017; Rafailov et al., 2023; Gulcehre et al., 2023; Dong et al., 2023), have yielded impressive results for many top language models. Most efforts, however, consider only sequence-level labels, usually in the form of a scalar score assigned to the entire output. In contrast, this work investigates the potential of using fine-grained span-level annotations from offline datasets to enhance language model training.
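The abstract's idea of using span-level error annotations while considering "the overall trajectory of a sequence" can be illustrated with a per-token role assignment. The policy below is an assumption for illustration, not the paper's exact loss: tokens inside annotated error spans are marked for penalization, non-error tokens before the first error are positive signals, and non-error tokens after the first error are ignored because they are conditioned on an already-flawed prefix. The learned, within-span choice of what to penalize is not modeled; `Role` and `token_roles` are hypothetical names.

```python
from enum import Enum


class Role(Enum):
    POSITIVE = 1    # train toward these tokens (likelihood term)
    PENALIZE = -1   # token lies inside an annotated error span
    IGNORE = 0      # excluded from the loss


def token_roles(n_tokens, error_spans):
    """Assign a training role to each target token.

    error_spans: list of (start, end) token indices, end exclusive.
    Assumed policy: positive before the first error, penalized inside
    error spans, ignored for other tokens after the first error.
    """
    in_error = [False] * n_tokens
    for start, end in error_spans:
        for t in range(max(start, 0), min(end, n_tokens)):
            in_error[t] = True
    # Position of the first (non-empty) error span, or n_tokens if there is none.
    first_error = min(
        (max(s, 0) for s, e in error_spans if s < n_tokens and e > s),
        default=n_tokens,
    )
    roles = []
    for t in range(n_tokens):
        if in_error[t]:
            roles.append(Role.PENALIZE)
        elif t > first_error:
            roles.append(Role.IGNORE)
        else:
            roles.append(Role.POSITIVE)
    return roles
```

For a 6-token output with one error span covering tokens 2 and 3, this policy yields two positive tokens, two penalized tokens, and two ignored tokens. A training loop would then apply a likelihood term, a penalty term, or nothing according to each role.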

annotation, error span, information

2410.16509

Country:

Asia > Singapore (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Dang, Thao Anh, Raviv, Limor, Galke, Lukas

Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5

Morphology is a crucial factor for multilingual language modeling, as it poses direct challenges for tokenization. Here, we seek to understand how tokenization influences the morphological knowledge encoded in multilingual language models. Specifically, we capture the impact of tokenization by contrasting two multilingual language models: mT5 and ByT5. The two models share the same architecture, training objective, and training data, and differ only in their tokenization strategies: subword tokenization vs. character-level tokenization. Probing the morphological knowledge encoded in these models on four tasks and 17 languages, our analyses show that the models learn the morphological systems of some languages better than others and that morphological information is encoded in the middle and late layers. Finally, we show that languages with more irregularities benefit more from having a higher share of the pre-training data.
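ByT5's side of the comparison operates on UTF-8 bytes, which coincide with characters only for ASCII text. The sketch below shows how sequence length grows for non-ASCII morphology. It assumes a ByT5-style convention in which each byte value is shifted by 3 to leave ids 0 to 2 for special tokens. The mT5 subword side would require a trained SentencePiece model and is not shown.

```python
SPECIAL_OFFSET = 3  # ids 0-2 reserved for special tokens (assumed ByT5-style convention)


def byte_ids(text: str) -> list:
    """Byte-level token ids: one id per UTF-8 byte, shifted past the special ids."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")]


def compare_lengths(word: str) -> tuple:
    """Return (number of characters, number of byte-level tokens) for a word."""
    return len(word), len(byte_ids(word))


for w in ["walked", "Häuser", "évaluations", "집들"]:
    chars, toks = compare_lengths(w)
    print(f"{w}: {chars} chars -> {toks} byte tokens")
```

An English inflection such as "walked" costs one token per character. By contrast, each Hangul syllable in "집들" costs three tokens, so morphologically rich non-Latin scripts yield much longer byte-level sequences than their character counts suggest.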

computational linguistic, language model, proceedings

2410.11627

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Denmark > Southern Denmark (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)