AITopics

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Neural Information Processing SystemsOct-3-2024, 06:38:43 GMT

Attention is All you Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 Englishto-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.

arxiv preprint arxiv, sequence, transformer, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-3-2024, 06:00:24 GMT

Neural Program Meta-Induction

Jacob Devlin, Rudy R. Bunel, Rishabh Singh, Matthew Hausknecht, Pushmeet Kohli

Most recently proposed methods for Neural Program Induction work under the assumption of having a large set of input/output (I/O) examples for learning any underlying input-output mapping. This paper aims to address the problem of data and computation efficiency of program induction by leveraging information from related tasks. Specifically, we propose two approaches for cross-task knowledge transfer to improve program induction in limited-data scenarios. In our first proposal, portfolio adaptation, a set of induction models is pretrained on a set of related tasks, and the best model is adapted towards the new task using transfer learning. In our second approach, meta program induction, a k-shot learning approach is used to make a model generalize to new tasks without additional training.

induction, new task, program induction, (15 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Neural Information Processing SystemsOct-3-2024, 01:19:42 GMT

Style Transfer from Non-Parallel Text by Cross-Alignment

Tianxiao Shen, Tao Lei, Regina Barzilay, Tommi Jaakkola

This paper focuses on style transfer on the basis of non-parallel text. This is an instance of a broad family of problems including machine translation, decipherment, and sentiment modification. The key challenge is to separate the content from other aspects such as style. We assume a shared latent content distribution across different text corpora, and propose a method that leverages refined alignment of latent representations to perform style transfer. The transferred sentences from one style should match example sentences from the other style as a population. We demonstrate the effectiveness of this cross-alignment method on three tasks: sentiment modification, decipherment of word substitution ciphers, and recovery of word order.

arxiv preprint arxiv, evaluation, style transfer, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

Neural Information Processing SystemsOct-3-2024, 00:38:46 GMT

Decoding with Value Networks for Neural Machine Translation

Di He, Hanqing Lu, Yingce Xia, Tao Qin, Liwei Wang, Tie-Yan Liu

Neural Machine Translation (NMT) has become a popular technology in recent years, and beam search is its de facto decoding method due to the shrunk search space and reduced computational complexity. However, since it only searches for local optima at each time step through one-step forward looking, it usually cannot output the best target sentence.

machine translation, translation, value network, (15 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial IntelligenceOct-3-2024

Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning

Hu, Tianxiang, Zhang, Pei, Yang, Baosong, Xie, Jun, Wong, Derek F., Wang, Rui

Achieving consistent high-quality machine translation (MT) across diverse domains remains a significant challenge, primarily due to the limited and imbalanced parallel training data available in various domains. While large language models (LLMs) have demonstrated impressive general understanding and generation abilities, their potential in multi-domain MT is under-explored. We establish a comprehensive benchmark for multi-domain translation, featuring 25 German$\Leftrightarrow$English and 22 Chinese$\Leftrightarrow$English test sets respectively covering 15 domains. Our evaluation of prominent LLMs reveals a discernible performance gap against traditional MT systems, highlighting domain overfitting and catastrophic forgetting issues after fine-tuning on domain-limited corpora. To mitigate this, we propose a domain Chain of Thought (CoT) fine-tuning technique that utilizes the intrinsic multi-domain intelligence of LLMs to improve translation performance. This method inspires the LLM to perceive domain information from the source text, which then serves as a helpful hint to guide the translation process. Despite being trained on a small dataset of four domains, our CoT fine-tune approach achieves notable enhancements in translation accuracy and domain robustness than traditional fine-tuning, as evidenced by an average 1.53 BLEU score increase in over 20 German$\rightarrow$English distinct out-of-domain tests.

machine translation, translation, wmt22, (14 more...)

2410.02631

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Macao (0.04)
Asia > Singapore (0.04)
(10 more...)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

García-Sierra, Óscar, Cesteros, Ana Fernández-Pampillón, Ortega-Martín, Miguel

Morphological evaluation of subwords vocabulary used by BETO language model

arXiv.org Artificial IntelligenceOct-3-2024

Subword tokenization algorithms used by Large Language Models are significantly more efficient and can independently build the necessary vocabulary of words and subwords without human intervention. However, those subwords do not always align with real morphemes, potentially impacting the models' performance, though it remains uncertain when this might occur. In previous research, we proposed a method to assess the morphological quality of vocabularies, focusing on the overlap between these vocabularies and the morphemes of a given language. Our evaluation method was built on three quality measures, relevance, cohesion, and morphological accuracy, and a procedure for their assessment. By applying this method to vocabularies created by three subword tokenization algorithms, BPE, Wordpiece, and Unigram, we concluded that these vocabularies generally exhibit very low morphological quality. In this article, we apply this evaluation to the tokenizer of BETO, a BERT language model trained on large Spanish corpora. This evaluation, along with our previous results, helped us conclude that its vocabulary has a low morphological quality, and we also found that training the tokenizer in a larger corpus does not improve the morphological quality of the generated vocabulary. Additionally, this evaluation helps clarify the algorithm used by the tokenizer, that is, Wordpiece, given the inconsistencies between the authors' claims and the model's configuration.

beto, tokenizer, vocabulario, (15 more...)

2410.02283

Country:

Europe > Spain > Galicia > Madrid (0.04)
North America > United States > California > Los Angeles County > El Segundo (0.04)
Europe > Portugal > Guarda > Guarda (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)

Neural Information Processing SystemsOct-2-2024, 22:01:21 GMT

Learned in Translation: Contextualized Word Vectors

Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher

Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art.

representation, sequence, vector, (14 more...)

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceOct-1-2024

Creative and Context-Aware Translation of East Asian Idioms with GPT-4

Tang, Kenan, Song, Peiyang, Qin, Yao, Yan, Xifeng

As a type of figurative language, an East Asian idiom condenses rich cultural background into only a few characters. Translating such idioms is challenging for human translators, who often resort to choosing a context-aware translation from an existing list of candidates. However, compiling a dictionary of candidate translations demands much time and creativity even for expert translators. To alleviate such burden, we evaluate if GPT-4 can help generate high-quality translations. Based on automatic evaluations of faithfulness and creativity, we first identify Pareto-optimal prompting strategies that can outperform translation engines from Google and DeepL. Then, at a low cost, our context-aware translations can achieve far more high-quality translations per idiom than the human baseline. We open-source all code and data to facilitate further research.

idiom, lawsuit, translation, (15 more...)

2410.00988

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > South Korea > Seoul > Seoul (0.05)
Asia > Singapore (0.05)
(7 more...)

Genre: Research Report (0.40)

Industry: Law (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceOct-1-2024

Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation

Myung, Jiyoon, Park, Jihyeon, Son, Jungki, Lee, Kyungro, Han, Joohyung

This paper addresses the challenge of accurately translating technical terms, which are crucial for clear communication in specialized fields. We introduce the Parenthetical Terminology Translation (PTT) task, designed to mitigate potential inaccuracies by displaying the original term in parentheses alongside its translation. To implement this approach, we generated a representative PTT dataset using a collaborative approach with large language models and applied knowledge distillation to fine-tune traditional Neural Machine Translation (NMT) models and small-sized Large Language Models (sLMs). Additionally, we developed a novel evaluation metric to assess both overall translation accuracy and the correct parenthetical presentation of terms. Our findings indicate that sLMs did not consistently outperform NMT models, with fine-tuning proving more effective than few-shot prompting, particularly in models with continued pre-training in the target language. These insights contribute to the advancement of more reliable terminology translation methodologies.

dataset, neural task-driven modeling, translation, (10 more...)

2410.00683

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(6 more...)

Genre: Research Report > New Finding (0.66)

Industry: Government (0.60)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)