AITopics | Cross, James

Collaborating Authors

Cross, James

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages

Sun, Simeng, Elbayad, Maha, Sun, Anna, Cross, James

arXiv.org Artificial IntelligenceFeb-7-2023

With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. The question of how to reuse existing models while also making architectural changes to provide capacity for both old and new languages has also not been closely studied. In this work, we introduce three techniques that help speed up effective learning of the new languages and alleviate catastrophic forgetting despite vocabulary and architecture mismatches. Our results show that by (1) carefully initializing the network, (2) applying learning rate scaling, and (3) performing data up-sampling, it is possible to exceed the performance of a same-sized baseline model with 30% computation and recover the performance of a larger model trained from scratch with over 50% reduction in computation. Furthermore, our analysis reveals that the introduced techniques help learn the new directions more effectively and alleviate catastrophic forgetting at the same time. We hope our work will guide research into more efficient approaches to growing languages for these MMT models and ultimately maximize the reuse of existing models.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.03528

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.54)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Multilingual Machine Translation with Hyper-Adapters

Baziotis, Christos, Artetxe, Mikel, Cross, James, Bhosale, Shruti

arXiv.org Artificial IntelligenceDec-5-2022

Multilingual machine translation suffers from negative interference across languages. A common solution is to relax parameter sharing with language-specific modules like adapters. However, adapters of related languages are unable to transfer information, and their total number of parameters becomes prohibitively expensive as the number of languages grows. In this work, we overcome these drawbacks using hyper-adapters -- hyper-networks that generate adapters from language and layer embeddings. While past work had poor results when scaling hyper-networks, we propose a rescaling fix that significantly improves convergence and enables training larger hyper-networks. We find that hyper-adapters are more parameter efficient than regular adapters, reaching the same performance with up to 12 times less parameters. When using the same number of parameters and FLOPS, our approach consistently outperforms regular adapters. Also, hyper-adapters converge faster than alternative approaches and scale better than regular dense networks. Our analysis shows that hyper-adapters learn to encode language relatedness, enabling positive transfer across languages.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2205.10835

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learn to Talk via Proactive Knowledge Transfer

Sun, Qing, Cross, James

arXiv.org Artificial IntelligenceAug-23-2020

Knowledge Transfer has been applied in solving a wide variety of problems. For example, knowledge can be transferred between tasks (e.g., learning to handle novel situations by leveraging prior knowledge) or between agents (e.g., learning from others without direct experience). Without loss of generality, we relate knowledge transfer to KL-divergence minimization, i.e., matching the (belief) distributions of learners and teachers. The equivalence gives us a new perspective in understanding variants of the KL-divergence by looking at how learners structure their interaction with teachers in order to acquire knowledge. In this paper, we provide an in-depth analysis of KL-divergence minimization in Forward and Backward orders, which shows that learners are reinforced via on-policy learning in Backward. In contrast, learners are supervised in Forward. Moreover, our analysis is gradient-based, so it can be generalized to arbitrary tasks and help to decide which order to minimize given the property of the task. By replacing Forward with Backward in Knowledge Distillation, we observed +0.7-1.1 BLEU gains on the WMT'17 De-En and IWSLT'15 Th-En machine translation tasks.

artificial intelligence, learner, machine translation, (16 more...)

arXiv.org Artificial Intelligence

2008.10077

Genre: Research Report (0.64)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Knowledge Management > Knowledge Engineering (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback