AITopics

doi: 10.1007/978-3-031-20059-5_1

2204.09817

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(12 more...)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.48)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

arXiv.org Artificial Intelligence Jul-19-2022

MoEC: Mixture of Expert Clusters

Xie, Yuan, Huang, Shaohan, Chen, Tianyu, Wei, Furu

Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE converts dense layers into sparse experts and uses a gated routing network to activate experts conditionally. However, as the number of experts grows, an MoE with an outrageously large number of parameters suffers from overfitting and sparse data allocation. These problems are especially severe on tasks with limited data, which keeps MoE models from improving performance by scaling up. In this work, we propose Mixture of Expert Clusters (MoEC), a general approach that enables expert layers to learn more diverse and appropriate knowledge by imposing variance-based constraints on the routing stage. We further propose a cluster-level expert dropout strategy specifically designed for the expert cluster structure. Our experiments reveal that MoEC improves performance on machine translation and natural language understanding tasks, and raises the performance upper bound for scaling up experts under limited data. We also verify that MoEC helps mitigate overfitting and sparse data allocation.
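The gated routing described in the abstract can be illustrated with a minimal sketch. It assumes top-1 routing with a softmax gate, which is one common MoE setup and not necessarily the one used in the paper; the variance-based constraints and cluster-level dropout of MoEC are not shown. All names (`route_top1`, `moe_layer`, the toy experts and gate weights) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(token, gate_weights):
    """Return (expert_index, gate_probability) for one token vector."""
    # One logit per expert: dot product of the token with that expert's gate vector.
    logits = [sum(t * w for t, w in zip(token, row)) for row in gate_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

def moe_layer(token, gate_weights, experts):
    """Run only the selected expert and scale its output by the gate probability."""
    idx, gate = route_top1(token, gate_weights)
    return idx, [gate * v for v in experts[idx](token)]

# Hypothetical toy setup: two experts over 2-dimensional tokens.
experts = [
    lambda x: [2.0 * v for v in x],   # expert 0 doubles its input
    lambda x: [-1.0 * v for v in x],  # expert 1 negates its input
]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]

idx, out = moe_layer([3.0, 1.0], gate_weights, experts)
print(idx, out)
```

Only one expert runs per token, which is why the parameter count can grow with the number of experts while the per-token compute stays roughly constant.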

dropout, moe model, moec, (14 more...)

2207.09094

Country: Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

#artificialintelligence Jul-18-2022, 16:30:17 GMT

Amazon's Sockeye 3: Neural Machine Translation With PyTorch That Is 126% Faster on GPUs

Amazon has introduced the latest version of their Sockeye toolkit for the efficient training of stronger and faster neural machine translation (NMT) models. Sockeye 3 achieves speeds up to 126 percent faster than other PyTorch implementations on GPUs and up to 292 percent faster on CPUs.

machine learning, natural language, neural machine translation, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

#artificialintelligence Jul-18-2022, 02:40:45 GMT

Top 10 Popular Machine Learning Applications and Examples

Machine learning is the latest buzzword in the business world. It has captured the imagination of many, conjuring up images of futuristic self-learning AIs and robots, and it has opened up avenues for technology and tools that were impossible just a few years ago. It powers innovations that support modern lifestyles, from prediction engines to live TV streaming. Before we dive into the different machine learning applications, let's first understand what machine learning is.

artificial intelligence, machine learning, natural language, (11 more...)

Industry:

Health & Medicine (1.00)
Information Technology > Services > e-Commerce Services (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.30)

arXiv.org Artificial Intelligence Jul-18-2022

MAD for Robust Reinforcement Learning in Machine Translation

Donato, Domenic, Yu, Lei, Ling, Wang, Dyer, Chris

We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a variety of translation tasks show that policies learned using the MAD algorithm perform very well when using both greedy decoding and beam search, and that the learned policies are sensitive to the specific reward used during training.
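The first variance reduction strategy can be sketched as follows. The abstract does not give the exact normalization, so this example assumes plain mean-centering of the rewards sampled for each source sentence, which yields both positive and negative rewards whenever the samples for a source are not all equal. The second function only computes the mean absolute deviation statistic the algorithm is named after; how the paper plugs it into the importance weights is not stated in the abstract and is not shown. Function names and the toy data are hypothetical.

```python
from statistics import fmean

def center_rewards(rewards_by_source):
    """Subtract the per-source mean reward from every sampled candidate.

    Assumption: mean-centering stands in for the paper's conditional
    reward normalization, whose exact form the abstract does not give.
    """
    centered = {}
    for source, rewards in rewards_by_source.items():
        baseline = fmean(rewards)
        centered[source] = [r - baseline for r in rewards]
    return centered

def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    center = fmean(values)
    return fmean([abs(v - center) for v in values])

# Hypothetical rewards for several sampled translations of two source sentences.
rewards = {
    "src-1": [0.2, 0.5, 0.8],
    "src-2": [0.9, 0.7],
}
centered = center_rewards(rewards)
print(centered)
print(mean_absolute_deviation(rewards["src-1"]))
```

Centering per source rather than per batch keeps an easy source sentence, where every candidate scores well, from contributing only positive rewards.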

machine learning, natural language, reinforcement learning, (16 more...)

2207.08583

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(14 more...)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.87)

arXiv.org Artificial Intelligence Jul-18-2022

Amortized Noisy Channel Neural Machine Translation

Pang, Richard Yuanzhe, He, He, Cho, Kyunghyun

Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches like "beam search and rerank" (BSR) incur significant computation overhead during inference, making real-world application infeasible. We aim to study if it is possible to build an amortized noisy channel NMT model such that when we do greedy decoding during inference, the translation accuracy matches that of BSR in terms of reward (based on the source-to-target log probability and the target-to-source log probability) and quality (based on BLEU and BLEURT). We attempt three approaches to train the new model: knowledge distillation, one-step-deviation imitation learning, and Q learning. The first approach obtains the noisy channel signal from a pseudo-corpus, and the latter two approaches aim to optimize toward a noisy-channel MT reward directly. For all three approaches, the generated translations fail to achieve rewards comparable to BSR, but the translation quality approximated by BLEU and BLEURT is similar to the quality of BSR-produced translations. Additionally, all three approaches speed up inference by 1-2 orders of magnitude.
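The reranking step of "beam search and rerank" can be sketched as below. The abstract says the reward is based on the source-to-target and target-to-source log probabilities; the weighted sum and the `channel_weight` parameter are assumptions for illustration, as are the `Candidate` type and the toy scores. A real system would obtain the two log probabilities from a forward and a reverse translation model.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    forward_logprob: float   # log p(target | source)
    backward_logprob: float  # log p(source | target)

def noisy_channel_reward(candidate, channel_weight=1.0):
    """Combine both directions; channel_weight is a hypothetical tuning knob."""
    return candidate.forward_logprob + channel_weight * candidate.backward_logprob

def rerank(candidates, channel_weight=1.0):
    """Sort beam candidates from highest to lowest noisy channel reward."""
    return sorted(
        candidates,
        key=lambda c: noisy_channel_reward(c, channel_weight),
        reverse=True,
    )

# Hypothetical beam output with made-up scores.
beam = [
    Candidate("translation A", forward_logprob=-1.0, backward_logprob=-5.0),
    Candidate("translation B", forward_logprob=-1.5, backward_logprob=-2.0),
]
best = rerank(beam)[0]
print(best.text)
```

Scoring every beam candidate with a second model is the inference overhead the paper tries to amortize into a single greedy decoder.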

beam search, trajectory, translation, (14 more...)

2112.0867

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(8 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

#artificialintelligence Jul-17-2022, 02:25:22 GMT

The best Machine Learning Translation Tool?

There are several options when you want to quickly translate something into another language. Google Translate is probably the best-known solution. Besides that, there are some good alternatives such as DeepL, which according to some sources is supposed to be the best online translator [1][2][3].

artificial intelligence, machine translation, natural language, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.40)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

arXiv.org Artificial Intelligence Jul-17-2022

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages

Bugliarello, Emanuele, Liu, Fangyu, Pfeiffer, Jonas, Reddy, Siva, Elliott, Desmond, Ponti, Edoardo Maria, Vulić, Ivan

Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together - by both aggregating pre-existing datasets and creating new ones - visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of available unlabelled textual data for pretraining, and only weakly by the typological distance of target-source languages. We hope to encourage future research efforts in this area by releasing the benchmark to the community.

computational linguistic, dataset, proceedings, (15 more...)

2201.11732

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(24 more...)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.70)

#artificialintelligence Jul-16-2022, 07:30:46 GMT

Behind No Language Left Behind

What if you didn't need English to translate? Meta's new and improved open-source AI model, NLLB-200, can translate 200 languages without relying on English! "Communicating across languages is one superpower that AI provides, but as we keep advancing our AI work it's improving everything we do--from showing the most interesting content on Facebook and Instagram, to recommending more relevant ads, to keeping our services safe for everyone," says Mark Zuckerberg, CEO, Meta. Accessibility through language ensures that the benefits of technological advancement reach everyone, no matter what language they speak. Tech companies are taking a proactive role in trying to bridge this gap.

artificial intelligence, natural language, translation, (18 more...)

Industry: Information Technology > Services (0.56)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial Intelligence Jul-16-2022

Multilingual Event Linking to Wikidata

Pratapa, Adithya, Gupta, Rishubh, Mitamura, Teruko

We present a task of multilingual linking of events to a knowledge base. We automatically compile a large-scale dataset for this task, comprising 1.8M mentions across 44 languages referring to over 10.9K events from Wikidata. We propose two variants of the event linking task: 1) multilingual, where event descriptions are from the same language as the mention, and 2) crosslingual, where all event descriptions are in English. On the two proposed tasks, we compare multiple event linking systems including BM25+ (Lv and Zhai, 2011) and multilingual adaptations of the biencoder and crossencoder architectures from BLINK (Wu et al., 2020). In our experiments on the two task variants, we find both biencoder and crossencoder models significantly outperform the BM25+ baseline. Our results also indicate that the crosslingual task is in general more challenging than the multilingual task. To test the out-of-domain generalization of the proposed linking systems, we additionally create a Wikinews-based evaluation set. We present qualitative analysis highlighting various aspects captured by the proposed dataset, including the need for temporal reasoning over context and tackling diverse event descriptions across languages.

computational linguistic, dataset, wikidata, (14 more...)

2204.06535

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(40 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Olympic Games (1.00)
Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)