Machine Translation
Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing
Boecking, Benedikt, Usuyama, Naoto, Bannur, Shruthi, Castro, Daniel C., Schwaighofer, Anton, Hyland, Stephanie, Wetscherek, Maria, Naumann, Tristan, Nori, Aditya, Alvarez-Valle, Javier, Poon, Hoifung, Oktay, Ozan
Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care and accelerating clinical research. Biomedical text with its complex semantics poses additional challenges in vision--language modelling compared to the general domain, and previous work has used insufficiently adapted models that lack domain-specific language understanding. In this paper, we show that principled textual semantic modelling can substantially improve contrastive learning in self-supervised vision--language processing. We release a language model that achieves state-of-the-art results in radiology natural language inference through its improved vocabulary and novel language pretraining objective leveraging semantics and discourse characteristics in radiology reports. Further, we propose a self-supervised joint vision--language approach with a focus on better text modelling. It establishes new state of the art results on a wide range of publicly available benchmarks, in part by leveraging our new domain-specific language model. We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision--language processing. A broad evaluation, including on this new dataset, shows that our contrastive learning approach, aided by textual-semantic modelling, outperforms prior methods in segmentation tasks, despite only using a global-alignment objective.
MoEC: Mixture of Expert Clusters
Xie, Yuan, Huang, Shaohan, Chen, Tianyu, Wei, Furu
Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated. However, as the number of experts grows, MoE with outrageous parameters suffers from overfitting and sparse data allocation. Such problems are especially severe on tasks with limited data, thus hindering the progress for MoE models to improve performance by scaling up. In this work, we propose Mixture of Expert Clusters - a general approach to enable expert layers to learn more diverse and appropriate knowledge by imposing variance-based constraints on the routing stage. We further propose a cluster-level expert dropout strategy specifically designed for the expert cluster structure. Our experiments reveal that MoEC could improve performance on machine translation and natural language understanding tasks, and raise the performance upper bound for scaling up experts under limited data. We also verify that MoEC plays a positive role in mitigating overfitting and sparse data allocation.
Top 10 Popular Machine Learning Applications and Examples
The latest buzzword in the business world is machine learning. Machine learning has captured the imagination of many, conjuring up images of futuristic self-learning AIs and robots. Machine learning has opened up new avenues for technology and tools in the industry that were impossible just a few years ago. It powers breakthrough innovations, from prediction engines to online streaming TV live streaming, that supports modern lifestyles. Before we dive into the different machine learning applications, let's first understand What Machine learning is.
MAD for Robust Reinforcement Learning in Machine Translation
Donato, Domenic, Yu, Lei, Ling, Wang, Dyer, Chris
We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a variety of translation tasks show that policies learned using the MAD algorithm perform very well when using both greedy decoding and beam search, and that the learned policies are sensitive to the specific reward used during training.
Amortized Noisy Channel Neural Machine Translation
Pang, Richard Yuanzhe, He, He, Cho, Kyunghyun
Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches like "beam search and rerank" (BSR) incur significant computation overhead during inference, making real-world application infeasible. We aim to study if it is possible to build an amortized noisy channel NMT model such that when we do greedy decoding during inference, the translation accuracy matches that of BSR in terms of reward (based on the source-to-target log probability and the target-to-source log probability) and quality (based on BLEU and BLEURT). We attempt three approaches to train the new model: knowledge distillation, one-step-deviation imitation learning, and Q learning. The first approach obtains the noisy channel signal from a pseudo-corpus, and the latter two approaches aim to optimize toward a noisy-channel MT reward directly. For all three approaches, the generated translations fail to achieve rewards comparable to BSR, but the translation quality approximated by BLEU and BLEURT is similar to the quality of BSR-produced translations. Additionally, all three approaches speed up inference by 1-2 orders of magnitude.
The best Machine Learning Translation Tool?
There are several possibilities when one wants to quickly translate something into another language. The Google Translator is possibly the most famous solution herefore. But besides that there are some good alternatives like DeepL, which according to some sources is supposed to be the best translator online [1][2][3].
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Bugliarello, Emanuele, Liu, Fangyu, Pfeiffer, Jonas, Reddy, Siva, Elliott, Desmond, Ponti, Edoardo Maria, Vulić, Ivan
Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together - by both aggregating pre-existing datasets and creating new ones - visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of available unlabelled textual data for pretraining, and only weakly by the typological distance of target-source languages. We hope to encourage future research efforts in this area by releasing the benchmark to the community.
Behind No language Left Behind
What if you didn't need English to translate? Meta's new and improved open source AI model'NLLB-200' is capable of translating 200 languages without English! "Communicating across languages is one superpower that AI provides, but as we keep advancing our AI work it's improving everything we do--from showing the most interesting content on Facebook and Instagram, to recommending more relevant ads, to keeping our services safe for everyone", says Mark Zuckerberg, CEO, Meta. Accessibility through language ensures that the benefits of the advancement of technology reach everyone, no matter what language they may speak. Tech companies are assuming a proactive role in attempting to bridge this gap.
Multilingual Event Linking to Wikidata
Pratapa, Adithya, Gupta, Rishubh, Mitamura, Teruko
We present a task of multilingual linking of events to a knowledge base. We automatically compile a large-scale dataset for this task, comprising of 1.8M mentions across 44 languages referring to over 10.9K events from Wikidata. We propose two variants of the event linking task: 1) multilingual, where event descriptions are from the same language as the mention, and 2) crosslingual, where all event descriptions are in English. On the two proposed tasks, we compare multiple event linking systems including BM25+ (Lv and Zhai, 2011) and multilingual adaptations of the biencoder and crossencoder architectures from BLINK (Wu et al., 2020). In our experiments on the two task variants, we find both biencoder and crossencoder models significantly outperform the BM25+ baseline. Our results also indicate that the crosslingual task is in general more challenging than the multilingual task. To test the out-of-domain generalization of the proposed linking systems, we additionally create a Wikinews-based evaluation set. We present qualitative analysis highlighting various aspects captured by the proposed dataset, including the need for temporal reasoning over context and tackling diverse event descriptions across languages.