Machine Translation
Four Ways Evolutionary AI Can Extend AI's Problem-Solving Capacity - Digitally Cognizant
Deep neural networks (DNN) have produced groundbreaking results in many complex applications of AI, such as natural language processing, facial recognition, sentiment analytics and object recognition. For instance, the accuracy of Google's machine translation system improved 60% using a DNN approach. Finding the right network architecture (that is, the components of the network and how they are instantiated and connected) is essential to this process. If the architecture is chosen based on history and convenience, the network will not reach its full potential. Much of the recent research in DNNs has focused on designing specialized architectures that excel with specific tasks.
VideoBERT: A Joint Model for Video and Language Representation Learning
Sun, Chen, Myers, Austin, Vondrick, Carl, Murphy, Kevin, Schmid, Cordelia
Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube. Whereas most existing approaches learn low-level representations, we propose a joint visual-linguistic model to learn high-level features without any explicit supervision. In particular, inspired by its recent success in language modeling, we build upon the BERT model to learn bidirectional joint distributions over sequences of visual and linguistic tokens, derived from vector quantization of video data and off-the-shelf speech recognition outputs, respectively. We use this model in a number of tasks, including action classification and video captioning. We show that it can be applied directly to open-vocabulary
Deep learning can benefit a lot from labeled data [23], but this is hard to acquire at scale. Consequently there has been a lot of recent interest in "self supervised learning", where we train a model on various "proxy tasks", which we hope will result in the discovery of features or representations that can be used in downstream tasks (see e.g., [22]). A wide variety of such proxy tasks have been proposed in the image and video domains. However, most of these methods focus on low level features (e.g., textures) and short temporal scales (e.g., motion patterns that last a second or less). We are interested in discovering high-level semantic features which correspond to actions and events that unfold over longer time scales (e.g.
A Survey of Code-switched Speech and Language Processing
Sitaram, Sunayana, Chandu, Khyathi Raghavi, Rallabandi, Sai Krishna, Black, Alan W
Code-switching, the alternation of languages within a conversation or utterance, is a common communicative phenomenon that occurs in multilingual communities across the world. This survey reviews computational approaches for code-switched Speech and Natural Language Processing. We motivate why processing code-switched text and speech is essential for building intelligent agents and systems that interact with users in multilingual communities. As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for. We review code-switching research in various Speech and NLP applications, including language processing tools and end-to-end systems. We conclude with future directions and open problems in the field.
Interpreting Black Box Models with Statistical Guarantees
Burns, Collin, Thomason, Jesse, Tansey, Wesley
While many methods for interpreting machine learning models have been proposed, they are frequently ad hoc, difficult to evaluate, and come with no statistical guarantees on the error rate. This is especially problematic in scientific domains, where interpretations must be accurate and reliable. In this paper, we cast black box model interpretation as a hypothesis testing problem. The task is to discover "important" features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with randomly-sampled counterfactuals. We derive a multiple hypothesis testing framework for finding important features that enables control over the false discovery rate. We propose two testing methods, as well as analogs of one-sided and two-sided tests. In simulation, the methods have high power and compare favorably against existing interpretability methods. When applied to vision and language models, the framework selects features that intuitively explain model predictions.
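As a rough illustration of what "control over the false discovery rate" means in practice, the sketch below applies the standard Benjamini-Hochberg procedure to a list of per-feature p-values. This is a stand-in, not the paper's own testing methods, which the abstract does not spell out; the function name `benjamini_hochberg`, the level `alpha=0.1`, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Return the indices of hypotheses rejected at FDR level alpha.

    Standard Benjamini-Hochberg step-up procedure (an assumption here,
    used only to illustrate FDR control over per-feature tests).
    """
    m = len(p_values)
    # Order feature indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank whose p-value is under rank * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis up to and including that rank.
    return sorted(order[:cutoff])


# Hypothetical p-values, one per candidate feature.
p = [0.001, 0.2, 0.012, 0.8, 0.03]
selected = benjamini_hochberg(p, alpha=0.1)
print(selected)
```

Each p-value would come from a test of whether the model's prediction changes significantly when that feature is replaced by a sampled counterfactual; the features returned are the ones reported as "important".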
Competence-based Curriculum Learning for Neural Machine Translation
Platanios, Emmanouil Antonios, Stretcu, Otilia, Neubig, Graham, Poczos, Barnabas, Mitchell, Tom M.
Current state-of-the-art NMT systems use large neural networks that are not only slow to train, but also often require many heuristics and optimization tricks, such as specialized learning rate schedules and large batch sizes. This is undesirable as it requires extensive hyperparameter tuning. In this paper, we propose a curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in overall better performance. Our framework consists of a principled way of deciding which training samples are shown to the model at different times during training, based on the estimated difficulty of a sample and the current competence of the model. Filtering training samples in this manner prevents the model from getting stuck in bad local optima, making it converge faster and reach a better solution than the common approach of uniformly sampling training examples. Furthermore, the proposed method can be easily applied to existing NMT models by simply modifying their input data pipelines. We show that our framework can help improve the training time and the performance of both recurrent neural network models and Transformers, achieving up to a 70% decrease in training time, while at the same time obtaining accuracy improvements of up to 2.2 BLEU.
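The filtering idea can be sketched as a small change to the input pipeline: at each training step, only samples whose difficulty rank falls at or below the current competence are eligible. The abstract does not give the exact difficulty measure or competence schedule, so the choices below (sentence length as difficulty, a square-root schedule, and the names `competence`, `difficulty_ranks`, and `eligible`) are assumptions for illustration only.

```python
import bisect
import math


def competence(step, total_steps, initial=0.01):
    """Assumed square-root schedule rising from `initial` to 1.0."""
    progress = min(1.0, step / total_steps)
    return min(1.0, math.sqrt(progress * (1 - initial ** 2) + initial ** 2))


def difficulty_ranks(samples, difficulty=len):
    """Fraction of the corpus that is at most as difficult as each sample."""
    scores = sorted(difficulty(s) for s in samples)
    n = len(scores)
    return [bisect.bisect_right(scores, difficulty(s)) / n for s in samples]


def eligible(samples, step, total_steps, initial=0.01):
    """Samples the model is allowed to see at this training step."""
    c = competence(step, total_steps, initial)
    ranks = difficulty_ranks(samples)
    return [s for s, r in zip(samples, ranks) if r <= c]


# Hypothetical tokenized corpus; difficulty is assumed to be sentence length.
corpus = [["a", "b"], ["a", "b", "c", "d"], ["a"], ["a", "b", "c", "d", "e", "f"]]
early = eligible(corpus, step=0, total_steps=100, initial=0.25)
late = eligible(corpus, step=100, total_steps=100, initial=0.25)
print(len(early), len(late))
```

Training batches would then be sampled uniformly from the eligible subset, which grows until the whole corpus is available.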
Artificial Intelligence : from Research to Application ; the Upper-Rhine Artificial Intelligence Symposium (UR-AI 2019)
The TriRhenaTech alliance universities and their partners presented their competences in the field of artificial intelligence and their cross-border cooperation with industry at the tri-national conference 'Artificial Intelligence : from Research to Application' on March 13th, 2019 in Offenburg. The TriRhenaTech alliance is a network of universities in the Upper Rhine Trinational Metropolitan Region comprising the German universities of applied sciences in Furtwangen, Kaiserslautern, Karlsruhe, and Offenburg, the Baden-Wuerttemberg Cooperative State University Loerrach, the French university network Alsace Tech (comprising 14 'grandes écoles' in the fields of engineering, architecture and management) and the University of Applied Sciences and Arts Northwestern Switzerland. The alliance's common goal is to reinforce the transfer of knowledge, research, and technology, as well as the cross-border mobility of students.
Is Google's New Lingvo Framework a Big Deal for Machine Translation? - Slator
Researchers in neural machine translation (NMT) and natural language processing (NLP) may want to keep an eye on a new framework from Google. Lingvo is specifically tailored toward sequence models and NLP, including speech recognition, language understanding, MT, and speech translation. The Google AI team claims there are already "dozens" of research papers in these areas based on Lingvo. In fact, they said this was one reason they decided to open-source the project: to support the research community and encourage reproducible results. Lingvo supports multiple neural network architectures, from recurrent neural nets to Transformer models, and comes with lots of documentation on common implementations across different tasks (e.g., NLP, NMT, speech synthesis).
The Missing Ingredient in Zero-Shot Neural Machine Translation
Arivazhagan, Naveen, Bapna, Ankur, Firat, Orhan, Aharoni, Roee, Johnson, Melvin, Macherey, Wolfgang
Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French/German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.
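One way to picture an auxiliary loss that encourages representational invariance is to pool the encoder states of a sentence and of its translation, then penalize the distance between the two pooled vectors. The abstract does not specify the form of the loss, so the mean pooling, the cosine distance, the `weight` parameter, and all function names below are assumptions made for the sake of a minimal sketch.

```python
import math


def mean_pool(states):
    """Average per-token encoder vectors into one sentence vector."""
    dim = len(states[0])
    return [sum(vec[d] for vec in states) / len(states) for d in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def total_loss(translation_loss, src_states, tgt_states, weight=1.0):
    """Translation loss plus an assumed alignment penalty on the encoder."""
    align = cosine_distance(mean_pool(src_states), mean_pool(tgt_states))
    return translation_loss + weight * align


# Hypothetical encoder outputs for a sentence and its translation.
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[2.0, 2.0]]
print(total_loss(2.0, src, tgt, weight=0.5))
```

Because the penalty is computed on encoder outputs only, the decoder and the supervised translation objective are left as they are.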
A Research Agenda: Dynamic Models to Defend Against Correlated Attacks
In this article I describe a research agenda for securing machine learning models against adversarial inputs at test time. This article does not present results but instead shares some of my thoughts about where I think that the field needs to go. Modern machine learning works very well on I.I.D. data: data for which each example is drawn independently and for which the distribution generating each example is identical. When these assumptions are relaxed, modern machine learning can perform very poorly. When machine learning is used in contexts where security is a concern, it is desirable to design models that perform well even when the input is designed by a malicious adversary. So far most research in this direction has focused on an adversary who violates the identical assumption, and imposes some kind of restricted worst-case distribution shift. I argue that machine learning security researchers should also address the problem of relaxing the independence assumption and that current strategies designed for robustness to distribution shift will not do so. I recommend dynamic models that change each time they are run as a potential solution path to this problem, and show an example of a simple attack using correlated data that can be mitigated by a simple dynamic defense. This is not intended as a real-world security measure, but as a recommendation to explore this research direction and develop more realistic defenses.
Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement
Kool, Wouter, van Hoof, Herke, Welling, Max
The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this 'Gumbel-Top-$k$' trick on a factorized distribution over sequences, allowing us to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linearly in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.
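For a plain categorical distribution, the Gumbel-Top-$k$ trick amounts to adding independent Gumbel noise to each log-probability and keeping the $k$ largest perturbed values. The sketch below covers only this flat case; it does not implement the Stochastic Beam Search over sequences described in the abstract. The function name `gumbel_top_k` and the example distribution are illustrative.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng=random):
    """Sample k distinct indices without replacement from a categorical
    distribution given by (possibly unnormalized) log-probabilities."""
    perturbed = []
    for i, lp in enumerate(log_probs):
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Standard Gumbel noise: -log(-log(U)) with U ~ Uniform(0, 1).
        g = -math.log(-math.log(u))
        perturbed.append((lp + g, i))
    # The indices of the k largest perturbed values form the sample.
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]


# Hypothetical distribution over three items.
rng = random.Random(0)
log_probs = [math.log(0.5), math.log(0.3), math.log(0.2)]
sample = gumbel_top_k(log_probs, k=2, rng=rng)
print(sample)
```

With `k=1` this reduces to the Gumbel-Max trick, and the order of the returned indices matches the order in which sequential sampling without replacement would draw them.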