Machine Translation
Ten Machine Learning Algorithms You Should Know to Become a Data Scientist - ParallelDots
Let's say I am given an Excel sheet with data about various fruits and I have to tell which ones look like apples. I start by asking a question, "Which fruits are red and round?", and split the fruits into those that answer yes and those that answer no. Now, not all red and round fruits are apples, and not all apples are red and round. So I ask a second question, "Which fruits have red or yellow color hints on them?", of the red and round fruits, and ask "Which fruits are green and round?" of the fruits that are not red and round. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition.
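The cascade of questions above can be sketched as a small hand-written decision tree in Python. This is only an illustration of the intuition-based tree described in the snippet: the `Fruit` fields and the `looks_like_apple` function are hypothetical names chosen here, not part of the original article, and a learned tree would pick its questions from data instead.

```python
from dataclasses import dataclass


@dataclass
class Fruit:
    # Hypothetical columns standing in for the Excel sheet in the example.
    color: str
    shape: str
    has_red_or_yellow_hints: bool


def looks_like_apple(fruit: Fruit) -> bool:
    """Hand-built decision tree following the questions in the text."""
    # Question 1: is the fruit red and round?
    if fruit.color == "red" and fruit.shape == "round":
        # Question 2 (yes branch): does it have red or yellow color hints?
        return fruit.has_red_or_yellow_hints
    # Question 2 (no branch): is it green and round?
    return fruit.color == "green" and fruit.shape == "round"


if __name__ == "__main__":
    basket = [
        Fruit("red", "round", True),
        Fruit("green", "round", False),
        Fruit("yellow", "long", False),
    ]
    for fruit in basket:
        print(fruit, looks_like_apple(fruit))
```

Each `if` corresponds to one question, and each return value is a leaf of the tree.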
What is the current biggest hurdle for AI innovation? Gengo AI
In a previous article, I discussed the current pace of AI innovation. The shortage of available AI training data is a huge blocker in AI innovation today, leaving some businesses frustrated. In recent years, some media outlets have hyped the idea that AI technology would advance exponentially, but so far that has not happened. We don't have enough AI training data because companies often underestimate the amount of data they need and the time it takes to collect that data. The few companies invested in data collection often refuse to make their data public, usually due to privacy concerns or fear of losing to their competitors.
Evaluating Text Output in NLP: BLEU at your own risk
One question I get fairly often from folks who are just getting into NLP is how to evaluate systems when the output of that system is text, rather than some sort of classification of the input text. These types of problems, where you put some text into your model and get some other text out of it, are known as sequence to sequence or string transduction problems. This sort of technology is right out of science fiction. With such a wide range of exciting applications, it's easy to see why sequence to sequence modeling is more popular than ever. What's not easy is actually evaluating these systems. Unfortunately for folks who are just getting started, there's no simple answer about what metric you should use to evaluate your model. Even worse, one of the most popular metrics for evaluating sequence to sequence tasks, BLEU, has major drawbacks, especially when applied to tasks that it was never intended to evaluate.
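To make the evaluation problem concrete, here is a minimal sketch of clipped n-gram precision, the overlap count that BLEU is built on. It is not full BLEU: the brevity penalty and the geometric mean over several n-gram orders are omitted, tokenization is assumed to be plain whitespace splitting, and the function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also appear in the reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference ("clipping").
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    # Repeating one reference word is only partly rewarded thanks to clipping.
    print(clipped_precision("the the the the", reference))
    # A scrambled sentence still gets a perfect unigram score.
    print(clipped_precision("mat the on sat cat the", reference))
```

The second call shows why an overlap score alone can mislead: the scrambled sentence reuses every reference word, so its unigram precision is perfect even though it is not a usable output.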
Artificial Intelligence Is Changing The Translation Industry. But Will It Work?
Artificial intelligence (AI) has infiltrated numerous aspects of our lives in recent years, thanks to improvements in the field of machine learning, where computers ostensibly program themselves. This drive towards digital self-learning has led to major breakthroughs in our day-to-day interactions with machines, most notably the rise of digital home assistants such as Amazon Echo, and the recently launched Google Lens, which identifies objects based on visual cues from your phone's camera. One of the most widely-discussed advances has been the use of AI in translation. Not unlike the Babel Fish from The Hitchhiker's Guide to the Galaxy, with AI translation, "you can instantly understand anything said to you in any form of language." The technology works by recognizing words individually and then, as MIT Technology Review puts it, "takes advantage of the fact that relationships between certain words…are similar across languages" to create its translations. It has already found its way into a number of our most commonly used websites and platforms, with even grander plans in the pipeline – but just how reliable is the technology?
XNet: GAN Latent Space Constraints
Sendik, Omry, Lischinski, Dani, Cohen-Or, Daniel
Recent GAN-based architectures have been able to deliver impressive performance on the general task of image-to-image translation. In particular, it was shown that a wide variety of image translation operators may be learned from two image sets, containing images from two different domains, without establishing an explicit pairing between the images. This was made possible by introducing clever regularizers to overcome the under-constrained nature of the unpaired translation problem. In this work, we introduce a novel architecture for unpaired image translation, and explore several new regularizers enabled by it. Specifically, our architecture comprises a pair of GANs, as well as a pair of translators between their respective latent spaces. These cross-translators enable us to impose several regularizing constraints on the learnt image translation operator, collectively referred to as latent cross-consistency. Our results show that our proposed architecture and latent cross-consistency constraints are able to outperform the existing state-of-the-art on a wide variety of image translation tasks.
Google Translate will help Wikipedia fill its non-English websites
Google is helping the Wikimedia Foundation achieve its goal of making Wikipedia articles available in a lot more languages. The Foundation has added Google Translate to its content translation tool, which human editors can use to add content to non-English Wikipedia websites. Those editors can take advantage of the new option -- "one of the most advanced machine translation systems available today," the foundation called it -- to generate an initial translation that they can then review and edit for readability in their language. The Foundation says volunteer Wikipedia editors have been asking for Google Translate integration for a long time now. According to VentureBeat, this move is an expansion of an earlier partnership, wherein Google promised to help Wikipedia make its English posts more accessible in Indonesia.
Computational Register Analysis and Synthesis
The study of register in computational language research has historically been divided into register analysis, seeking to determine the registerial character of a text or corpus, and register synthesis, seeking to generate a text in a desired register. This article surveys the different approaches to these disparate tasks. Register synthesis has tended to use more theoretically articulated notions of register and genre than analysis work, which often seeks to categorize on the basis of intuitive and somewhat incoherent notions of prelabeled 'text types'. I argue that an integration of computational register analysis and synthesis will benefit register studies as a whole, by enabling a new large-scale research program in register studies. It will enable comprehensive global mapping of functional language varieties in multiple languages, including the relationships between them. Furthermore, computational methods together with high-coverage, systematically collected and analyzed data will enable rigorous empirical validation and refinement of different theories of register, which will also have implications for our understanding of linguistic variation in general.
Ministry earmarks subsidies totaling ¥20 million to set up translation systems for foreign students at schools
The education ministry plans to set up a new subsidy system for prefectures and large cities that offer detailed support to foreign students attending public elementary and junior high schools and their parents by using multilingual translation systems. The subsidies will be offered to prefectural governments, ordinance-designated major cities and other core cities that use tablet computers with multilingual speech translation functions in teaching Japanese to students from abroad at school and providing school guidance to their parents. The ministry has set aside ¥20 million for the subsidy system, which is designed to cover one-third of related costs, under the government's fiscal 2019 budget, with 100 language support programs likely to become eligible for the financial aid, informed sources said. The launch of the new subsidy system comes in line with the government's policy of allowing more foreign workers to come here. The number of foreign students in Japan needing Japanese language education totaled 43,947 in fiscal 2016, up 70 percent from 26,281 in fiscal 2006.
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
Zhang, Chicheng, Agarwal, Alekh, Daumé, Hal III, Langford, John, Negahban, Sahand N
We investigate the feasibility of learning from both fully-labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may be different between these two data sources. Theoretically, we state and prove no-regret algorithms for learning that is robust to divergences between the two sources. Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approaches are feasible, and helpful in practice.