Machine Translation
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation
Traditional automatic evaluation metrics for machine translation have been widely criticized by linguists due to their low accuracy, lack of transparency, focus on language mechanics rather than semantics, and low agreement with human quality evaluation. Human evaluations in the form of MQM-like scorecards have always been carried out in real industry setting by both clients and translation service providers (TSPs). However, traditional human translation quality evaluations are costly to perform and go into great linguistic detail, raise issues as to inter-rater reliability (IRR) and are not designed to measure quality of worse than premium quality translations. In this work, we introduce HOPE, a task-oriented and human-centric evaluation framework for machine translation output based on professional post-editing annotations. It contains only a limited number of commonly occurring error types, and use a scoring model with geometric progression of error penalty points (EPPs) reflecting error severity level to each translation unit. The initial experimental work carried out on English-Russian language pair MT outputs on marketing content type of text from highly technical domain reveals that our evaluation framework is quite effective in reflecting the MT output quality regarding both overall system-level performance and segment-level transparency, and it increases the IRR for error type interpretation. The approach has several key advantages, such as ability to measure and compare less than perfect MT output from different systems, ability to indicate human perception of quality, immediate estimation of the labor effort required to bring MT output to premium quality, low-cost and faster application, as well as higher IRR. Our experimental data is available at https://github.com/lHan87/HOPE.
African researchers aim to rescue languages that Western tech ignores
Computers have become amazingly precise at translating spoken words to text messages and scouring huge troves of information for answers to complex questions. At least, that is, so long as you speak English or another of the world's dominant languages. But try talking to your phone in Yoruba, Igbo or any number of widely spoken African languages and you'll find glitches that can hinder access to information, trade, personal communications, customer service and other benefits of the global tech economy. "We are getting to the point where if a machine doesn't understand your language it will be like it never existed," said Vukosi Marivate, chief of data science at the University of Pretoria in South Africa, in a call to action before a December virtual gathering of the world's artificial intelligence researchers. American tech giants don't have a great track record of making their language technology work well outside the wealthiest markets, a problem that's also made it harder for them to detect dangerous misinformation on their platforms.
Analyzing Scientific Publications using Domain-Specific Word Embedding and Topic Modelling
Singhal, Trisha, Liu, Junhua, Blessing, Lucienne T. M., Lim, Kwan Hui
The scientific world is changing at a rapid pace, with new technology being developed and new trends being set at an increasing frequency. This paper presents a framework for conducting scientific analyses of academic publications, which is crucial to monitor research trends and identify potential innovations. This framework adopts and combines various techniques of Natural Language Processing, such as word embedding and topic modelling. Word embedding is used to capture semantic meanings of domain-specific words. We propose two novel scientific publication embedding, i.e., PUB-G and PUB-W, which are capable of learning semantic meanings of general as well as domain-specific words in various research fields. Thereafter, topic modelling is used to identify clusters of research topics within these larger research fields. We curated a publication dataset consisting of two conferences and two journals from 1995 to 2020 from two research domains. Experimental results show that our PUB-G and PUB-W embeddings are superior in comparison to other baseline embeddings by a margin of ~0.18-1.03 based on topic coherence.
Investigating Effect of Dialogue History in Multilingual Task Oriented Dialogue Systems
Sun, Michael, Huang, Kaili, Moradshahi, Mehrad
While the English virtual assistants have achieved exciting performance with an enormous amount of training resources, the needs of non-English-speakers have not been satisfied well. Up to Dec 2021, Alexa, one of the most popular smart speakers around the world, is able to support 9 different languages [1], while there are thousands of languages in the world, 91 of which are spoken by more than 10 million people according to statistics published in 2019 [2]. However, training a virtual assistant in other languages than English is often more difficult, especially for those low-resource languages. The lack of high-quality training data restricts the performance of models, resulting in poor user satisfaction. Therefore, we devise an efficient and effective training solution for multilingual task-orientated dialogue systems, using the same dataset generation pipeline and end-to-end dialogue system architecture as BiToD[5], which adopted some key design choices for a minimalistic natural language design where formal dialogue states are used in place of natural language inputs. This reduces the room for error brought by weaker natural language models, and ensures the model can correctly extract the essential slot values needed to perform dialogue state tracking (DST). Our goal is to reduce the amount of natural language encoded at each turn, and the key parameter we investigate is the number of turns (H) to feed as history to model. We first explore the turning point where increasing H begins to yield limiting returns on the overall performance. Then we examine whether the examples a model with small H gets wrong can be categorized in a way for the model to do few-shot finetuning on. Lastly, will explore the limitations of this approach, and whether there is a certain type of examples that this approach will not be able to resolve.
Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation
Guo, Jiaxin, Wang, Minghan, Wei, Daimeng, Shang, Hengchao, Wang, Yuxia, Li, Zongyao, Yu, Zhengzhe, Wu, Zhanglin, Chen, Yimeng, Su, Chang, Zhang, Min, Lei, Lizhi, tao, shimin, Yang, Hao
Recently, non-autoregressive (NAT) models predict outputs in parallel, achieving substantial improvements in generation speed compared to autoregressive (AT) models. While performing worse on raw data, most NAT models are trained as student models on distilled data generated by AT teacher models, which is known as sequence-level Knowledge Distillation. An effective training strategy to improve the performance of AT models is Self-Distillation Mixup (SDM) Training, which pre-trains a model on raw data, generates distilled data by the pre-trained model itself and finally re-trains a model on the combination of raw data and distilled data. In this work, we aim to view SDM for NAT models, but find directly adopting SDM to NAT models gains no improvements in terms of translation quality. Through careful analysis, we observe the invalidation is correlated to Modeling Diversity and Confirmation Bias between the AT teacher model and the NAT student models. Based on these findings, we propose an enhanced strategy named SDMRT by adding two stages to classic SDM: one is Pre-Rerank on self-distilled data, the other is Fine-Tune on Filtered teacher-distilled data. Our results outperform baselines by 0.6 to 1.2 BLEU on multiple NAT models. As another bonus, for Iterative Refinement NAT models, our methods can outperform baselines within half iteration number, which means 2X acceleration.
Joint-training on Symbiosis Networks for Deep Nueral Machine Translation models
Yu, Zhengzhe, Guo, Jiaxin, Wang, Minghan, Wei, Daimeng, Shang, Hengchao, Li, Zongyao, Wu, Zhanglin, Wang, Yuxia, Chen, Yimeng, Su, Chang, Zhang, Min, Lei, Lizhi, tao, shimin, Yang, Hao
Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but it reaches the upper bound of translation quality when the number of encoder layers exceeds 18. Worse still, deeper networks consume a lot of memory, making it impossible to train efficiently. In this paper, we present Symbiosis Networks, which include a full network as the Symbiosis Main Network (M-Net) and another shared sub-network with the same structure but less layers as the Symbiotic Sub Network (S-Net). We adopt Symbiosis Networks on Transformer-deep (m-n) architecture and define a particular regularization loss $\mathcal{L}_{\tau}$ between the M-Net and S-Net in NMT. We apply joint-training on the Symbiosis Networks and aim to improve the M-Net performance. Our proposed training strategy improves Transformer-deep (12-6) by 0.61, 0.49 and 0.69 BLEU over the baselines under classic training on WMT'14 EN->DE, DE->EN and EN->FR tasks. Furthermore, our Transformer-deep (12-6) even outperforms classic Transformer-deep (18-6).
Spiral Language Modeling
Cao, Yong, Feng, Yukun, Kuang, Shaohui, Xu, Gu
In almost all text generation applications, word sequences are constructed in a left-to-right (L2R) or right-to-left (R2L) manner, as natural language sentences are written either L2R or R2L. However, we find that the natural language written order is not essential for text generation. In this paper, we propose Spiral Language Modeling (SLM), a general approach that enables one to construct natural language sentences beyond the L2R and R2L order. SLM allows one to form natural language text by starting from an arbitrary token inside the result text and expanding the rest tokens around the selected ones. It makes the decoding order a new optimization objective besides the language model perplexity, which further improves the diversity and quality of the generated text. Furthermore, SLM makes it possible to manipulate the text construction process by selecting a proper starting token. SLM also introduces generation orderings as additional regularization to improve model robustness in low-resource scenarios. Experiments on 8 widely studied Neural Machine Translation (NMT) tasks show that SLM is constantly effective with up to 4.7 BLEU increase comparing to the conventional L2R decoding approach.
Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts
Sharami, Javad Pourmostafa Roshan, Shterionov, Dimitar, Spronck, Pieter
Continuously-growing data volumes lead to larger generic models. Specific use-cases are usually left out, since generic models tend to perform poorly in domain-specific cases. Our work addresses this gap with a method for selecting in-domain data from generic-domain (parallel text) corpora, for the task of machine translation. The proposed method ranks sentences in parallel general-domain data according to their cosine similarity with a monolingual domain-specific data set. We then select the top K sentences with the highest similarity score to train a new machine translation system tuned to the specific in-domain data. Our experimental results show that models trained on this in-domain data outperform models trained on generic or a mixture of generic and domain data. That is, our method selects high-quality domain-specific training instances at low computational cost and data size.
New model improves accuracy of machine learning in COVID-19 diagnosis while preserving privacy
Researchers in the UK and China have developed an artificial intelligence (AI) model that can diagnose COVID-19 as well as a panel of professional radiologists, while preserving the privacy of patient data. The international team, led by the University of Cambridge and the Huazhong University of Science and Technology, used a technique called federated learning to build their model. Using federated learning, an AI model in one hospital or country can be independently trained and verified using a dataset from another hospital or country, without data sharing. The researchers based their model on more than 9,000 CT scans from approximately 3,300 patients in 23 hospitals in the UK and China. Their results, reported in the journal Nature Machine Intelligence, provide a framework where AI techniques can be made more trustworthy and accurate, especially in areas such as medical diagnosis where privacy is vital.
AI 50 2021: America's Most Promising Artificial Intelligence Companies
The Covid-19 pandemic was devastating for many industries, but it only accelerated the use of artificial intelligence across the U.S. economy. Amid the crisis, companies scrambled to create new services for remote workers and students, beef up online shopping and dining options, make customer call centers more efficient and speed development of important new drugs. Even as applications of machine learning and perception platforms become commonplace, a thick layer of hype and fuzzy jargon clings to AI-enabled software.That makes it tough to identify the most compelling companies in the space--especially those finding new ways to use AI that create value by making humans more efficient, not redundant. With this in mind, Forbes has partnered with venture firms Sequoia Capital and Meritech Capital to create our third annual AI 50, a list of private, promising North American companies that are using artificial intelligence in ways that are fundamental to their operations. To be considered, businesses must be privately-held and utilizing machine learning (where systems learn from data to improve on tasks), natural language processing (which enables programs to "understand" written or spoken language) or computer vision (which relates to how machines "see"). AI companies incubated at, largely funded through or acquired by large tech, manufacturing or industrial firms aren't eligible for consideration. Our list was compiled through a submission process open to any AI company in the U.S. and Canada. The application asked companies to provide details on their technology, business model, customers and financials like funding, valuation and revenue history (companies had the option to submit information confidentially, to encourage greater transparency). Forbes received several hundred entries, of which nearly 400 qualified for consideration. From there, our data partners applied an algorithm to identify 100 companies with the highest quantitative scores--and that also made diversity a priority. Next, a panel of expert AI judges evaluated the finalists to find the 50 most compelling companies (they were precluded from judging companies in which they have a vested interest). Among trends this year are what Sequoia Capital's Konstantine Buhler calls AI workbench companies--building of platforms tailored to different enterprises, including Dataiku, DataRobot Domino Data and Databricks.