 Machine Translation


Joint-training on Symbiosis Networks for Deep Neural Machine Translation models

arXiv.org Artificial Intelligence

Deep encoders have been proven effective in improving neural machine translation (NMT) systems, but translation quality reaches an upper bound when the number of encoder layers exceeds 18. Worse still, deeper networks consume a lot of memory, making efficient training impossible. In this paper, we present Symbiosis Networks, which include a full network as the Symbiosis Main Network (M-Net) and a shared sub-network with the same structure but fewer layers as the Symbiotic Sub Network (S-Net). We adopt Symbiosis Networks on the Transformer-deep (m-n) architecture and define a particular regularization loss $\mathcal{L}_{\tau}$ between the M-Net and S-Net in NMT. We apply joint training to the Symbiosis Networks, aiming to improve M-Net performance. Our proposed training strategy improves Transformer-deep (12-6) by 0.61, 0.49 and 0.69 BLEU over the baselines under classic training on the WMT'14 EN->DE, DE->EN and EN->FR tasks, respectively. Furthermore, our Transformer-deep (12-6) even outperforms classic Transformer-deep (18-6).


Spiral Language Modeling

arXiv.org Artificial Intelligence

In almost all text generation applications, word sequences are constructed in a left-to-right (L2R) or right-to-left (R2L) manner, as natural language sentences are written either L2R or R2L. However, we find that the written order of natural language is not essential for text generation. In this paper, we propose Spiral Language Modeling (SLM), a general approach that enables one to construct natural language sentences beyond the L2R and R2L orders. SLM allows one to form natural language text by starting from an arbitrary token inside the result text and expanding the remaining tokens around the selected one. It makes the decoding order a new optimization objective besides the language model perplexity, which further improves the diversity and quality of the generated text. Furthermore, SLM makes it possible to manipulate the text construction process by selecting a proper starting token. SLM also introduces generation orderings as additional regularization to improve model robustness in low-resource scenarios. Experiments on 8 widely studied Neural Machine Translation (NMT) tasks show that SLM is consistently effective, with up to a 4.7 BLEU increase compared to the conventional L2R decoding approach.


Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts

arXiv.org Artificial Intelligence

Continuously growing data volumes lead to larger generic models. Specific use cases are usually left out, since generic models tend to perform poorly in domain-specific cases. Our work addresses this gap with a method for selecting in-domain data from generic-domain (parallel text) corpora, for the task of machine translation. The proposed method ranks sentences in parallel general-domain data according to their cosine similarity with a monolingual domain-specific data set. We then select the top K sentences with the highest similarity score to train a new machine translation system tuned to the specific in-domain data. Our experimental results show that models trained on this in-domain data outperform models trained on generic data or on a mixture of generic and domain data. That is, our method selects high-quality domain-specific training instances at low computational cost and with a small amount of data.
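
The ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how sentences are represented, so the sketch assumes plain bag-of-words counts as a stand-in for sentence embeddings, and it represents the domain data set by the sum of its sentence vectors. The function names `bow`, `cosine` and `select_top_k` are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words term counts (assumed stand-in for a sentence embedding)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_top_k(parallel_pairs, domain_sentences, k):
    """Rank (source, target) pairs by the similarity of the source side to
    the monolingual in-domain data and return the k highest-scoring pairs."""
    # Assumption: the domain set is summarized by summing its sentence vectors.
    domain_vec = Counter()
    for sentence in domain_sentences:
        domain_vec.update(bow(sentence))
    scored = [(cosine(bow(src), domain_vec), src, tgt) for src, tgt in parallel_pairs]
    # Highest similarity first; ties keep their original corpus order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for _, src, tgt in scored[:k]]
```

Only the source side is scored here because the domain data is monolingual; the paired target sentence is carried along so that the selected pairs can be used directly as training data.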


New model improves accuracy of machine learning in COVID-19 diagnosis while preserving privacy

#artificialintelligence

Researchers in the UK and China have developed an artificial intelligence (AI) model that can diagnose COVID-19 as well as a panel of professional radiologists, while preserving the privacy of patient data. The international team, led by the University of Cambridge and the Huazhong University of Science and Technology, used a technique called federated learning to build their model. Using federated learning, an AI model in one hospital or country can be independently trained and verified using a dataset from another hospital or country, without data sharing. The researchers based their model on more than 9,000 CT scans from approximately 3,300 patients in 23 hospitals in the UK and China. Their results, reported in the journal Nature Machine Intelligence, provide a framework where AI techniques can be made more trustworthy and accurate, especially in areas such as medical diagnosis where privacy is vital.
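
The idea of training across hospitals without sharing data can be illustrated with a small sketch. The article does not describe how the researchers combined the locally trained models, so this example assumes federated averaging (FedAvg), a common aggregation scheme, applied to a toy linear model. All names (`local_update`, `federated_average`, `federated_round`) and the data are hypothetical.

```python
def local_update(weights, data, lr=0.1):
    """One gradient step of least-squares regression y ~ w*x + b,
    computed only on the data held by a single site."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(client_weights, client_sizes):
    """Average the clients' parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def federated_round(global_weights, site_datasets, lr=0.1):
    """One communication round: each site trains locally, then only the
    updated parameters (never the raw data) are sent for averaging."""
    updates = [local_update(global_weights, data, lr) for data in site_datasets]
    sizes = [len(data) for data in site_datasets]
    return federated_average(updates, sizes)
```

In this sketch the coordinating server sees only parameter vectors and dataset sizes, which is the property that lets each site keep its records private.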


AI 50 2021: America's Most Promising Artificial Intelligence Companies

#artificialintelligence

The Covid-19 pandemic was devastating for many industries, but it only accelerated the use of artificial intelligence across the U.S. economy. Amid the crisis, companies scrambled to create new services for remote workers and students, beef up online shopping and dining options, make customer call centers more efficient and speed development of important new drugs. Even as applications of machine learning and perception platforms become commonplace, a thick layer of hype and fuzzy jargon clings to AI-enabled software. That makes it tough to identify the most compelling companies in the space--especially those finding new ways to use AI that create value by making humans more efficient, not redundant. With this in mind, Forbes has partnered with venture firms Sequoia Capital and Meritech Capital to create our third annual AI 50, a list of private, promising North American companies that are using artificial intelligence in ways that are fundamental to their operations. To be considered, businesses must be privately held and use machine learning (where systems learn from data to improve on tasks), natural language processing (which enables programs to "understand" written or spoken language) or computer vision (which relates to how machines "see"). AI companies incubated at, largely funded through or acquired by large tech, manufacturing or industrial firms aren't eligible for consideration. Our list was compiled through a submission process open to any AI company in the U.S. and Canada. The application asked companies to provide details on their technology, business model, customers and financials like funding, valuation and revenue history (companies had the option to submit information confidentially, to encourage greater transparency). Forbes received several hundred entries, of which nearly 400 qualified for consideration. From there, our data partners applied an algorithm to identify 100 companies with the highest quantitative scores--and that also made diversity a priority. Next, a panel of expert AI judges evaluated the finalists to find the 50 most compelling companies (they were precluded from judging companies in which they have a vested interest). Among trends this year are what Sequoia Capital's Konstantine Buhler calls AI workbench companies, which build platforms tailored to different enterprises, including Dataiku, DataRobot, Domino Data and Databricks.


Learning and Analyzing Generation Order for Undirected Sequence Models

arXiv.org Artificial Intelligence

Undirected neural sequence models have achieved performance competitive with the state-of-the-art directed sequence models that generate monotonically from left to right in machine translation tasks. In this work, we train a policy that learns the generation order for a pre-trained, undirected translation model via reinforcement learning. We show that the translations decoded by our learned orders achieve higher BLEU scores than the outputs decoded from left to right or decoded by the learned order from Mansimov et al. (2019) on the WMT'14 German-English translation task. On examples with a maximum source and target length of 30 from De-En, WMT'16 English-Romanian, and WMT'21 English-Chinese translation tasks, our learned order outperforms all heuristic generation orders on four out of six tasks. We next carefully analyze the learned order patterns via qualitative and quantitative analysis. We show that our policy generally follows an outer-to-inner order, predicting the left-most and right-most positions first, and then moving toward the middle while skipping less important words at the beginning. Furthermore, the policy usually predicts positions for a single syntactic constituent structure in consecutive steps. We believe our findings could provide more insights on the mechanism of undirected generation models and encourage further research in this direction. Our code is publicly available at https://github.com/jiangycTarheel/undirected-generation
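
The outer-to-inner pattern described above can be made concrete with a short sketch. It is a fixed heuristic that mimics the reported tendency, not the learned policy itself: it ignores the policy's skipping of less important words and its grouping by syntactic constituents. The function names are hypothetical.

```python
def outer_to_inner_order(length):
    """Return positions 0..length-1 ordered from both ends toward the middle."""
    order = []
    left, right = 0, length - 1
    while left <= right:
        order.append(left)
        if right != left:  # avoid adding the middle position twice
            order.append(right)
        left += 1
        right -= 1
    return order


def decode_steps(tokens, order, mask="_"):
    """Show the partially filled sentence after each generation step."""
    canvas = [mask] * len(tokens)
    steps = []
    for pos in order:
        canvas[pos] = tokens[pos]
        steps.append(" ".join(canvas))
    return steps
```

For a five-token sentence the heuristic visits positions 0, 4, 1, 3, 2, so the sentence boundaries are fixed before the middle is filled in.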


Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model

arXiv.org Artificial Intelligence

This paper summarizes our submission to Task 2 of the second track of the 10th Dialog System Technology Challenge (DSTC10) "Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations". Similar to the previous year's iteration, the task consists of three subtasks: detecting whether a turn is knowledge seeking, selecting the relevant knowledge document and finally generating a grounded response. This year, the focus lies on adapting the system to noisy ASR transcripts. We explore different approaches to make the models more robust to this type of input and to adapt the generated responses to the style of spoken conversations. For the latter, we get the best results with a noisy channel model that additionally reduces the number of short and generic responses. Our best system achieved the 1st rank in the automatic and the 3rd rank in the human evaluation of the challenge.


Chinese TV introducing AI sign language presenter at the next Olympics

#artificialintelligence

Chinese TV will introduce the first AI sign language presenter in time for the 2022 Winter Olympics in Beijing. China Central Television (CCTV) and Baidu AI Cloud said the launch of the AI sign language presenter represents a huge leap forwards in 'overcoming the barrier of sound with technology'. Nearly 28 million people in China are hearing impaired and about 430 million around the world also suffer from hearing loss. The launch of the AI presenter will allow the state broadcaster to include sign language services for viewers around the clock, and will start by giving updates of the Winter Olympics in Beijing early next year. The presenter achieves high-level sign language expression thanks to Baidu's natural action engine and their sign language translation engine.


Applying SoftTriple Loss for Supervised Language Model Fine Tuning

arXiv.org Artificial Intelligence

Natural language processing (NLP) is a rapidly growing area of machine learning with applications wherever a computer needs to operate on text in a way that involves capturing its semantics. Such applications include text classification, translation, text summarization, question answering, and dialogue. All these tasks are upstream and depend on the quality of the text representation (White et al., 2015). Many models can produce such text representations, from Bag-of-Words or Word2Vec word embeddings to the state-of-the-art language representation model BERT and its variations, used in most NLP tasks. The best performance on text classification tasks is obtained when the model is first trained on a general knowledge corpus to capture semantic relationships between words and then fine-tuned with an additional dense layer on a domain corpus with cross-entropy loss (Radford et al., 2019). We introduce a new loss function, TripleEntropy, based on cross-entropy loss and SoftTriple loss, to improve classification performance when fine-tuning general knowledge pre-trained language models (Devlin et al., 2018; Qian et al., 2019). Triplet Loss transforms the embedding space so that vector representations from the same class can form separable subspaces, stabilizing and generalizing the language model fine-tuning process. TripleEntropy can improve the fine-tuning process of RoBERTa-based models so that performance on downstream tasks increases by about 0.02% to 2.29%.


SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

arXiv.org Artificial Intelligence

Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model structure level, we propose a multihop graph transformer for regularizing attention interaction among hops. As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with the state-of-the-art methods and prove the efficacy of each proposed component.