AITopics

1910.00643

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

He, Junxian, Gu, Jiatao, Shen, Jiajun, Ranzato, Marc'Aurelio

Revisiting Self-Training for Neural Sequence Generation

arXiv.org Machine LearningSep-30-2019

Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model's prediction (i.e. pseudo-parallel data). While self-training has been extensively studied on classification problems, in complex sequence generation tasks (e.g. machine translation) it is still unclear how self-training works due to the compositionality of the target space. In this work, we first empirically show that self-training is able to decently improve the supervised baseline on neural sequence generation tasks. Through careful examination of the performance gains, we find that the perturbation on the hidden states (i.e. dropout) is critical for self-training to benefit from the pseudo-parallel data, which acts as a regularizer and forces the model to yield close predictions for similar unlabeled inputs. Such effect helps the model correct some incorrect predictions on unlabeled data. To further encourage this mechanism, we propose to inject noise to the input space, resulting in a "noisy" version of self-training. Empirical study on standard machine translation and text summarization benchmarks shows that noisy self-training is able to effectively utilize unlabeled data and improve the performance of the supervised baseline by a large margin.

dataset, iteration, proceedings, (16 more...)

1909.13788

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Yamshchikov, Ivan P., Shibaev, Viacheslav, Nagaev, Aleksander, Jost, Jürgen, Tikhonov, Alexey

Decomposing Textual Information For Style Transfer

arXiv.org Artificial IntelligenceSep-26-2019

However, natural language generation with encoder-decoder based methods but depends is still a challenging task due to a number on the used architecture. Moreover, architectures of factors that include the absence of local information with higher quality of information decomposition continuity and non-smooth disentangled perform better in terms of the style transfer task.

artificial intelligence, machine learning, natural language, (20 more...)

doi: 10.18653/v1/D19-5613

1909.12928

Country:

Europe > Russia (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > Russia (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningSep-26-2019

Attention Forcing for Sequence-to-sequence Model Training

Dou, Qingyun, Lu, Yiting, Efiong, Joshua, Gales, Mark J. F.

Auto-regressive sequence-to-sequence models with attention mechanism have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The standard approach, teacher forcing, guides a model with reference output history during training. The problem is that the model is unlikely to recover from its mistakes during inference, where the reference output is replaced by generated output. Several approaches deal with this problem, largely by guiding the model with generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing, which guides the model with generated output history and reference attention. This approach can train the model to recover from its mistakes, in a stable fashion, without the need for a schedule or a classifier. In addition, it allows the model to generate output sequences aligned with the references, which can be important for cascaded systems like many speech synthesis systems. Experiments on speech synthesis show that attention forcing yields significant performance gain. Experiments on machine translation show that for tasks where various re-orderings of the output are valid, guiding the model with generated output history is challenging, while guiding the model with reference attention is beneficial.

arxiv preprint arxiv, output history, sequence, (11 more...)

1909.12289

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Fan, Angela, Grave, Edouard, Joulin, Armand

Reducing Transformer Depth on Demand with Structured Dropout

arXiv.org Machine LearningSep-25-2019

Overparameterized transformer networks have obtained state of the art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting. In this work, we explore LayerDrop, a form of structured dropout, which has a regularization effect during training and allows for efficient pruning at inference time. In particular, we show that it is possible to select sub-networks of any depth from one large network without having to finetune them and with limited impact on performance. We demonstrate the effectiveness of our approach by improving the state of the art on machine translation, language modeling, summarization, question answering, and language understanding benchmarks. Moreover, we show that our approach leads to small BERT-like models of higher quality compared to training from scratch or using distillation.

arxiv preprint arxiv, layerdrop, pruning, (11 more...)

1909.11556

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.91)

Han, Nway Nway, Thida, Aye

Annotated Guidelines and Building Reference Corpus for Myanmar-English Word Alignment

arXiv.org Artificial IntelligenceSep-25-2019

Reference corpus for word alignment is an important resource for developing and evaluating word alignment methods. For Myanmar - English language pairs, there is no reference corpus to evaluate the word alignment tasks. Therefore, we created the guidelines f or Myanmar - English word alignment annotation between two languages over contrastive learning and built the Myanmar - English reference corpus consisting of verified alignments from Myanmar ALT of the Asian Language Treebank (ALT). This reference corpus conta ins confident labels sure (S) and possible (P) for word alignments which are used to test for the purpose of evaluation of the word alignments tasks. We discuss the most linking ambiguities to define consistent and systematic instructions to align manual w ords. We evaluated the results of annotators agreement using our reference corpus in terms of alignment error rate (AER) in word alignment tasks and discuss the words relationships in terms of BLEU scores. A bilingual corpus aligned at the level of sentences or words is a precious resource for developing machine translation systems. Word alignment is a fundamental step in extracting translation information from bilingual corpus and determines which words and phrases are translations of each other in the original and translated sentence. In most translation systems, translational correspondences are rather complex; for a language pair such as Myanmar and Eng lish that belong to the different word order languages.

alignment, myanmar, reference corpus, (14 more...)

1909.11288

Country:

Asia > Myanmar > Mandalay Region > Mandalay (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

arXiv.org Artificial IntelligenceSep-25-2019

Scale MLPerf-0.6 models on Google TPU-v3 Pods

Kumar, Sameer, Bitorff, Victor, Chen, Dehao, Chou, Chiachen, Hechtman, Blake, Lee, HyoukJoong, Kumar, Naveen, Mattson, Peter, Wang, Shibo, Wang, Tao, Xu, Yuanzhong, Zhou, Zongwei

The recent submission of Google TPU-v3 Pods to the industry wide MLPerf v0.6 training benchmark demonstrates the scalability of a suite of industry relevant ML models. MLPerf defines a suite of models, datasets and rules to follow when benchmarking to ensure results are comparable across hardware, frameworks and companies. Using this suite of models, we discuss the optimizations and techniques including choice of optimizer, spatial partitioning and weight update sharding necessary to scale to 1024 TPU chips. Furthermore, we identify properties of models that make scaling them challenging, such as limited data parallelism and unscaled weights. These optimizations contribute to record performance in transformer, Resnet-50 and SSD in the Google MLPerf-0.6 submission.

batch size, benchmark, parallelism, (15 more...)

1909.09756

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Artetxe, Mikel, Schwenk, Holger

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

arXiv.org Artificial IntelligenceSep-25-2019

An increasingly popular approach to alleviate this issue is to first learn general language representations on unlabeled data, which are then integrated in task-specific downstream systems. This approach was first popularized by word embeddings (Mikolov et al., 2013b; This work was performed during an internship at Facebook AI Research. Pennington et al., 2014), but has recently been superseded by sentence-level representations (Peters et al., 2018; Devlin et al., 2019). Nevertheless, all these works learn a separate model for each language and are thus unable to leverage information across different languages, greatly limiting their potential performance for low-resource languages. In this work, we are interested in universal language agnostic sentence embeddings, that is, vector representations of sentences that are general with respect to two dimensions: the input language and the NLP task.

computational linguistic, proceedings, representation, (13 more...)

1812.10464

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
Oceania > Australia > Victoria > Melbourne (0.05)
(14 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

#artificialintelligenceSep-23-2019, 08:38:01 GMT

Interview with Nathan Bruzat: Data Scientist interview

During my studies in engineering school in computer science, I had the opportunity to launch two entrepreneurship projects related to Machine Learning. The first is a project on the translation of sign languages into written languages through bracelets and an automated translation system. This project continues under the name of SignBand. Following my departure, I started to train in Machine Learning. I have taken several online courses and participated in several competitions on Kaggle and hackathons.

computer science, data scientist interview, machine learning, (5 more...)

#artificialintelligence

Industry: Education > Educational Setting > Online (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.58)

Liu, Xinyi, Wangperawong, Artit

Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models

arXiv.org Machine LearningSep-21-2019

This study compares the effectiveness and robustness of multi-class categorization of Amazon product data using transfer learning on pre-trained contextualized language models. Specifically, we fine-tuned BERT and XLNet, two bidirectional models that have achieved state-of-the-art performance on many natural language tasks and benchmarks, including text classification. While existing classification studies and benchmarks focus on binary targets, with the exception of ordinal ranking tasks, here we examine the robustness of such models as the number of classes grows from 1 to 20. Our experiments demonstrate an approximately linear decrease in performance metrics (i.e., precision, recall, $F_1$ score, and accuracy) with the number of class labels. BERT consistently outperforms XLNet using identical hyperparameters on the entire range of class label quantities for categorizing products based on their textual descriptions. BERT is also more affordable than XLNet in terms of the computational cost (i.e., time and memory) required for training. In all cases studied, the performance degradation rates were estimated to be 1% per additional class label.

class label, classification, xlnet, (11 more...)

1909.03564

Country:

North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.61)