AITopics

Responses generated by neural conversational models tend to lack informativeness and diversity. We present Adversarial Information Maximization (AIM), an adversarial learning framework that addresses these two related but distinct problems. To foster response diversity, we leverage adversarial training that allows distributional matching of synthetic and real responses. To improve informativeness, our framework explicitly optimizes a variational lower bound on pairwise mutual information between query and response. Empirical results from automatic and human evaluations demonstrate that our methods significantly boost informativeness and diversity.

artificial intelligence, machine learning, natural language, (18 more...)

Country: North America (0.46)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Shah, Harshil, Barber, David

Generative Neural Machine Translation

We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoder-decoder translation model by adding a latent variable as a language agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semi-supervised learning without adding parameters. This framework significantly reduces overfitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training.

machine learning, natural language, source sentence, (20 more...)

Country: Asia (0.15)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

On the Dimensionality of Word Embedding

Yin, Zi, Shen, Yuanyuan

In this paper, we provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, we propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. Using techniques from matrix perturbation theory, we reveal a fundamental bias-variance trade-off in dimensionality selection for word embeddings. This bias-variance trade-off sheds light on many empirical observations which were previously unexplained, for example the existence of an optimal dimensionality. Moreover, new insights and discoveries, like when and how word embeddings are robust to over-fitting, are revealed. By optimizing over the bias-variance trade-off of the PIP loss, we can explicitly answer the open question of dimensionality selection for word embedding.

artificial intelligence, machine learning, natural language, (20 more...)

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Sachan, Mrinmaya, Dubey, Kumar Avinava, Mitchell, Tom M., Roth, Dan, Xing, Eric P.

Learning Pipelines with Limited Data and Domain Knowledge: A Study in Parsing Physics Problems

As machine learning becomes more widely used in practice, we need new methods to build complex intelligent systems that integrate learning with existing software, and with domain knowledge encoded as rules. As a case study, we present such a system that learns to parse Newtonian physics problems in textbooks. This system, Nuts&Bolts, learns a pipeline process that incorporates existing code, pre-learned machine learning models, and human engineered rules. It jointly trains the entire pipeline to prevent propagation of errors, using a combination of labelled and unlabelled data. Our approach achieves a good performance on the parsing task, outperforming the simple pipeline and its variants. Finally, we also show how Nuts&Bolts can be used to achieve improvements on a relation extraction task and on the end task of answering Newtonian physics problems.

logic & formal reasoning, machine learning, natural language, (22 more...)

Country:

North America > United States (0.68)
North America > Canada > Quebec (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
(2 more...)

arXiv.org Artificial IntelligenceDec-26-2018

Learning to Refine Source Representations for Neural Machine Translation

Geng, Xinwei, Wang, Longyue, Wang, Xing, Qin, Bing, Liu, Ting, Tu, Zhaopeng

Neural machine translation (NMT) models generally adopt an encoder-decoder architecture for modeling the entire translation process. The encoder summarizes the representation of input sentence from scratch, which is potentially a problem if the sentence is ambiguous. When translating a text, humans often create an initial understanding of the source sentence and then incrementally refine it along the translation on the target side. Starting from this intuition, we propose a novel encoder-refiner-decoder framework, which dynamically refines the source representations based on the generated target-side information at each decoding step. Since the refining operations are time-consuming, we propose a strategy, leveraging the power of reinforcement learning models, to decide when to refine at specific decoding steps. Experimental results on both Chinese-English and English-German translation tasks show that the proposed approach significantly and consistently improves translation performance over the standard encoder-decoder framework. Furthermore, when refining strategy is applied, results still show reasonable improvement over the baseline without much decrease in decoding speed.

deep learning, neural network, representation, (20 more...)

arXiv.org Artificial Intelligence

1812.1023

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Downstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Pham, Hai, Liang, Paul Pu, Manzini, Thomas, Morency, Louis-Philippe, Poczos, Barnabas

Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities

arXiv.org Machine LearningDec-19-2018

Multimodal sentiment analysis is a core research area that studies speaker sentiment expressed from the language, visual, and acoustic modalities. The central challenge in multimodal learning involves inferring joint representations that can process and relate information from these modalities. However, existing work learns joint representations by requiring all modalities as input and as a result, the learned representations may be sensitive to noisy or missing modalities at test time. With the recent success of sequence to sequence (Seq2Seq) models in machine translation, there is an opportunity to explore new ways of learning joint representations that may not require all input modalities at test time. In this paper, we propose a method to learn robust joint representations by translating between modalities. Our method is based on the key insight that translation from a source to a target modality provides a method of learning joint representations using only the source modality as input. We augment modality translations with a cycle consistency loss to ensure that our joint representations retain maximal information from all modalities. Once our translation model is trained with paired multimodal data, we only need data from the source modality at test time for final sentiment prediction. This ensures that our model remains robust from perturbations or missing information in the other modalities. We train our model with a coupled translation-prediction objective and it achieves new state-of-the-art results on multimodal sentiment analysis datasets: CMU-MOSI, ICT-MMMO, and YouTube. Additional experiments show that our model learns increasingly discriminative joint representations with more input modalities while maintaining robustness to missing or perturbed modalities.

artificial intelligence, machine learning, natural language, (23 more...)

arXiv.org Machine Learning

1812.07809

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.88)
(2 more...)

arXiv.org Artificial IntelligenceDec-17-2018

TechKG: A Large-Scale Chinese Technology-Oriented Knowledge Graph

Ren, Feiliang, Hou, Yining, Li, Yan, Pan, Linfeng, Zhang, Yi, Liang, Xiaobo, Liu, Yongkang, Guo, Yu, Zhao, Rongsheng, Ming, Ruicheng, Wu, Huiming

Knowledge graph is a kind of valuable knowledge base which would benefit lots of AI-related applications. Up to now, lots of large-scale knowledge graphs have been built. However, most of them are non-Chinese and designed for general purpose. In this work, we introduce TechKG, a large scale Chinese knowledge graph that is technology-oriented. It is built automatically from massive technical papers that are published in Chinese academic journals of different research domains. Some carefully designed heuristic rules are used to extract high quality entities and relations. Totally, it comprises of over 260 million triplets that are built upon more than 52 million entities which come from 38 research domains. Our preliminary ex-periments indicate that TechKG has high adaptability and can be used as a dataset for many diverse AI-related applications. We released TechKG at: http://www.techkg.cn.

artificial intelligence, natural language, relation, (22 more...)

arXiv.org Artificial Intelligence

1812.06722

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Srivastava, Siddhant, Shukla, Anupam, Tiwari, Ritu

Machine Translation : From Statistical to modern Deep-learning practices

arXiv.org Artificial IntelligenceDec-11-2018

Machine translation (MT) is an area of study in Natural Language processing which deals with the automatic translation of human language, from one language to another by the computer. Having a rich research history spanning nearly three decades, Machine translation is one of the most sought after area of research in the linguistics and computational community. In this paper, we investigate the models based on deep learning that have achieved substantial progress in recent years and becoming the prominent method in MT. We shall discuss the two main deep-learning based Machine Translation methods, one at component or domain level which leverages deep learning models to enhance the efficacy of Statistical Machine Translation (SMT) and end-to-end deep learning models in MT which uses neural networks to find correspondence between the source and target languages using the encoder-decoder architecture. We conclude this paper by providing a time line of the major research problems solved by the researchers and also provide a comprehensive overview of present areas of research in Neural Machine Translation.

machine learning, natural language, translation, (15 more...)

arXiv.org Artificial Intelligence

1812.04238

Genre: Overview (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningDec-11-2018

On the Dimensionality of Word Embedding

Yin, Zi, Shen, Yuanyuan

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1812.04224

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kumar, Sachin, Tsvetkov, Yulia

Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

arXiv.org Machine LearningDec-10-2018

The Softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation. However, it is usually the slowest layer to compute which limits the vocabulary size to a subset of most frequent types; and it has a large memory footprint. We propose a general technique for replacing the softmax layer with a continuous embedding layer. Our primary innovations are a novel probabilistic loss, and a training and inference procedure in which we generate a probability distribution over pre-trained word embeddings, instead of a multinomial distribution over the vocabulary obtained via softmax. We evaluate this new class of sequence-to-sequence models with continuous outputs on the task of neural machine translation. We show that our models obtain upto 2.5x speed-up in training time while performing on par with the state-of-the-art models in terms of translation quality. These models are capable of handling very large vocabularies without compromising on translation quality. They also produce more meaningful errors than in the softmax-based models, as these errors typically lie in a subspace of the vector space of the reference translations.

machine learning, natural language, translation, (20 more...)

arXiv.org Machine Learning

1812.04616

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)