Dyer, Chris
Learning and Evaluating General Linguistic Intelligence
Yogatama, Dani, d'Autume, Cyprien de Masson, Connor, Jerome, Kocisky, Tomas, Chrzanowski, Mike, Kong, Lingpeng, Lazaridou, Angeliki, Ling, Wang, Yu, Lei, Dyer, Chris, Blunsom, Phil
Advances in deep learning techniques (e.g., attention mechanisms, memory modules, and architecture search) have considerably improved natural language processing (NLP) models on many important tasks. For example, machine performance on both Chinese-English machine translation and document question answering on the Stanford question answering dataset (SQuAD; Rajpurkar et al., 2016) has been claimed to surpass human levels (Hassan et al., 2018; Devlin et al., 2018). While the tasks that motivated the development of learning-based NLP models were driven by external demands and remain important applications in their own right (e.g., machine translation, question answering, automatic speech recognition, and text to speech), there is a marked and troubling tendency for recent datasets to be constructed so that they can be solved with little generalization or abstraction; for instance, ever larger datasets are created by crowd-sourcing processes that may not well approximate the natural distributions they are intended to span, although there are some notable counterexamples (Kwiatkowski et al., 2019). When multiple datasets represent the same task across different domains (e.g., the various question answering datasets), we rarely evaluate on all of them. This state of affairs promotes the development of models that only work well for a specific purpose, overestimates our success at having solved the general task, fails to reward sample-efficient generalization that requires the ability to discover and exploit rich linguistic structure, and ultimately limits progress.
Neural Arithmetic Logic Units
Trask, Andrew, Hill, Felix, Reed, Scott E., Rae, Jack, Dyer, Chris, Blunsom, Phil
Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned gates. We call this module a neural arithmetic logic unit (NALU), by analogy to the arithmetic logic unit in traditional processors. Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images. In contrast to conventional architectures, we obtain substantially better generalization both inside and outside of the range of numerical values encountered during training, often extrapolating orders of magnitude beyond trained numerical ranges.
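For intuition, here is a minimal PyTorch-style sketch of a NALU cell as described above: a shared weight matrix constrained towards {-1, 0, 1} drives an additive path and a log-space multiplicative path, and a learned sigmoid gate mixes the two. Dimension names and initialisation are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class NALU(nn.Module):
    """Minimal sketch of a neural arithmetic logic unit (NALU) cell."""
    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.W_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        self.M_hat = nn.Parameter(torch.empty(out_dim, in_dim))
        self.G = nn.Parameter(torch.empty(out_dim, in_dim))
        for p in (self.W_hat, self.M_hat, self.G):
            nn.init.xavier_uniform_(p)
        self.eps = eps

    def forward(self, x):
        # Weights pushed towards {-1, 0, 1} so the unit learns to select inputs.
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        add = x @ W.t()                                         # addition / subtraction path
        mul = torch.exp(torch.log(x.abs() + self.eps) @ W.t())  # log-space multiplication / division path
        g = torch.sigmoid(x @ self.G.t())                       # learned gate between the two paths
        return g * add + (1 - g) * mul
```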
Unsupervised Text Style Transfer using Language Models as Discriminators
Yang, Zichao, Hu, Zhiting, Dyer, Chris, Xing, Eric P., Berg-Kirkpatrick, Taylor
Binary classifiers are often employed as discriminators in GAN-based unsupervised style transfer models to ensure that transferred sentences are similar to sentences in the target domain. One difficulty with a binary discriminator is that its error signal is sometimes insufficient to train the model to produce richly structured language. In this paper, we propose using a target domain language model as the discriminator to provide richer, token-level feedback during the learning process. Because the language model scores sentences directly using a product of locally normalized probabilities, it offers a more stable and more useful training signal to the generator. We train the generator to minimize the negative log likelihood (NLL) of generated sentences as evaluated by the language model. By using a continuous approximation of the discrete samples, our model can be trained with back-propagation in an end-to-end fashion. Moreover, we find empirically that, with a language model as a structured discriminator, it is possible to eliminate the adversarial training steps that use negative samples, making training more stable. We compare our model with previous work that uses convolutional neural networks (CNNs) as discriminators and show that our model significantly outperforms them on three tasks: word substitution decipherment, sentiment modification, and related language translation.
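As a rough illustration of the training signal described above, the sketch below scores soft (continuous) generator outputs with a target-domain language model and returns the expected NLL, which can be back-propagated into the generator. The `lm.embedding` and `lm.step` interfaces are assumptions made for this sketch, not the paper's actual API.

```python
import torch
import torch.nn.functional as F

def lm_discriminator_loss(soft_tokens, lm, hidden=None):
    """Expected NLL of soft generator outputs under a target-domain LM.

    soft_tokens: (batch, T, vocab) softmax distributions from the generator.
    lm: assumed to expose an embedding matrix `lm.embedding.weight` and a
        step function `lm.step(emb_t, hidden) -> (logits, hidden)`.
    """
    batch, T, vocab = soft_tokens.shape
    nll = soft_tokens.new_zeros(batch)
    # Expected embedding of each soft token keeps the computation differentiable.
    emb = soft_tokens @ lm.embedding.weight            # (batch, T, emb_dim)
    for t in range(T - 1):
        logits, hidden = lm.step(emb[:, t], hidden)    # LM predicts the next token
        log_probs = F.log_softmax(logits, dim=-1)
        # Expected NLL of the generator's next soft token under the LM.
        nll = nll - (soft_tokens[:, t + 1] * log_probs).sum(-1)
    return nll.mean()
```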
Sentence Encoding with Tree-constrained Relation Networks
Yu, Lei, d'Autume, Cyprien de Masson, Dyer, Chris, Blunsom, Phil, Kong, Lingpeng, Ling, Wang
The meaning of a sentence is a function of the relations that hold between its words. We instantiate this relational view of semantics in a series of neural models based on variants of relation networks (RNs) which represent a set of objects (for us, words forming a sentence) in terms of representations of pairs of objects. We propose two extensions to the basic RN model for natural language. First, building on the intuition that not all word pairs are equally informative about the meaning of a sentence, we use constraints based on both supervised and unsupervised dependency syntax to control which relations influence the representation. Second, since higher-order relations are poorly captured by a sum of pairwise relations, we use a recurrent extension of RNs to propagate information so as to form representations of higher order relations. Experiments on sentence classification, sentence pair classification, and machine translation reveal that, while basic RNs are only modestly effective for sentence representation, recurrent RNs with latent syntax are a reliably powerful representational device.
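A minimal sketch of the basic pairwise RN over word vectors follows, with an optional 0/1 mask standing in for the syntactic constraint; the recurrent, higher-order extension is not shown, and names and dimensions are illustrative rather than the paper's.

```python
import torch
import torch.nn as nn

class SentenceRN(nn.Module):
    """Basic relation network over word vectors: the sentence representation
    is the sum of an MLP applied to all ordered word pairs, optionally
    restricted by a 0/1 `pair_mask` (e.g. derived from dependency arcs)."""
    def __init__(self, word_dim, hidden_dim, out_dim):
        super().__init__()
        self.g = nn.Sequential(
            nn.Linear(2 * word_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim))

    def forward(self, words, pair_mask=None):
        # words: (T, word_dim); build representations of all (T, T) word pairs.
        T, d = words.shape
        left = words.unsqueeze(1).expand(T, T, d)
        right = words.unsqueeze(0).expand(T, T, d)
        pair_reps = self.g(torch.cat([left, right], dim=-1))  # (T, T, out_dim)
        if pair_mask is not None:                              # keep only licensed pairs
            pair_reps = pair_reps * pair_mask.unsqueeze(-1)
        return pair_reps.sum(dim=(0, 1))                       # sentence vector
```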
Dynamic Integration of Background Knowledge in Neural NLU Systems
Weissenborn, Dirk, Kočiský, Tomáš, Dyer, Chris
Common-sense and background knowledge are required to understand natural language, but in most neural natural language understanding (NLU) systems this knowledge must be acquired from training corpora during learning and then remains static at test time. We introduce a new architecture for the dynamic integration of explicit background knowledge in NLU models. A general-purpose reading module reads background knowledge in the form of free-text statements (together with task-specific text inputs) and yields refined word representations to a task-specific NLU architecture that reprocesses the task inputs with these representations. Experiments on document question answering (DQA) and recognizing textual entailment (RTE) demonstrate the effectiveness and flexibility of the approach. Analysis shows that our model learns to exploit knowledge in a semantically appropriate way.
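The following is a loose, hypothetical sketch of the idea: task-input word vectors attend over encoded background statements, and a learned gate mixes the attended summary into refined word representations that a downstream NLU model would then reprocess. Module and interface names are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeRefiner(nn.Module):
    """Hypothetical sketch of dynamic background-knowledge integration."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, task_words, knowledge_words):
        # task_words: (T, dim); knowledge_words: (K, dim) encoded free-text statements
        scores = task_words @ knowledge_words.t()      # (T, K) attention scores
        attn = F.softmax(scores, dim=-1)
        summary = attn @ knowledge_words               # (T, dim) knowledge summary per word
        combined = torch.cat([task_words, summary], dim=-1)
        g = torch.sigmoid(self.gate(combined))         # how much knowledge to mix in
        return g * torch.tanh(self.proj(combined)) + (1 - g) * task_words
```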
Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs
Swayamdipta, Swabha, Ballesteros, Miguel, Dyer, Chris, Smith, Noah A.
We present a transition-based parser that jointly produces syntactic and semantic dependencies. It learns a representation of the entire algorithm state using stack long short-term memories (LSTMs). Our greedy inference algorithm runs in linear time, including feature extraction. On the CoNLL 2008–2009 English shared tasks, we obtain the best published parsing performance among models that jointly learn syntax and semantics.
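For readers unfamiliar with the stack LSTM primitive, here is a minimal sketch of its push/pop interface, used in such parsers to summarise the stack and buffer; the full parser and its transition system are not shown, and names are illustrative.

```python
import torch
import torch.nn as nn

class StackLSTM(nn.Module):
    """Sketch of a stack LSTM: an LSTM whose state history is kept on a stack
    so that pop() rewinds the summary to the previous state."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden_dim)
        zero = (torch.zeros(1, hidden_dim), torch.zeros(1, hidden_dim))
        self.states = [zero]                 # history of (h, c) pairs; index 0 = empty stack

    def push(self, x):                       # x: (1, in_dim) embedding of the pushed item
        h, c = self.cell(x, self.states[-1])
        self.states.append((h, c))

    def pop(self):                           # discard the top item and its state
        if len(self.states) > 1:
            self.states.pop()

    def summary(self):                       # current representation of the whole stack
        return self.states[-1][0]
```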
Relational inductive biases, deep learning, and graph networks
Battaglia, Peter W., Hamrick, Jessica B., Bapst, Victor, Sanchez-Gonzalez, Alvaro, Zambaldi, Vinicius, Malinowski, Mateusz, Tacchetti, Andrea, Raposo, David, Santoro, Adam, Faulkner, Ryan, Gulcehre, Caglar, Song, Francis, Ballard, Andrew, Gilmer, Justin, Dahl, George, Vaswani, Ashish, Allen, Kelsey, Nash, Charles, Langston, Victoria, Dyer, Chris, Heess, Nicolas, Wierstra, Daan, Kohli, Pushmeet, Botvinick, Matt, Vinyals, Oriol, Li, Yujia, Pascanu, Razvan
Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, remain out of reach for current approaches. In particular, generalizing beyond one's experiences--a hallmark of human intelligence from infancy--remains a formidable challenge for modern AI. The following is part position paper, part review, and part unification. We argue that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective. Just as biology uses nature and nurture cooperatively, we reject the false choice between "hand-engineering" and "end-to-end" learning, and instead advocate for an approach which benefits from their complementary strengths. We explore how using relational inductive biases within deep learning architectures can facilitate learning about entities, relations, and rules for composing them. We present a new building block for the AI toolkit with a strong relational inductive bias--the graph network--which generalizes and extends various approaches for neural networks that operate on graphs, and provides a straightforward interface for manipulating structured knowledge and producing structured behaviors. We discuss how graph networks can support relational reasoning and combinatorial generalization, laying the foundation for more sophisticated, interpretable, and flexible patterns of reasoning.
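A minimal sketch of a single GN block with summed aggregation, in the spirit of the paper's formulation: per-edge, per-node, and global update functions. All feature dimensions are kept equal for brevity; this is not the reference implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                         nn.Linear(out_dim, out_dim))

class GNBlock(nn.Module):
    """Sketch of a graph network block: edge, node, and global updates."""
    def __init__(self, dim):
        super().__init__()
        self.edge_fn = mlp(4 * dim, dim)     # edge, sender, receiver, global
        self.node_fn = mlp(3 * dim, dim)     # aggregated incoming edges, node, global
        self.global_fn = mlp(3 * dim, dim)   # aggregated edges, aggregated nodes, global

    def forward(self, nodes, edges, senders, receivers, u):
        # nodes: (N, dim); edges: (E, dim); senders/receivers: (E,) long indices; u: (dim,)
        uE = u.expand(edges.size(0), -1)
        new_edges = self.edge_fn(torch.cat(
            [edges, nodes[senders], nodes[receivers], uE], dim=-1))
        # Sum updated edges into their receiver nodes.
        agg = torch.zeros_like(nodes).index_add_(0, receivers, new_edges)
        uN = u.expand(nodes.size(0), -1)
        new_nodes = self.node_fn(torch.cat([agg, nodes, uN], dim=-1))
        new_u = self.global_fn(torch.cat(
            [new_edges.sum(0), new_nodes.sum(0), u], dim=-1))
        return new_nodes, new_edges, new_u
```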
Pushing the bounds of dropout
Melis, Gábor, Blundell, Charles, Kočiský, Tomáš, Hermann, Karl Moritz, Dyer, Chris, Blunsom, Phil
We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with tighter and higher lower bounds than the fully stochastic dropout objective. We argue that, since the deterministic subvariant's bound is equal to its objective and is the highest amongst these models, the predominant view of it as a good approximation to MC averaging is misleading. Rather, deterministic dropout is the best available approximation to the true objective.
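As an illustration of the model family described above, the sketch below contrasts a power-mean average over sampled dropout masks with the deterministic subvariant. This is a schematic reading of the abstract, not the paper's code, and the function names are our own.

```python
import torch
import torch.nn.functional as F

def mc_power_mean_predict(model, x, k=10, power=1.0):
    """Average class distributions over k sampled dropout masks with a power
    mean; power=1 is the arithmetic mean, power -> 0 approaches the geometric
    mean of the per-mask distributions."""
    model.train()                            # keep dropout active at prediction time
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(k)])
    if abs(power) < 1e-6:                    # geometric mean via a log-space average
        avg = probs.clamp_min(1e-12).log().mean(0).exp()
    else:
        avg = probs.pow(power).mean(0).pow(1.0 / power)
    return avg / avg.sum(-1, keepdim=True)   # renormalise to a distribution

def deterministic_predict(model, x):
    """Deterministic subvariant: dropout replaced by its expectation."""
    model.eval()                             # inverted dropout: activations equal their expectation
    return F.softmax(model(x), dim=-1)
```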