AITopics

2403.0553

Country:

Europe (1.00)
Asia > Middle East (0.67)
North America > United States > New York (0.27)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (0.92)
Personal (0.92)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceDec-15-2022

Continuous diffusion for categorical data

Dieleman, Sander, Sartran, Laurent, Roshannai, Arman, Savinov, Nikolay, Ganin, Yaroslav, Richemond, Pierre H., Doucet, Arnaud, Strudel, Robin, Dyer, Chris, Durkan, Conor, Hawthorne, Curtis, Leblond, Rémi, Grathwohl, Will, Adler, Jonas

Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous nature of diffusion models conveys many benefits, and in this work we endeavour to preserve it. We propose CDCD, a framework for modelling categorical data with diffusion models that are continuous both in time and input space. We demonstrate its efficacy on several language modelling tasks.

artificial intelligence, machine learning, natural language, (17 more...)

2211.15089

Country:

North America > United States (0.67)
Europe (0.45)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceDec-6-2022

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

Sartran, Laurent, Barrett, Samuel, Kuncoro, Adhiguna, Stanojević, Miloš, Blunsom, Phil, Dyer, Chris

We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism -- one that is independent of composed syntactic representations -- plays an important role in current successful models of long text.

artificial intelligence, machine learning, natural language, (17 more...)

doi: 10.1162/tacl_a_00526

2203.00633

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial IntelligenceJul-18-2022

MAD for Robust Reinforcement Learning in Machine Translation

Donato, Domenic, Yu, Lei, Ling, Wang, Dyer, Chris

We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a variety of translation tasks show that policies learned using the MAD algorithm perform very well when using both greedy decoding and beam search, and that the learned policies are sensitive to the specific reward used during training.

machine learning, natural language, reinforcement learning, (17 more...)

2207.08583

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.87)

arXiv.org Artificial IntelligenceDec-8-2021

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Rae, Jack W., Borgeaud, Sebastian, Cai, Trevor, Millican, Katie, Hoffmann, Jordan, Song, Francis, Aslanides, John, Henderson, Sarah, Ring, Roman, Young, Susannah, Rutherford, Eliza, Hennigan, Tom, Menick, Jacob, Cassirer, Albin, Powell, Richard, Driessche, George van den, Hendricks, Lisa Anne, Rauh, Maribeth, Huang, Po-Sen, Glaese, Amelia, Welbl, Johannes, Dathathri, Sumanth, Huang, Saffron, Uesato, Jonathan, Mellor, John, Higgins, Irina, Creswell, Antonia, McAleese, Nat, Wu, Amy, Elsen, Erich, Jayakumar, Siddhant, Buchatskaya, Elena, Budden, David, Sutherland, Esme, Simonyan, Karen, Paganini, Michela, Sifre, Laurent, Martens, Lena, Li, Xiang Lorraine, Kuncoro, Adhiguna, Nematzadeh, Aida, Gribovskaya, Elena, Donato, Domenic, Lazaridou, Angeliki, Mensch, Arthur, Lespiau, Jean-Baptiste, Tsimpoukelli, Maria, Grigorev, Nikolai, Fritz, Doug, Sottiaux, Thibault, Pajarskas, Mantas, Pohlen, Toby, Gong, Zhitao, Toyama, Daniel, d'Autume, Cyprien de Masson, Li, Yujia, Terzi, Tayfun, Mikulik, Vladimir, Babuschkin, Igor, Clark, Aidan, Casas, Diego de Las, Guy, Aurelia, Jones, Chris, Bradbury, James, Johnson, Matthew, Hechtman, Blake, Weidinger, Laura, Gabriel, Iason, Isaac, William, Lockhart, Ed, Osindero, Simon, Rimell, Laura, Dyer, Chris, Vinyals, Oriol, Ayoub, Kareem, Stanway, Jeff, Bennett, Lorrayne, Hassabis, Demis, Kavukcuoglu, Koray, Irving, Geoffrey

Natural language communication is core to intelligence, as it allows ideas to be efficiently shared between humans or artificially intelligent systems. The generality of language allows us to express many intelligence tasks as taking in natural language input and producing natural language output. Autoregressive language modelling -- predicting the future of a text sequence from its past -- provides a simple yet powerful objective that admits formulation of numerous cognitive tasks. At the same time, it opens the door to plentiful training data: the internet, books, articles, code, and other writing. However this training objective is only an approximation to any specific goal or application, since we predict everything in the sequence rather than only the aspects we care about. Yet if we treat the resulting models with appropriate caution, we believe they will be a powerful tool to capture some of the richness of human intelligence. Using language models as an ingredient towards intelligence contrasts with their original application: transferring text over a limited-bandwidth communication channel. Shannon's Mathematical Theory of Communication (Shannon, 1948) linked the statistical modelling of natural language with compression, showing that measuring the cross entropy of a language model is equivalent to measuring its compression rate.

consumer health, machine learning, natural language, (26 more...)

2112.11446

Country:

Europe (1.00)
Asia > Middle East > Syria (0.14)
Asia > Middle East > Iran (0.14)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Personal (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

arXiv.org Artificial IntelligenceJun-9-2021

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

Sachan, Devendra Singh, Reddy, Siva, Hamilton, William, Dyer, Chris, Yogatama, Dani

We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than staged-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.

artificial intelligence, computer game, retriever, (18 more...)

2106.05346

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine LearningSep-3-2019

Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation

Huang, Po-Sen, Stanforth, Robert, Welbl, Johannes, Dyer, Chris, Yogatama, Dani, Gowal, Sven, Dvijotham, Krishnamurthy, Kohli, Pushmeet

Previous work has used adversarial training and data augmentation to partially mitigate such brittleness, but these are unlikely to find worst-case adversaries due to the complexity of the search space arising from discrete text perturbations. In this work, we approach the problem from the opposite direction: to formally verify a system's robustness against a predefined class of adversarial attacks. We study text classification under synonym replacements or character flip perturbations. We propose modeling these input perturbations as a simplex and then using Interval Bound Propagation - a formal model verification method. We modify the conventional log-likelihood training objective to train models that can be efficiently verified, which would otherwise come with exponential search complexity. The resulting models show only little difference in terms of nominal accuracy, but have much improved verified accuracy under perturbations and come with an efficiently computable formal guarantee on worst case adversaries. 1 Introduction Deep models have been shown to be vulnerable against adversarial input perturbations (Szegedy et al., 2013; Kurakin et al., 2016). Small, semantically invariant input alterations can lead to drastic changes in predictions, leading to poor performance on adversarially chosen samples. Recent work (Jia and Liang, 2017; Belinkov and Bisk, 2018; Ettinger et al., 2017) also exposed the vulnerabilities of neural NLP models, e.g. with small

accuracy, deep learning, speech recognition, (23 more...)

1909.01492

Country:

North America > United States > Louisiana (0.14)
North America > United States > Ohio (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine LearningJul-12-2019

Compound Probabilistic Context-Free Grammars for Grammar Induction

Kim, Yoon, Dyer, Chris, Rush, Alexander M.

We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations which learn a single stochastic grammar, our context-free rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable, and the latent trees are marginalized with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods for grammar induction from words with neural language models.

deep learning, neural network, proceedings, (21 more...)

1906.10225

Country: North America > United States (1.00)

Genre: Research Report > Promising Solution (0.34)

Industry:

Law (1.00)
Banking & Finance > Trading (1.00)
Government > Military (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Machine LearningApr-15-2019

Unsupervised Recurrent Neural Network Grammars

Kim, Yoon, Rush, Alexander M., Yu, Lei, Kuncoro, Adhiguna, Dyer, Chris, Melis, Gábor

Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.

deep learning, neural network, proceedings, (19 more...)

1904.03746

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningApr-15-2019

An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search

Goyal, Kartik, Dyer, Chris, Berg-Kirkpatrick, Taylor

Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias. However, when considering high-capacity neural parametrizations that condition on the whole input sequence, both model classes are theoretically equivalent in terms of the distributions they are capable of representing. Thus, the practical advantage of global normalization in the context of modern neural methods remains unclear. In this paper, we attempt to shed light on this problem through an empirical study. We extend an approach for search-aware training via a continuous relaxation of beam search (Goyal et al., 2017b) in order to enable training of globally normalized recurrent sequence models through simple backpropagation. We then use this technique to conduct an empirical study of the interaction between global normalization, high-capacity encoders, and search-aware optimization. We observe that in the context of inexact search, globally normalized neural models are still more effective than their locally normalized counterparts. Further, since our training approach is sensitive to warm-starting with pre-trained models, we also propose a novel initialization strategy based on self-normalization for pre-training globally normalized models. We perform analysis of our approach on two tasks: CCG supertagging and Machine Translation, and demonstrate the importance of global normalization under different conditions while using search-aware training.

deep learning, neural network, normalized model, (20 more...)

1904.06834

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)