Collaborating Authors: Cohen, William


Instruct-Imagen: Image Generation with Multi-modal Instruction

arXiv.org Artificial Intelligence

This paper presents instruct-imagen, a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks. We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision. It uses natural language to amalgamate disparate modalities (e.g., text, edge, style, subject, etc.), such that abundant generation intents can be standardized in a uniform format. We then build instruct-imagen by fine-tuning a pre-trained text-to-image diffusion model with a two-stage framework. First, we adapt the model using retrieval-augmented training, to enhance the model's ability to ground its generation on external multi-modal context. Subsequently, we fine-tune the adapted model on diverse image generation tasks that require vision-language understanding (e.g., subject-driven generation, etc.), each paired with a multi-modal instruction encapsulating the task's essence. Human evaluation on various image generation datasets reveals that instruct-imagen matches or surpasses prior task-specific models in-domain and demonstrates promising generalization to unseen and more complex tasks.
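
As a rough illustration of how such a multi-modal instruction could be represented in practice, the sketch below pairs a natural-language instruction with named multi-modal context entries; the class, field names, and bracket-tag convention are hypothetical, not taken from the paper.

```python
# A minimal sketch of a "multi-modal instruction": natural language plus
# named multi-modal context entries. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class MultiModalInstruction:
    # Natural-language instruction; bracketed tags refer to entries in `context`.
    text: str
    # Maps a tag used in `text` to a modality payload (image path, edge map, etc.).
    context: dict = field(default_factory=dict)

# Example: subject-driven, style-conditioned generation expressed in one uniform format.
instruction = MultiModalInstruction(
    text="Render the dog in [subject] walking on a beach, in the style of [style].",
    context={
        "[subject]": {"modality": "image", "source": "subject_photo.png"},
        "[style]": {"modality": "image", "source": "style_reference.png"},
    },
)
print(instruction.text)
for tag, payload in instruction.context.items():
    print(tag, "->", payload["modality"], payload["source"])
```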


Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute

arXiv.org Artificial Intelligence

Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget. Moreover, the advantage of LUMEN over FiD increases with model size.
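
The toy sketch below illustrates the hybrid idea under simplifying assumptions: a large memory of passage representations is pre-computed offline, and a small live module conditions it on the question at query time. The shapes, the single linear layer, and the combination rule are placeholders, not LUMEN's actual architecture.

```python
# Toy hybrid of pre-computed memory and on-the-fly encoding (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d = 16                      # hidden size (toy)
num_passages, toks = 4, 8   # retrieved passages, tokens per passage

# Offline: an expensive encoder runs once per passage; results are cached.
memory = rng.normal(size=(num_passages, toks, d))      # pre-computed memory

# Online: a cheap "live" layer conditions the cached memory on the question.
W_live = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)  # tiny live encoder
question = rng.normal(size=(d,))                       # question embedding

q = np.broadcast_to(question, memory.shape)            # pair question with each token
live_input = np.concatenate([memory, q], axis=-1)      # (passages, toks, 2d)
fused = np.tanh(live_input @ W_live)                   # question-conditioned memory

print(fused.shape)  # (4, 8, 16): ready to be read FiD-style by a decoder
```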


FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference

arXiv.org Artificial Intelligence

Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state of the art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while the majority of inference time results from memory bandwidth constraints in the decoder. We propose two simple changes to the FiD architecture to alleviate memory bandwidth constraints, and speed up inference by 7x. This allows us to use a much larger decoder at modest cost. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large.
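
A back-of-the-envelope sketch of the imbalance described above: encoder compute grows with the number of retrieved tokens, while each decoding step must stream the cached keys and values for all of those tokens, so decoding is dominated by memory traffic. All numbers below are illustrative assumptions, not measurements from the paper.

```python
# Rough accounting of encoder compute vs. decoder memory traffic in a
# FiD-style model. Numbers are illustrative only.
num_passages = 100       # retrieved passages
passage_len = 250        # tokens per retrieved passage
answer_len = 20          # generated answer tokens
d_model = 1024           # model width
bytes_per_value = 2      # fp16

# Encoder: roughly one pass over every retrieved token (FLOPs grow with token count).
encoder_tokens = num_passages * passage_len
encoder_flops = encoder_tokens * d_model ** 2

# Decoder: each generated token re-reads the cached keys and values of all
# retrieved tokens, so memory traffic grows with tokens * answer length.
kv_bytes_per_step = encoder_tokens * d_model * 2 * bytes_per_value
decoder_bytes = answer_len * kv_bytes_per_step

print(f"encoder FLOPs (arbitrary units): {encoder_flops:.2e}")
print(f"decoder KV memory traffic (bytes): {decoder_bytes:.2e}")
```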


Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering

arXiv.org Artificial Intelligence

Retrieval-augmented language models have recently become the standard for knowledge-intensive tasks. Rather than relying purely on latent semantics within the parameters of large neural models, these methods enlist a semi-parametric memory to encode an index of knowledge for the model to retrieve over. Most prior work has employed text passages as the unit of knowledge, which offers high coverage at the cost of interpretability, controllability, and efficiency. The opposite properties arise in other methods, which have instead relied on knowledge base (KB) facts. At the same time, more recent work has demonstrated the effectiveness of storing and retrieving from an index of Q-A pairs derived from text (Lewis et al., 2021). This approach yields a high-coverage knowledge representation that maintains KB-like properties because its representations are more atomic units of information. In this work we push this line of research further by proposing a question-answer augmented encoder-decoder model and an accompanying pretraining strategy. This yields an end-to-end system that not only outperforms prior QA retrieval methods on single-hop QA tasks but also enables compositional reasoning, as demonstrated by strong performance on two multi-hop QA datasets. Together, these methods improve the ability to interpret and control the model while narrowing the performance gap with passage retrieval systems.
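
A minimal sketch of retrieval over a question-answer memory as described above: stored questions are embedded into a dense index, the incoming question is matched by inner product, and the paired answer is returned to the reader. The bag-of-words embedding and the example pairs are stand-ins, not the paper's encoder or data.

```python
# Toy retrieval over an index of (question, answer) pairs.
import numpy as np

qa_memory = [
    ("Where was Marie Curie born?", "Warsaw"),
    ("What did Marie Curie win in 1903?", "the Nobel Prize in Physics"),
    ("What element did Curie discover?", "radium"),
]

def embed(text):
    # Hypothetical stand-in for a dense encoder: a bag-of-words indicator
    # vector over a tiny fixed vocabulary, just to make retrieval runnable.
    vocab = ["marie", "curie", "born", "city", "win", "1903", "element", "discover"]
    words = text.lower().replace("?", "").split()
    return np.array([1.0 if w in words else 0.0 for w in vocab])

index = np.stack([embed(q) for q, _ in qa_memory])   # dense index of stored questions

query = "In which city was Marie Curie born?"
scores = index @ embed(query)                        # inner-product retrieval
best = int(np.argmax(scores))
print(qa_memory[best])   # retrieved (question, answer) pair fed to the model
```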


Mention Memory: incorporating textual knowledge into Transformers through entity mention attention

arXiv.org Artificial Intelligence

Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge with "mention memory", a table of dense vector representations of every entity mention in a corpus. The proposed model, TOME, is a Transformer that accesses the information through internal memory layers in which each entity mention in the input passage attends to the mention memory. This approach enables synthesis of and reasoning over many disparate sources of information within a single Transformer model. In experiments using a memory of 150 million Wikipedia mentions, TOME achieves strong performance on several open-domain knowledge-intensive tasks, including the claim verification benchmarks HoVer and FEVER and several entity-based QA benchmarks. We also show that the model learns to attend to informative mentions without any direct supervision. Finally, we demonstrate that the model can generalize to new unseen entities by updating the memory without retraining.
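
The toy layer below sketches the memory-attention idea: passage token states attend over a table of dense mention vectors and mix the retrieved values back in through a residual connection. The dimensions and dense attention are illustrative; the actual model attends sparsely over roughly 150 million pre-computed mention encodings.

```python
# Toy memory-attention layer over a table of mention vectors (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d = 16
num_mentions, seq_len = 1000, 12

mention_memory = rng.normal(size=(num_mentions, d))    # pre-computed mention table
tokens = rng.normal(size=(seq_len, d))                  # passage token states

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Each token queries the memory; attention weights select informative mentions.
attn = softmax(tokens @ mention_memory.T / np.sqrt(d))  # (seq_len, num_mentions)
retrieved = attn @ mention_memory                        # (seq_len, d)
tokens = tokens + retrieved                              # residual update

print(tokens.shape)
```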


Never-Ending Learning

AAAI Conferences

Whereas people learn many different types of knowledge from diverse experiences over many years, most current machine learning systems acquire just a single function or data model from just a single data set. We propose a never-ending learning paradigm for machine learning, to better reflect the more ambitious and encompassing type of learning performed by humans. As a case study, we describe the Never-Ending Language Learner (NELL), which achieves some of the desired properties of a never-ending learner, and we discuss lessons learned. NELL has been learning to read the web 24 hours/day since January 2010, and so far has acquired a knowledge base with over 80 million confidence-weighted beliefs (e.g., servedWith(tea, biscuits)). NELL has also learned millions of features and parameters that enable it to read these beliefs from the web. Additionally, it has learned to reason over these beliefs to infer new beliefs, and is able to extend its ontology by synthesizing new relational predicates. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on Twitter at @CMUNELL.
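
To make the notion of confidence-weighted beliefs concrete, the snippet below stores a few beliefs as predicate triples with confidences and applies one hand-written inference rule; the predicates, confidences, and rule are illustrative, not NELL's learned ontology or inference machinery.

```python
# Toy confidence-weighted beliefs plus one hand-written inference rule.
beliefs = {
    ("servedWith", "tea", "biscuits"): 0.92,
    ("isA", "tea", "beverage"): 0.98,
    ("isA", "biscuits", "food"): 0.95,
}

def infer_pairs_well(beliefs):
    # Toy rule: if X is served with Y, infer pairsWellWith(Y, X) with slightly
    # lower confidence (a stand-in for learned inference rules).
    new = {}
    for (pred, x, y), conf in beliefs.items():
        if pred == "servedWith":
            new[("pairsWellWith", y, x)] = round(conf * 0.9, 2)
    return new

beliefs.update(infer_pairs_well(beliefs))
for fact, conf in beliefs.items():
    print(fact, conf)
```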


Invited Talks

AAAI Conferences

His informatics group built the reusable software platform for Stembook (www.stembook.org). Despite the fact that we now have access to almost all peer-reviewed literature, the approach is orthogonal to any specific biomedical domain ontology. We believe this approach will be extremely useful in drug discovery to break down information silos, increase information awareness and sharing, and integrate terminologies and data with documents and text, both public and private. We will discuss applications we are currently developing in collaboration with a major pharma.

William Cohen: The growing size of the scientific literature has led to a number of attempts to automatically extract entities and relationships from scientific papers, and then to populate databases with this extracted information. In my group we have been exploring techniques for using this sort of extracted information for specific tasks, including "bootstrapping" to improve the coverage of an extraction.


A Hierarchical Graphical Model for Record Linkage

arXiv.org Machine Learning

The task of matching co-referent records is known, among other names, as record linkage. For large record-linkage problems, there is often little or no labeled data available, but the unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for the linkage problem in an unsupervised setting. In addition to proposing new methods, we also cast existing unsupervised probabilistic record-linkage methods in this framework. Some of the techniques we propose to minimize overfitting in the above model are of interest in the general graphical model setting. We describe a method for incorporating monotonicity constraints in a graphical model. We also outline a bootstrapping approach of using "single-field" classifiers to noisily label latent variables in a hierarchical model. Experimental results show that our proposed unsupervised methods perform quite competitively even with fully supervised record-linkage methods.
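
The sketch below illustrates only the single-field bootstrapping intuition mentioned above: per-field similarity scores are computed for candidate record pairs and combined into noisy match/non-match labels that could seed the full hierarchical model (not shown here). The records, similarity function, and decision rule are hypothetical.

```python
# Noisy single-field match decisions for record-linkage bootstrapping (toy example).
from difflib import SequenceMatcher

def field_sim(a, b):
    # String similarity in [0, 1] for one field of a record pair.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

records_a = [{"name": "Wiliam Cohen", "affil": "Carnegie Mellon University"}]
records_b = [{"name": "William W. Cohen", "affil": "Carnegie-Mellon Univ."},
             {"name": "Bill Murray", "affil": "Saturday Night Live"}]

for rb in records_b:
    sims = {f: field_sim(records_a[0][f], rb[f]) for f in ("name", "affil")}
    # Noisy label: call the pair a match if most fields look similar.
    is_match = sum(s > 0.5 for s in sims.values()) >= 2
    print(rb["name"], sims, "-> match" if is_match else "-> non-match")
```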


Towards a Computational Model of Why Some Students Learn Faster than Others

AAAI Conferences

Learners with better metacognition acquire knowledge faster than those without. If we had better models of such learning, we would be able to build better metacognitive educational systems. In this paper, we propose a computational model that uses a probabilistic context-free grammar induction algorithm to achieve metacognitive learning by acquiring deep features that assist future learning. We discuss the challenges of integrating this model into a synthetic student, and possible future studies in using this model to better understand human learning. Preliminary results suggest that both stronger prior knowledge and a better learning strategy can speed up the learning process. Some model variations generate human-like error patterns.
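
As a toy illustration of the premise, and assuming a hand-written grammar rather than an induced one, the snippet below scores a simple derivation for an algebra term and exposes its coefficient as the kind of deep feature that later learning could reuse. The grammar, probabilities, and chosen feature are illustrative, not the model described above.

```python
# Toy probabilistic grammar over algebra terms; the coefficient is treated
# as a "deep feature" exposed by the derivation.
import re

# Hypothetical PCFG rules for terms such as "-3x" (rule -> probability).
rules = {
    ("Term", ("Coefficient", "Variable")): 0.7,
    ("Term", ("Variable",)):               0.3,
    ("Coefficient", ("Minus", "Digits")):  0.4,
    ("Coefficient", ("Digits",)):          0.6,
}

def parse_term(term):
    """Return (derivation probability, coefficient feature) for a simple term.

    A regex stands in for real parsing; rule probabilities are looked up for
    the derivation the match implies.
    """
    m = re.fullmatch(r"(-?)(\d*)([a-z])", term)
    sign, digits, _var = m.groups()
    if not sign and not digits:
        return rules[("Term", ("Variable",))], "1"
    coeff_rule = ("Coefficient", ("Minus", "Digits")) if sign else ("Coefficient", ("Digits",))
    prob = rules[("Term", ("Coefficient", "Variable"))] * rules[coeff_rule]
    return prob, sign + (digits or "1")

for term in ["-3x", "x", "5y", "-x"]:
    print(term, parse_term(term))
```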


Integrating Transfer Learning in Synthetic Student

AAAI Conferences

Building an intelligent agent that simulates human-level learning in domains such as math, science, or a second language could potentially benefit both education, by improving our understanding of human learning, and artificial intelligence, by helping to create human-level intelligence. Recently, we proposed an efficient approach to acquiring procedural knowledge using transfer learning; however, it operated as a separate module. In this paper, we describe how to integrate this module into a machine-learning agent, SimStudent, which learns procedural knowledge from examples and through problem solving. We illustrate this method in the domain of algebra, and then consider directions for future research in this area.