Collaborating Authors

Cottrell, Garrison W.


ReZero is All You Need: Fast Convergence at Large Depth

arXiv.org Machine Learning

Deep networks have enabled significant performance gains across domains, but they often suffer from vanishing/exploding gradients. This is especially true for Transformer architectures, where depth beyond 12 layers is difficult to train without large datasets and computational budgets. In general, we find that inefficient signal propagation impedes learning in deep networks. In Transformers, multi-head self-attention is the main cause of this poor signal propagation. To facilitate deep signal propagation, we propose ReZero, a simple change to the architecture that initializes an arbitrary layer as the identity map, using a single additional learned parameter per layer. We apply this technique to language modeling and find that we can easily train ReZero-Transformer networks over a hundred layers. When applied to 12-layer Transformers, ReZero converges 56% faster on enwik8. ReZero applies beyond Transformers to other residual networks, enabling 1,500% faster convergence for deep fully connected networks and 32% faster convergence for a ResNet-56 trained on CIFAR-10.
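
To make the idea concrete, here is a minimal sketch of a ReZero-style residual block, assuming PyTorch; the wrapped sublayer and dimensions are placeholders rather than the paper's exact Transformer implementation.

import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    """Residual block x + alpha * F(x), with alpha initialized to zero."""
    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        # One extra learned scalar per layer; zero init makes the block the identity map.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return x + self.alpha * self.sublayer(x)

# Usage: wrap any residual sublayer, e.g. a small feed-forward sub-block.
block = ReZeroBlock(nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64)))
y = block(torch.randn(8, 64))

Because each block starts as the identity, signals propagate through arbitrarily deep stacks at initialization, and each alpha grows away from zero only as its layer becomes useful.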


Improving Neural Story Generation by Targeted Common Sense Grounding

arXiv.org Machine Learning

Stories generated with neural language models have shown promise in grammatical and stylistic consistency. However, the generated stories are still lacking in common sense reasoning, e.g., they often contain sentences deprived of world knowledge. We propose a simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding. When combined with our two-stage fine-tuning pipeline, our method achieves improved common sense reasoning and state-of-the-art perplexity on the WritingPrompts (Fan et al., 2018) story generation dataset.
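
As a hedged sketch of what such a multi-task scheme can look like (the loaders, the lm_loss method, and the loss weighting below are illustrative assumptions, not the paper's exact pipeline), one can interleave batches from the story corpus and from a common sense grounding corpus while sharing a single language model:

import itertools

def train_epoch(model, optimizer, story_loader, commonsense_loader, aux_weight=0.5):
    # Share one language model between the story-generation objective and the
    # auxiliary common sense grounding objective; interfaces are placeholders.
    for story_batch, cs_batch in zip(story_loader, itertools.cycle(commonsense_loader)):
        loss = model.lm_loss(story_batch) + aux_weight * model.lm_loss(cs_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()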


LakhNES: Improving multi-instrumental music generation with cross-domain pre-training

arXiv.org Machine Learning

We are interested in the task of generating multi-instrumental music scores. The Transformer architecture has recently shown great promise for the task of piano score generation; here we adapt it to the multi-instrumental setting. Transformers are complex, high-dimensional language models which are capable of capturing long-term structure in sequence data, but require large amounts of data to fit. Their success on piano score generation is partially explained by the large volumes of symbolic data readily available for that domain. We leverage the recently-introduced NES-MDB dataset of four-instrument scores from an early video game sound synthesis chip (the NES), which we find to be well-suited to training with the Transformer architecture. To further improve the performance of our model, we propose a pre-training technique to leverage the information in a large collection of heterogeneous music, namely the Lakh MIDI dataset. Despite differences between the two corpora, we find that this transfer learning procedure improves both quantitative and qualitative performance for our primary task.
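
In outline, the transfer learning recipe is pre-train then fine-tune; the sketch below is only a schematic with placeholder data loaders and a placeholder train routine, not the released LakhNES training code.

def pretrain_then_finetune(model, lakh_loader, nes_loader, train):
    # Stage 1: pre-train on the large, heterogeneous Lakh MIDI event sequences.
    train(model, lakh_loader, epochs=10)
    # Stage 2: continue training the same model on the target NES-MDB corpus.
    train(model, nes_loader, epochs=10)
    return model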


Example Selection For Dictionary Learning

arXiv.org Artificial Intelligence

In unsupervised learning, an unbiased uniform sampling strategy is typically used, in order that the learned features faithfully encode the statistical structure of the training data. In this work, we explore whether active example selection strategies - algorithms that select which examples to use, based on the current estimate of the features - can accelerate learning. Specifically, we investigate effects of heuristic and saliency-inspired selection algorithms on the dictionary learning task with sparse activations. We show that some selection algorithms do improve the speed of learning, and we speculate on why they might work.
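
As one concrete, purely illustrative instance of such a selection strategy (assuming scikit-learn; the paper's actual heuristic and saliency-inspired criteria may differ), the sketch below feeds the dictionary learner the examples that the current dictionary reconstructs worst:

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 64))           # placeholder training data
learner = MiniBatchDictionaryLearning(n_components=32, transform_algorithm="omp",
                                      random_state=0)
learner.fit(X[:100])                          # initialize the dictionary

for _ in range(20):
    codes = learner.transform(X)              # sparse codes under the current dictionary
    errors = np.linalg.norm(X - codes @ learner.components_, axis=1)
    batch = X[np.argsort(errors)[-100:]]      # select the worst-reconstructed examples
    learner.partial_fit(batch)                # update the dictionary on the selected batch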


Recursive ICA

Neural Information Processing Systems

Independent Component Analysis (ICA) is a popular method for extracting independent features from visual data. However, as a fundamentally linear technique, there is always nonlinear residual redundancy that is not captured by ICA. Hence there have been many attempts to create a hierarchical version of ICA, but so far none of the approaches have a natural way to apply them more than once. Here we show that there is a relatively simple technique that transforms the absolute values of the outputs of a previous application of ICA into a normal distribution, to which ICA may be applied again. This results in a recursive ICA algorithm that may be applied any number of times in order to extract higher order structure from previous layers.
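
A minimal sketch of one recursion step, assuming scikit-learn and SciPy: a rank-based Gaussianization stands in here for the paper's transform of the absolute ICA outputs into a normal distribution, so treat it as illustrative rather than the authors' exact procedure.

import numpy as np
from scipy.stats import norm, rankdata
from sklearn.decomposition import FastICA

def gaussianize(a):
    # Map each column's empirical ranks through the standard normal inverse CDF.
    ranks = np.apply_along_axis(rankdata, 0, a)
    return norm.ppf(ranks / (a.shape[0] + 1))

rng = np.random.default_rng(0)
X = rng.laplace(size=(5000, 16))              # placeholder data
s1 = FastICA(n_components=16, random_state=0).fit_transform(X)                        # first ICA layer
s2 = FastICA(n_components=16, random_state=0).fit_transform(gaussianize(np.abs(s1)))  # second ICA layer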


The Early Word Catches the Weights

Neural Information Processing Systems

The strong correlation between the frequency of words and their naming latency has been well documented. However, as early as 1973, the Age of Acquisition (AoA) of a word was alleged to be the actual variable of interest, but these studies seem to have been ignored in most of the literature. Recently, there has been a resurgence of interest in AoA. While some studies have shown that frequency has no effect when AoA is controlled for, more recent studies have found independent contributions of frequency and AoA. Connectionist models have repeatedly shown strong effects of frequency, but little attention has been paid to whether they can also show AoA effects. Indeed, several researchers have explicitly claimed that they cannot show AoA effects. In this work, we explore these claims using a simple feed forward neural network. We find a significant contribution of AoA to naming latency, as well as conditions under which frequency provides an independent contribution.
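
For readers unfamiliar with how such simulations are typically arranged, the sketch below shows one common, hypothetical setup, assuming PyTorch: "early" items are trainable from the start, "late" items only after a cutoff epoch, and frequency is modeled as sampling probability. It is not the network or training regime used in the paper.

import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
patterns = torch.randn(100, 20)                      # placeholder word input patterns
targets = torch.randn(100, 20)                       # placeholder naming targets
entry_epoch = np.where(np.arange(100) < 50, 0, 200)  # early- vs. late-acquired items
freq = rng.uniform(1.0, 10.0, size=100)              # relative word frequencies

net = nn.Sequential(nn.Linear(20, 50), nn.Sigmoid(), nn.Linear(50, 20))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(400):
    available = (entry_epoch <= epoch).astype(float)
    p = freq * available
    idx = torch.as_tensor(rng.choice(100, size=20, p=p / p.sum()))  # frequency-weighted sampling
    loss = loss_fn(net(patterns[idx]), targets[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()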


Facial Memory Is Kernel Density Estimation (Almost)

Neural Information Processing Systems

We compare the ability of three exemplar-based memory models, each using three different face stimulus representations, to account for the probability a human subject responded "old" in an old/new facial memory experiment. The models are 1) the Generalized Context Model, 2) SimSample, a probabilistic sampling model, and 3) MMOM, a novel model related to kernel density estimation that explicitly encodes stimulus distinctiveness. The representations are 1) positions of stimuli in MDS "face space," 2) projections of test faces onto the "eigenfaces" of the study set, and 3) a representation based on response to a grid of Gabor filter jets. Of the 9 model/representation combinations, only the distinctiveness model in MDS space predicts the observed "morph familiarity inversion" effect, in which the subjects' false alarm rate for morphs between similar faces is higher than their hit rate for many of the studied faces. This evidence is consistent with the hypothesis that human memory for faces is a kernel density estimation task, with the caveat that distinctive faces require larger kernels than do typical faces.
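
As a rough illustration of the kernel density view (assuming scikit-learn, with a fixed bandwidth rather than the per-face kernel widths the caveat implies, and with placeholder face representations), familiarity can be scored as the density of a test face under the studied faces:

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
studied = rng.standard_normal((40, 6))         # placeholder face-space coordinates of the study set
test = rng.standard_normal((10, 6))            # placeholder test faces (old, new, morphs)

kde = KernelDensity(kernel="gaussian", bandwidth=0.8).fit(studied)
familiarity = kde.score_samples(test)          # log-density as a familiarity score
respond_old = familiarity > familiarity.mean() # placeholder "old"/"new" decision rule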