
Collaborating Authors: Sussillo, David


Input Switched Affine Networks: An RNN Architecture Designed for Interpretability

arXiv.org Artificial Intelligence

There exist many problem domains where the interpretability of neural network models is essential for deployment. Here we introduce a recurrent architecture composed of input-switched affine transformations - in other words, an RNN without any explicit nonlinearities, but with input-dependent recurrent weights. This simple form allows the RNN to be analyzed via straightforward linear methods: we can exactly characterize the linear contribution of each input to the model predictions; we can use a change of basis to disentangle input, output, and computational hidden-unit subspaces; and we can fully reverse-engineer the architecture's solution to a simple task. Despite this ease of interpretation, the input-switched affine network achieves reasonable performance on a text modeling task, and allows greater computational efficiency than networks with standard nonlinearities.
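
Concretely, the update rule the abstract describes can be written as h_t = W_{x_t} h_{t-1} + b_{x_t}: each input symbol selects its own affine transform, and the readout is linear. A minimal NumPy sketch of such an input-switched affine network follows; the class and parameter names are illustrative, not taken from the paper.

import numpy as np

class ISAN:
    """Input-switched affine network: one affine transform per input symbol,
    with no pointwise nonlinearity anywhere in the recurrence."""
    def __init__(self, vocab_size, hidden_size, rng=np.random.default_rng(0)):
        scale = 1.0 / np.sqrt(hidden_size)
        # One recurrent weight matrix and bias per input symbol.
        self.W = rng.normal(0.0, scale, (vocab_size, hidden_size, hidden_size))
        self.b = np.zeros((vocab_size, hidden_size))
        self.W_out = rng.normal(0.0, scale, (hidden_size, vocab_size))

    def forward(self, tokens, h0):
        h = h0
        states = []
        for t in tokens:                        # t indexes the current input symbol
            h = self.W[t] @ h + self.b[t]       # affine update selected by the input
            states.append(h)
        logits = np.stack(states) @ self.W_out  # linear readout to next-symbol logits
        return logits, h

Because every step is affine, the hidden state at any time is an affine function of each earlier input, which is what makes the exact per-input contribution analysis described above possible.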


Capacity and Trainability in Recurrent Neural Networks

arXiv.org Machine Learning

Two potential bottlenecks on the expressiveness of recurrent neural networks (RNNs) are their ability to store information about the task in their parameters, and to store information about the input history in their units. We show experimentally that all common RNN architectures achieve nearly the same per-task and per-unit capacity bounds with careful training, for a variety of tasks and stacking depths. They can store an amount of task information which is linear in the number of parameters, and is approximately 5 bits per parameter. They can additionally store approximately one real number from their input history per hidden unit. We further find that for several tasks it is the per-task parameter capacity bound that determines performance. These results suggest that many previous results comparing RNN architectures are driven primarily by differences in training effectiveness, rather than differences in capacity. Supporting this observation, we compare training difficulty for several architectures, and show that vanilla RNNs are far more difficult to train, yet have slightly higher capacity. Finally, we propose two novel RNN architectures, one of which is easier to train than the LSTM or GRU for deeply stacked architectures.


An Online Sequence-to-Sequence Model Using Partial Conditioning

Neural Information Processing Systems

Sequence-to-sequence models have achieved impressive results on various tasks. However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives, or for tasks with long input and output sequences. This is because they generate an output sequence conditioned on the entire input sequence. In this paper, we present a Neural Transducer that can make incremental predictions as more input arrives, without redoing the entire computation. Unlike sequence-to-sequence models, the Neural Transducer computes the next-step distribution conditioned on the partially observed input sequence and the partially generated sequence. At each time step, the transducer can decide to emit zero to many output symbols. The data can be processed using an encoder and presented as input to the transducer. The discrete decision to emit a symbol at every time step makes it difficult to learn with conventional backpropagation. It is, however, possible to train the transducer by using a dynamic programming algorithm to generate target discrete decisions. Our experiments show that the Neural Transducer works well in settings where it is required to produce output predictions as data come in. We also find that the Neural Transducer performs well for long sequences even when attention mechanisms are not used.
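
In outline, decoding alternates between consuming a block of input and emitting output symbols until a special end-of-block symbol is produced, at which point the transducer advances to the next block. A schematic greedy-decoding loop is sketched below; encoder_step, transducer_step, and the END_OF_BLOCK symbol are assumed interfaces for illustration, not the paper's API.

# Schematic greedy decoding loop for a Neural Transducer-style model.
END_OF_BLOCK = 0  # hypothetical id for the "stop emitting, read more input" symbol

def transduce(input_blocks, encoder_step, transducer_step, max_emits_per_block=10):
    enc_state, dec_state = None, None
    output = []
    for block in input_blocks:                       # input arrives incrementally
        enc_state, enc_features = encoder_step(enc_state, block)
        for _ in range(max_emits_per_block):
            # Next-step distribution conditioned on the partially observed input
            # (enc_features so far) and the partially generated output.
            dec_state, probs = transducer_step(dec_state, enc_features, output)
            symbol = int(probs.argmax())
            if symbol == END_OF_BLOCK:               # emit zero or more symbols, then advance
                break
            output.append(symbol)
    return output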


Making brain-machine interfaces robust to future neural variability

arXiv.org Machine Learning

A major hurdle to clinical translation of brain-machine interfaces (BMIs) is that current decoders, which are trained from a small quantity of recent data, become ineffective when neural recording conditions subsequently change. We tested whether a decoder could be made more robust to future neural variability by training it to handle a variety of recording conditions sampled from months of previously collected data as well as synthetic training data perturbations. We developed a new multiplicative recurrent neural network BMI decoder that successfully learned a large variety of neural-to-kinematic mappings and became more robust with larger training datasets. When tested with a non-human primate preclinical BMI model, this decoder was robust under conditions that disabled a state-of-the-art Kalman-filter-based decoder. These results validate a new BMI strategy in which accumulated data history is effectively harnessed, and may facilitate reliable daily BMI use by reducing decoder retraining downtime.
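
The decoder named here is a multiplicative RNN, in which the input (for example, a vector of binned spike counts) modulates the effective recurrent weights through a factored tensor. The sketch below follows the standard multiplicative-RNN formulation with illustrative parameter names; it is not the paper's exact parameterization, and the kinematic output would be a separate linear readout of the hidden state.

import numpy as np

def mrnn_step(h, x, params):
    """One multiplicative-RNN step: the input x (e.g. binned spike counts)
    modulates the recurrent weights through a factored tensor."""
    W_fx, W_fh, W_hf, W_hx, b = params           # illustrative parameter names
    f = (W_fx @ x) * (W_fh @ h)                  # input-gated factor activations
    h_new = np.tanh(W_hf @ f + W_hx @ x + b)     # recurrent update with modulated weights
    return h_new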


LFADS - Latent Factor Analysis via Dynamical Systems

arXiv.org Machine Learning

Neuroscience is experiencing a data revolution in which many hundreds or thousands of neurons are recorded simultaneously. Currently, there is little consensus on how such data should be analyzed. Here we introduce LFADS (Latent Factor Analysis via Dynamical Systems), a method to infer latent dynamics from simultaneously recorded, single-trial, high-dimensional neural spiking data. LFADS is a sequential model based on a variational auto-encoder. By making a dynamical systems hypothesis regarding the generation of the observed data, LFADS reduces observed spiking to a set of low-dimensional temporal factors, per-trial initial conditions, and inferred inputs. We compare LFADS to existing methods on synthetic data and show that it significantly outperforms them in inferring neural firing rates and latent dynamics.
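
In outline, the generative half of the model runs a generator RNN forward from a per-trial initial condition, projects the generator state to a small set of temporal factors, and models the observed spike counts as Poisson with rates given by an exponentiated linear readout of those factors. A highly simplified sketch of that generative path follows; the function and parameter names are illustrative, the paper's generator is a GRU rather than a vanilla tanh RNN, and the inferred-input controller and variational training machinery are omitted.

import numpy as np

def lfads_generate(g0, T, params, rng=np.random.default_rng(0)):
    """Run the generator forward from an initial condition g0 and return
    factors, firing rates, and sampled spike counts for T time bins."""
    W_gen, b_gen, W_fac, W_rate, b_rate = params   # illustrative parameter names
    g = g0
    factors, rates = [], []
    for _ in range(T):
        g = np.tanh(W_gen @ g + b_gen)             # autonomous generator dynamics
        f = W_fac @ g                              # low-dimensional temporal factors
        r = np.exp(W_rate @ f + b_rate)            # per-neuron Poisson rates
        factors.append(f)
        rates.append(r)
    spikes = rng.poisson(np.stack(rates))          # observation model for spiking
    return np.stack(factors), np.stack(rates), spikes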


Random Walk Initialization for Training Very Deep Feedforward Networks

arXiv.org Machine Learning

Training very deep networks is an important open problem in machine learning. One of many difficulties is that the norm of the back-propagated error gradient can grow or decay exponentially. Here we show that training very deep feed-forward networks (FFNs) is not as difficult as previously thought. Unlike when back-propagation is applied to a recurrent network, application to an FFN amounts to multiplying the error gradient by a different random matrix at each layer. We show that the successive application of correctly scaled random matrices to an initial vector results in a random walk of the log of the norm of the resulting vectors, and we compute the scaling that makes this walk unbiased. The variance of the random walk grows only linearly with network depth and is inversely proportional to the size of each layer. Practically, this implies a gradient whose log-norm scales with the square root of the network depth and shows that the vanishing gradient problem can be mitigated by increasing the width of the layers. Mathematical analyses and experimental results using stochastic gradient descent to optimize tasks related to the MNIST and TIMIT datasets are provided to support these claims. Equations for the optimal matrix scaling are provided for the linear and ReLU cases.
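
The core claim, that pushing a vector through a stack of correctly scaled random matrices makes the log of its norm execute a random walk whose variance grows linearly with depth and shrinks with layer width, can be checked numerically. The sketch below uses the plain 1/sqrt(N) scaling for the linear case; the paper derives a small additional correction factor that makes the walk exactly unbiased.

import numpy as np

def lognorm_walk(depth, width, trials=100, rng=np.random.default_rng(0)):
    """Track log ||x_d|| as a unit vector is pushed through `depth` random
    linear layers of size width x width, each scaled by 1/sqrt(width)."""
    walks = np.zeros((trials, depth))
    for t in range(trials):
        x = rng.normal(size=width)
        x /= np.linalg.norm(x)
        for d in range(depth):
            W = rng.normal(0.0, 1.0 / np.sqrt(width), (width, width))
            x = W @ x
            walks[t, d] = np.log(np.linalg.norm(x))
    return walks

depth = 64
for width in (32, 128):
    w = lognorm_walk(depth, width)
    # Variance of the log-norm should grow roughly linearly with depth
    # and be roughly inversely proportional to the layer width.
    print(f"width={width}: var(log-norm) at depth {depth} = {w[:, -1].var():.3f}")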