AITopics | Deep Learning

On-line handwriting recognition is unusual among sequence labelling tasks in that the underlying generator of the observed data, i.e. the movement of the pen, is recorded directly. However, the raw data can be difficult to interpret because each letter is spread over many pen locations. As a consequence, sophisticated pre-processing is required to obtain inputs suitable for conventional sequence labelling algorithms, such as HMMs. In this paper we describe a system capable of directly transcribing raw on-line handwriting data. The system consists of a recurrent neural network trained for sequence labelling, combined with a probabilistic language model. In experiments on an unconstrained on-line database, we record excellent results using either raw or pre-processed data, well outperforming a benchmark HMM in both cases.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland (0.15)
North America > United States (0.14)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Greedy Layer-Wise Training of Deep Networks

Bengio, Yoshua, Lamblin, Pascal, Popovici, Dan, Larochelle, Hugo

Neural Information Processing SystemsDec-31-2007

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly nonlinear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

algorithm, dbn, representation, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Greedy Layer-Wise Training of Deep Networks

Bengio, Yoshua, Lamblin, Pascal, Popovici, Dan, Larochelle, Hugo

Neural Information Processing SystemsDec-31-2007

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly nonlinear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

algorithm, dbn, representation, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Greedy Layer-Wise Training of Deep Networks

Bengio, Yoshua, Lamblin, Pascal, Popovici, Dan, Larochelle, Hugo

Neural Information Processing SystemsDec-31-2007

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly nonlinear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions.

algorithm, artificial intelligence, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Representing Part-Whole Relationships in Recurrent Neural Networks

Jain, Viren, Zhigulin, Valentin, Seung, H. S.

Neural Information Processing SystemsDec-31-2006

There is little consensus about the computational function of top-down synaptic connections in the visual system. Here we explore the hypothesis that top-down connections, like bottom-up connections, reflect partwhole relationships. We analyze a recurrent network with bidirectional synaptic interactions between a layer of neurons representing parts and a layer of neurons representing wholes. Within each layer, there is lateral inhibition. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong.

neuron, part-whole relationship, w-neuron, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Representing Part-Whole Relationships in Recurrent Neural Networks

Jain, Viren, Zhigulin, Valentin, Seung, H. S.

Neural Information Processing SystemsDec-31-2006

There is little consensus about the computational function of top-down synaptic connections in the visual system. Here we explore the hypothesis that top-down connections, like bottom-up connections, reflect partwhole relationships. We analyze a recurrent network with bidirectional synaptic interactions between a layer of neurons representing parts and a layer of neurons representing wholes. Within each layer, there is lateral inhibition. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong.

neuron, part-whole relationship, w-neuron, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Representing Part-Whole Relationships in Recurrent Neural Networks

Jain, Viren, Zhigulin, Valentin, Seung, H. S.

Neural Information Processing SystemsDec-31-2006

Abstract Missing

artificial intelligence, deep learning, representing part-whole relationship, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks

Bertschinger, Nils, Natschläger, Thomas, Legenstein, Robert A.

Neural Information Processing SystemsDec-31-2005

In this paper we analyze the relationship between the computational capabilities of randomly connected networks of threshold gates in the timeseries domain and their dynamical properties. In particular we propose a complexity measure which we find to assume its highest values near the edge of chaos, i.e. the transition from ordered to chaotic dynamics. Furthermore we show that the proposed complexity measure predicts the computational capabilities very well: only near the edge of chaos are such networks able to perform complex computations on time series. Additionally a simple synaptic scaling rule for self-organized criticality is presented and analyzed.

chaos, computation, computational capability, (13 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Styria > Graz (0.05)
Europe > Germany > Saxony > Leipzig (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Add feedback

At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks

Bertschinger, Nils, Natschläger, Thomas, Legenstein, Robert A.

Neural Information Processing SystemsDec-31-2005

In this paper we analyze the relationship between the computational capabilities of randomly connected networks of threshold gates in the timeseries domain and their dynamical properties. In particular we propose a complexity measure which we find to assume its highest values near the edge of chaos, i.e. the transition from ordered to chaotic dynamics. Furthermore we show that the proposed complexity measure predicts the computational capabilities very well: only near the edge of chaos are such networks able to perform complex computations on time series. Additionally a simple synaptic scaling rule for self-organized criticality is presented and analyzed.

chaos, computation, computational capability, (13 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Styria > Graz (0.05)
Europe > Germany > Saxony > Leipzig (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Add feedback

At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks

Bertschinger, Nils, Natschläger, Thomas, Legenstein, Robert A.

Neural Information Processing SystemsDec-31-2005

In this paper we analyze the relationship between the computational capabilities ofrandomly connected networks of threshold gates in the timeseries domain and their dynamical properties. In particular we propose a complexity measure which we find to assume its highest values near the edge of chaos, i.e. the transition from ordered to chaotic dynamics. Furthermore we show that the proposed complexity measure predicts the computational capabilities very well: only near the edge of chaos are such networks able to perform complex computations on time series. Additionally asimple synaptic scaling rule for self-organized criticality is presented and analyzed.

artificial intelligence, computational capability, machine learning, (15 more...)

Neural Information Processing Systems

Country: Europe > Austria (0.14)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Add feedback

Filters

Collaborating Authors

Deep Learning

Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks

Greedy Layer-Wise Training of Deep Networks

Greedy Layer-Wise Training of Deep Networks

Greedy Layer-Wise Training of Deep Networks

Representing Part-Whole Relationships in Recurrent Neural Networks

Representing Part-Whole Relationships in Recurrent Neural Networks

Representing Part-Whole Relationships in Recurrent Neural Networks

At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks

At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks

At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks