Goto

Collaborating Authors

Deep Learning, NLP, and Representations - colah's blog

#artificialintelligence

In the last few years, deep neural networks have dominated pattern recognition. They blew the previous state of the art out of the water for many computer vision tasks. Voice recognition is also moving that way. But despite the results, we have to wonder… why do they work so well? In this post, I hope to make accessible one promising answer as to why deep neural networks work. I think it's a very elegant perspective. A neural network with a hidden layer has universality: given enough hidden units, it can approximate any function. This is a frequently quoted – and even more frequently, misunderstood and misapplied – theorem.
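
As a small illustration of the universality claim above, here is a minimal sketch (NumPy only; the tanh units, the random-feature setup, and the sin target are choices made for this example rather than anything from the post): a single hidden layer with randomly fixed input weights, whose output weights are fitted by least squares. With enough hidden units, the fit to the target function becomes very accurate on the interval, which is the intuition behind the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200).reshape(-1, 1)   # inputs on an interval
y = np.sin(x)                                # target function to approximate

n_hidden = 100                                   # "enough hidden units"
W = rng.normal(scale=2.0, size=(1, n_hidden))    # random, fixed input weights
b = rng.normal(scale=2.0, size=(1, n_hidden))    # random, fixed biases
H = np.tanh(x @ W + b)                           # hidden-layer activations

# Fit only the output weights, by least squares.
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ w_out

print("max abs error:", np.max(np.abs(y_hat - y)))
```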


Visualizing Representations: Deep Learning and Human Beings - colah's blog

#artificialintelligence

Imagine training a neural network and watching its representations wander through this space. You can see how your representations compare to other "landmark" representations from past experiments. If your model's first layer representation is in the same place a really successful model's was during training, that's a good sign! If it's veering off towards a cluster that you know came from too-high learning rates, you should lower yours. This can give us qualitative feedback during neural network training.
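
A minimal sketch of this kind of monitoring, under assumed data (the `landmark_reps` and `current_reps` arrays below are random placeholders standing in for saved first-layer activations from a past run and activations from the model currently being trained): project both sets of representations into the same 2D space and compare where they land.

```python
import numpy as np

rng = np.random.default_rng(0)
landmark_reps = rng.normal(size=(500, 64))        # saved "landmark" activations
current_reps = rng.normal(size=(500, 64)) + 0.5   # activations from the current run

# Fit a PCA basis (via SVD) on the landmark representations only,
# so the 2D "map" stays fixed across training runs.
mean = landmark_reps.mean(axis=0)
_, _, Vt = np.linalg.svd(landmark_reps - mean, full_matrices=False)

def project(reps):
    """Project representations onto the first two landmark principal axes."""
    return (reps - mean) @ Vt[:2].T

landmark_2d = project(landmark_reps)
current_2d = project(current_reps)

# Distance between cluster centres: a crude signal for whether the current
# run's representation is "in the same place" as the landmark one.
print(np.linalg.norm(current_2d.mean(axis=0) - landmark_2d.mean(axis=0)))
```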


Semantic Vector Machines

arXiv.org Artificial Intelligence

We first present our work in machine translation, during which we used aligned sentences to train a neural network to embed n-grams of different languages into a $d$-dimensional space, such that n-grams that are translations of each other are close with respect to some metric. Good n-gram-to-n-gram translation results were achieved, but full-sentence translation is still problematic. We realized that learning the semantics of sentences and documents was the key to solving many natural language processing problems, and thus moved to the second part of our work: sentence compression. We introduce a flexible neural network architecture for learning embeddings of words and sentences that extract their semantics, propose an efficient implementation in the Torch framework, and present embedding results comparable to those obtained with classical neural language models, while being more powerful.
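
A rough sketch of the aligned-n-gram embedding idea (written in PyTorch rather than the Torch framework the paper uses; the vocabulary sizes, margin, and toy aligned pairs are invented for illustration, not taken from the paper): two embedding tables, one per language, trained with a hinge loss so that n-grams that are translations of each other end up close in the shared $d$-dimensional space while random pairs are pushed apart.

```python
import torch
import torch.nn as nn

d = 32                                   # embedding dimension
src_emb = nn.Embedding(1000, d)          # source-language n-gram table
tgt_emb = nn.Embedding(1000, d)          # target-language n-gram table
opt = torch.optim.Adam(
    list(src_emb.parameters()) + list(tgt_emb.parameters()), lr=1e-2
)

# Toy "aligned" n-gram ids: (source id, target id) pairs that are translations.
aligned = torch.tensor([[0, 0], [1, 1], [2, 2], [3, 3]])

for step in range(200):
    src, tgt = aligned[:, 0], aligned[:, 1]
    neg = torch.randint(0, 1000, tgt.shape)            # random negative targets
    pos_dist = (src_emb(src) - tgt_emb(tgt)).norm(dim=1)
    neg_dist = (src_emb(src) - tgt_emb(neg)).norm(dim=1)
    # Hinge loss: a translation pair should be at least a margin of 1.0
    # closer than a random pair.
    loss = torch.clamp(1.0 + pos_dist - neg_dist, min=0).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```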


Demystifying Word2Vec

#artificialintelligence

Research into word embeddings is one of the most interesting areas of the deep learning world at the moment, even though embeddings were introduced as early as 2003 by Bengio et al. Most prominent among these new techniques has been a group of related algorithms commonly referred to as Word2Vec, which came out of Google research. In particular, we are going to examine some desired properties of word embeddings and the shortcomings of other popular approaches centered around the concept of a Bag of Words (henceforth referred to simply as BoW), such as Latent Semantic Analysis. This shall motivate a detailed exposition of how and why Word2Vec works and whether the word embeddings derived from this method can remedy some of the shortcomings of BoW-based approaches. Word2Vec and the concept of word embeddings originate in the domain of NLP; however, as we shall see, the idea of words in the context of a sentence or a surrounding word window can be generalized to any problem domain dealing with sequences or sets of related data points.
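
To make the contrast concrete, here is a small sketch (using the gensim library rather than the original Google implementation, on an invented toy corpus, so the numbers are only illustrative): a Bag-of-Words count vector treats every word as an unrelated dimension, while skip-gram Word2Vec learns dense vectors in which words that appear in similar surrounding-word windows can end up close together.

```python
from collections import Counter
from gensim.models import Word2Vec

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played outside".split(),
]

# Bag of Words: one count per vocabulary word, no notion of word similarity.
vocab = sorted({w for sent in corpus for w in sent})
bow = [[Counter(sent)[w] for w in vocab] for sent in corpus]
print(vocab)
print(bow[0])

# Skip-gram Word2Vec: dense vectors where "cat" and "dog" can end up nearby
# because they occur in similar surrounding-word windows.
model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=1, epochs=200)
print(model.wv.similarity("cat", "dog"))
```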