Undirected Networks
AI, NLP: Way Deeper Than Microsoft's Marketing
Welcome to the natural language processing (NLP) edition of my "under-the-hood" series on AI (artificial intelligence) technology. Most major tech companies are announcing initiatives, introducing products (e.g., Amazon (NASDAQ:AMZN) with its Echo device and platform) or running experiments in public (e.g., Microsoft's (NASDAQ:MSFT) Tay debacle). On Seeking Alpha (and I am sure in many other places as well), the topic of natural language processing seems to be confusing many people because they are not clear on what new technologies here can do and what they should expect from them in the immediate future. Even though I have touched on the subject before - e.g., in my explanation of IBM's (NYSE:IBM) Watson - I feel like I have not explained the specific topic of natural language processing comprehensively. There is a specific reason why the topic is getting so much attention and it has to do with the convergence of various subfields of computer science and statistics.
Towards Bayesian Deep Learning: A Survey
As another example, to achieve high accuracy in recommender systems [45], [60], we need to fully understand the content of items (e.g., documents and movies), analyze the profile and preference of users, and evaluate the similarity among users. Deep learning is good at the first subtask while PGM excels at the other two. Besides the fact that better understanding of item content would help with the analysis of user profiles, the estimated similarity among users could provide valuable information for understanding item content in return. In order to fully utilize this bidirectional effect to boost recommendation accuracy, we might wish to unify deep learning and PGM in one single principled probabilistic framework, as done in [60]. Besides recommender systems, the need for Bayesian deep learning may also arise when we are dealing with control of nonlinear dynamical systems with raw images as input. Consider controlling a complex dynamical system according to the live video stream received from a camera. This problem can be transformed into iteratively performing two tasks, perception from raw images and control based on dynamic models. The perception task can be taken care of using multiple layers of simple nonlinear transformation (deep learning) while the control task usually needs more sophisticated models like hidden Markov models and Kalman filters [21], [38]. The feedback loop is then completed by the fact that actions chosen by the control model can affect the received video stream in return.
On the Geometry of Message Passing Algorithms for Gaussian Reciprocal Processes
Reciprocal processes are acausal generalizations of Markov processes introduced by Bernstein in 1932. In the literature, a significant amount of attention has been focused on developing dynamical models for reciprocal processes. Recently, probabilistic graphical models for reciprocal processes have been provided. This opens the way to the application of efficient inference algorithms in the machine learning literature to solve the smoothing problem for reciprocal processes. Such algorithms are known to converge if the underlying graph is a tree. This is not the case for a reciprocal process, whose associated graphical model is a single loop network. The contribution of this paper is twofold. First, we introduce belief propagation for Gaussian reciprocal processes. Second, we establish a link between convergence analysis of belief propagation for Gaussian reciprocal processes and stability theory for differentially positive systems.
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
Serban, Iulian V., Sordoni, Alessandro, Bengio, Yoshua, Courville, Aaron, Pineau, Joelle
We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural network to the dialogue domain, and demonstrate that this model is competitive with state-of-the-art neural language models and back-off n-gram models. We investigate the limitations of this and similar approaches, and show how its performance can be improved by bootstrapping the learning from a larger question-answer pair corpus and from pretrained word embeddings.
"The Five Tribes of Machine Learning (And What You Can Learn from Each)," Pedro Domingos
There are five main schools of thought in machine learning, and each has its own master algorithm, a general-purpose learner that can in principle be applied to any domain. The symbolists have inverse deduction, the connectionists have backpropagation, the evolutionaries have genetic programming, the Bayesians have probabilistic inference, and the analogizers have support vector machines. What we really need, however, is a single algorithm combining the key features of all of them. In this webinar I will summarize the five paradigms and describe my work toward unifying them, including in particular Markov logic networks. I will conclude by speculating on the new applications that a universal learner will enable, and how society will change as a result.
Jiaconda/Home-Security
The first step before any image or audio analysis is to extract relevant frames from the video streams in real time. This is crucial to a smart interactive device and requires extensive downsizing of the data so that the models run only on the features identified as most relevant. One also needs the device to identify and react to certain events (owner coming home, break-in, etc.) through a frame-by-frame comparative analysis. Let us start with the event that there is a disturbance and the image frames and audio data are fed into the trained model to classify the event into pre-defined classes (the simplest case being intrusion vs. non-intrusion). Given a frame, let us start with the features that we would extract from it, first to look for faces in the scene and then, if we find one, to match it against the available "registered" face repository. Given pictures of the home-owner/family, we will have an extensively pre-trained model.
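The frame-by-frame comparison described above can be illustrated with a minimal sketch. It assumes that frames are already decoded into equally sized grayscale grids (lists of rows of pixel intensities). The function names `frame_difference` and `select_event_frames` and the threshold value are hypothetical and do not come from the repository.

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count


def select_event_frames(frames, threshold=10.0):
    """Return indices of frames that differ noticeably from their predecessor.

    The threshold is an illustrative assumption; a real device would tune it.
    """
    selected = []
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            selected.append(i)
    return selected
```

Only the selected frames would then be passed on to the heavier face detection and classification models, which is one way to achieve the downsizing the paragraph calls for.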
A Latent Variable Recurrent Neural Network for Discourse Relation Language Models
Ji, Yangfeng, Haffari, Gholamreza, Eisenstein, Jacob
This paper presents a novel latent variable recurrent neural network architecture for jointly modeling sequences of words and (possibly latent) discourse relations between adjacent sentences. A recurrent neural network generates individual words, thus reaping the benefits of discriminatively-trained vector representations. The discourse relations are represented with a latent variable, which can be predicted or marginalized, depending on the task. The resulting model can therefore employ a training objective that includes not only discourse relation classification, but also word prediction. As a result, it outperforms state-of-the-art alternatives for two tasks: implicit discourse relation classification in the Penn Discourse Treebank, and dialog act classification in the Switchboard corpus. Furthermore, by marginalizing over latent discourse relations at test time, we obtain a discourse informed language model, which improves over a strong LSTM baseline.
K-Means Clustering - Lazy Programmer
K-means clustering is one of the simplest clustering algorithms one can use to find natural groupings of an unlabeled data set. Another way of stating this is that k-means clustering is an unsupervised learning algorithm, i.e., it is "learning the structure of X without being given Y". K-means clustering finds "k" different means (surprise surprise) which represent the centers of k clusters and assigns each data point to one of these clusters. The cluster it is assigned to is the one where the distance (usually Euclidean) from the point to the mean is smallest.
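The assignment rule above, combined with the standard step of recomputing each mean from its assigned points, can be sketched as follows. This is a minimal illustration and not the article's own code. It assumes that points are tuples of numbers, that the initial means are sampled at random from the data, and that iteration stops once assignments no longer change. The function name `kmeans` and its parameters are chosen here for illustration.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups; returns (means, assignments)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: initialise means from the data
    assignments = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest mean (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, means[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each mean becomes the centroid of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignments
```

Calling `kmeans(points, 2)` on two well-separated groups of points should place one mean near the center of each group. Results on harder data depend on the random initialisation.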
Quadratization and Roof Duality of Markov Logic Networks
de Nijs, Roderick Sebastiaan, Landsiedel, Christian, Wollherr, Dirk, Buss, Martin
This article discusses the quadratization of Markov Logic Networks, which enables efficient approximate MAP computation by means of maximum flows. The procedure relies on a pseudo-Boolean representation of the model, and allows handling models of any order. The employed pseudo-Boolean representation can be used to identify problems that are guaranteed to be solvable in low polynomial-time. Results on common benchmark problems show that the proposed approach finds optimal assignments for most variables in excellent computational time and approximate solutions that match the quality of ILP-based solvers.
Is deep learning a Markov chain in disguise?
Andrej Karpathy's post "The Unreasonable Effectiveness of Recurrent Neural Networks" made splashes last year. The basic premise is that you can create a recurrent neural network to learn language features character-by-character. But is the resultant model any different from a Markov chain built for the same purpose? I implemented a character-by-character Markov chain in R to find out. First, let's play a variation of the Imitation Game with generated text from Karpathy's tinyshakespeare dataset.
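To make the comparison concrete, here is a minimal sketch of a character-level Markov chain. The post's implementation is in R; this Python version is an illustrative stand-in and not the author's code. The function names `build_chain` and `generate` and the `order` parameter (the number of preceding characters used as context) are assumptions introduced here.

```python
import random
from collections import Counter, defaultdict


def build_chain(text, order=3):
    """Count which character follows each `order`-character context in `text`."""
    chain = defaultdict(Counter)
    for i in range(len(text) - order):
        chain[text[i:i + order]][text[i + order]] += 1
    return chain


def generate(chain, seed_text, order=3, length=100, rng=None):
    """Extend `seed_text` by sampling next characters in proportion to their counts."""
    rng = rng or random.Random(0)
    out = seed_text
    for _ in range(length):
        counts = chain.get(out[-order:])
        if not counts:
            break  # context never seen in the training text
        chars, weights = zip(*counts.items())
        out += rng.choices(chars, weights=weights)[0]
    return out
```

Trained on a corpus such as tinyshakespeare, a chain like this depends only on the last `order` characters, whereas a recurrent network carries a hidden state that can in principle encode longer context.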