 Markov Models


Optimising The Input Window Alignment in CD-DNN Based Phoneme Recognition for Low Latency Processing

arXiv.org Machine Learning

A key factor that determines the usability of applications based on speech recognition is the latency, or lag, of the system. In dialogue systems, for example, long latencies may disrupt natural turn-taking in the human-machine conversation. In other applications the lag may be even more critical. A typical example involves systems that use ASR to drive the lip movements of an avatar in real time to support telepresence [3, 4, 5]. The latency in a typical speech recogniser based on a hybrid of Neural Networks (NNs) and Hidden Markov Models (HMMs) is determined by a number of factors:

- the hardware (sound card) introduces some lag in digitising the speech samples and making them available to the drivers; typical values are on the order of milliseconds;
- the speech samples are returned by the driver in buffers of a certain size (this could be as long as half a second, but can be reduced to a few ms);
- in spectral-based feature extraction, speech samples are grouped into windows (frames), often around 25-40 ms in length;
- many methods for feature extraction also compute time derivatives of the features, which require a number of frames in the past and the future.
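The latency factors listed above add up, and the look-ahead of the network's input window contributes in the same way as the derivative frames. The following is a minimal sketch of that arithmetic. It assumes a 10 ms frame shift and uses illustrative values for the other components. The names `LatencyBudget` and `input_window_offsets` are hypothetical, and the model ignores the network's own compute time. It compares an input window centred on the current frame with a window of the same width shifted towards the past.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Rough additive latency model (hypothetical); all default values are illustrative assumptions."""
    hardware_ms: float = 5.0       # sound-card digitisation lag ("order of milliseconds")
    buffer_ms: float = 10.0        # driver buffer length (a few ms up to ~500 ms)
    frame_len_ms: float = 25.0     # analysis window length (often 25-40 ms)
    frame_shift_ms: float = 10.0   # hop between consecutive frames (assumed, not from the text)
    delta_future: int = 2          # future frames needed by time-derivative features
    nn_future: int = 5             # future frames inside the network's input window

    def lookahead_ms(self) -> float:
        # Frame t can only be classified once frame t + delta_future + nn_future is complete:
        # (delta_future + nn_future) hops plus one full window after frame t starts.
        return self.frame_len_ms + (self.delta_future + self.nn_future) * self.frame_shift_ms

    def total_ms(self) -> float:
        return self.hardware_ms + self.buffer_ms + self.lookahead_ms()


def input_window_offsets(width: int, future: int) -> list:
    """Frame offsets of an NN input window of `width` frames with `future` look-ahead frames."""
    past = width - 1 - future
    if past < 0 or future < 0:
        raise ValueError("need 0 <= future < width")
    return list(range(-past, future + 1))


centred = LatencyBudget(nn_future=5)   # 11-frame window centred on the current frame
shifted = LatencyBudget(nn_future=2)   # same width, aligned towards the past
print(input_window_offsets(11, 5), centred.total_ms())
print(input_window_offsets(11, 2), shifted.total_ms())
```

Under these assumptions, moving three context frames from the future to the past cuts the estimated lag from 110 ms to 80 ms while keeping the window width unchanged.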


Imitation neurones, genuine potential

#artificialintelligence

This structural design can support computations across thousands of layers, and it was this aspect of the architecture that gave rise to the name 'deep learning'. Marchand-Maillet explains: "Each artificial neurone is assigned an input value, which it computes using a mathematical function, only firing if the output exceeds a pre-defined threshold." In this way, it reproduces the behaviour of real neurones, which only fire and transmit information when the input signal (the potential difference across the entire neural circuit) reaches a certain level. In the artificial model, the results of a single layer are weighted, added up and then sent as the input signal to the following layer, which processes that input using different functions, and so on. For example, if a system is trained with large quantities of photos of apples and watermelons, it will progressively learn to distinguish them on the basis of diameter, says Marchand-Maillet. If it cannot decide (e.g., when processing a picture of a tiny watermelon), the subsequent layers take over by analysing the colours or textures of the fruit in the photo, and so on.
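The weight-sum-threshold mechanism described above can be sketched with hard threshold (step) units. The weights below are hand-picked and hypothetical. Modern deep networks use learned weights and differentiable activations instead. The example computes XOR, which no single threshold unit can represent, so the second layer resolves a case the first layer alone cannot decide.

```python
import numpy as np


def threshold_layer(x, W, b, threshold=0.0):
    """One layer of threshold units: weight the inputs, add them up, fire only above threshold."""
    z = W @ x + b                          # weighted sum for every unit in the layer
    return (z > threshold).astype(float)   # 1.0 = "fires", 0.0 = stays silent


def forward(x, layers):
    """Feed each layer's output to the next layer as its input signal."""
    for W, b in layers:
        x = threshold_layer(x, W, b)
    return x


# Hypothetical hand-set weights: hidden unit 1 acts as OR, hidden unit 2 as AND,
# and the output unit fires for "OR but not AND", i.e. XOR.
XOR_LAYERS = [
    (np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([-0.5, -1.5])),
    (np.array([[1.0, -1.0]]), np.array([-0.5])),
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, forward(np.array([a, b], dtype=float), XOR_LAYERS)[0])
```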


Deep Learning Libraries by Language

#artificialintelligence

Gensim is a toolkit implemented in the Python programming language, intended for handling large text collections using efficient algorithms.


Modeling Group Dynamics Using Probabilistic Tensor Decompositions

arXiv.org Machine Learning

In this paper, we consider the problem of modeling discrete social network data and learning the underlying group dynamics. The goal is to develop probabilistic profiles of large collections of data while preserving the essential temporal relationships that provide insights for various applications of interest. For example, in social network analysis, we want to analyze relationships between social agents and their behaviors over time and on various social media sites (e.g., Facebook, Twitter, Instagram, Google). In web advertising analysis, we want to analyze the relationships between customers and the types of products they buy from different shopping sites, to capture customers' buying behaviors and learn the intrinsic factors that affect their buying decisions. In the study of scientific collaboration, using co-authorship networks from multiple journals on related subjects, one can analyze relationships between subjects and authors.
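The abstract does not give the model itself. As a sketch under stated assumptions, multi-way interaction data of the kind described (agent × site × time) can be stored as a count tensor and modelled with a generic rank-R CP (PARAFAC) decomposition. A Poisson likelihood is a common choice for count data. This is not necessarily the paper's exact formulation. The factor matrices `A` (agents), `B` (sites) and `C` (time) and all function names are hypothetical. Each column r can be read as one latent group with its own temporal profile.

```python
import numpy as np


def count_tensor(events, shape):
    """X[a, s, t] = number of interactions of agent a on site s during time step t."""
    X = np.zeros(shape)
    for a, s, t in events:
        X[a, s, t] += 1
    return X


def cp_reconstruct(A, B, C):
    """Rank-R CP model: X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def poisson_loglik(X, X_hat, eps=1e-12):
    """Poisson log-likelihood of counts X under rates X_hat, dropping the constant log(X!) term."""
    return float(np.sum(X * np.log(X_hat + eps) - X_hat))


# Toy data: (agent, site, time step) triples.
events = [(0, 1, 2), (0, 1, 2), (1, 0, 0)]
X = count_tensor(events, (2, 2, 3))
rng = np.random.default_rng(0)
A, B, C = rng.random((2, 2)), rng.random((2, 2)), rng.random((3, 2))   # rank-2 nonnegative factors
print(poisson_loglik(X, cp_reconstruct(A, B, C)))
```

Fitting the factors (for example, with multiplicative updates that maximise this likelihood) is left out to keep the sketch short.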


Explainable Restricted Boltzmann Machines for Collaborative Filtering

arXiv.org Machine Learning

Most accurate recommender systems are black-box models that hide the reasoning behind their recommendations. Yet explanations have been shown to increase the user's trust in the system, in addition to providing other benefits such as scrutability, meaning the ability to verify the validity of recommendations. This gap between accuracy and transparency, or explainability, has generated interest in automated explanation generation methods. Restricted Boltzmann Machines (RBMs) are accurate models for collaborative filtering (CF) that also lack interpretability. In this paper, we focus on RBM-based collaborative filtering recommendations and further assume the absence of any additional data source, such as item content or user attributes. We thus propose a new Explainable RBM technique that computes the top-n recommendation list from items that are explainable. Experimental results show that our method is effective in generating accurate and explainable recommendations.
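The abstract does not define "explainable". One neighbourhood-based definition, which we assume here, scores item i for user u by the share of u's k most similar users who rated i. Item i is then explainable if that share exceeds a threshold θ. The sketch below computes this score from the rating matrix alone, consistent with the no-side-information setting. It then restricts the top-n list to explainable items. The `scores` matrix stands in for the RBM's predicted ratings. All names, the cosine similarity, and the post-hoc filtering are assumptions, and the paper may integrate explainability into the RBM differently.

```python
import numpy as np


def explainability(R, k):
    """E[u, i]: share of user u's k nearest neighbours (cosine on rated items) who rated item i."""
    rated = (R > 0).astype(float)
    norms = np.linalg.norm(rated, axis=1, keepdims=True)
    unit = rated / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)             # a user is not their own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    return rated[neighbours].mean(axis=1)      # shape: (n_users, n_items)


def top_n_explainable(scores, E, R, u, n, theta=0.0):
    """Top-n unrated items for user u, restricted to items whose explainability exceeds theta."""
    candidates = np.flatnonzero((E[u] > theta) & (R[u] == 0))
    ranked = candidates[np.argsort(-scores[u, candidates], kind="stable")]
    return ranked[:n].tolist()


# Toy ratings (0 = unrated) and hypothetical model scores.
R = np.array([[5, 0, 0, 0], [4, 3, 0, 0], [5, 0, 2, 0], [0, 0, 0, 1]])
scores = np.tile([0.9, 0.2, 0.8, 0.95], (4, 1))
E = explainability(R, k=2)
print(top_n_explainable(scores, E, R, u=0, n=2))
```

In this toy case, item 3 has the highest score for user 0. It is still excluded, because neither of the user's two nearest neighbours rated it, so the recommendation could not be justified from neighbour behaviour.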


Visualizing Dynamics: from t-SNE to SEMI-MDPs

arXiv.org Machine Learning

Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari games, mastering Go and controlling robots. While DRL agents perform well in practice, we still lack the tools to analyze their performance and to visualize the temporal abstractions they learn. In this paper, we present a novel method that automatically discovers an internal Semi Markov Decision Process (SMDP) model in the learned representation of a Deep Q-Network (DQN). We suggest a novel visualization method that represents the SMDP model as a directed graph and visualizes it on top of a t-SNE map. We show how we can interpret the agent's policy and give evidence for the hierarchical state aggregation that DQNs learn automatically. Our algorithm is fully automatic, does not require any domain-specific knowledge, and is evaluated by a novel likelihood-based evaluation criterion.
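The abstract does not spell out how the SMDP is extracted. The sketch below is a simplification and assumes each visited DQN state has already been assigned to a cluster on the t-SNE map (for example, by k-means; the clustering step and all names here are our assumptions). Under that assumption, a directed graph over clusters can be estimated by counting transitions between distinct clusters along a trajectory. The semi-Markov aspect shows up as the number of steps the agent stays in a cluster before leaving it.

```python
import numpy as np


def cluster_transition_graph(labels, n_clusters):
    """Empirical transition probabilities between distinct clusters along one trajectory."""
    counts = np.zeros((n_clusters, n_clusters))
    for a, b in zip(labels[:-1], labels[1:]):
        if a != b:   # self-transitions are absorbed: one SMDP state spans several time steps
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def mean_holding_times(labels, n_clusters):
    """Average number of consecutive steps spent in each cluster per visit."""
    durations = [[] for _ in range(n_clusters)]
    run = 1
    for a, b in zip(labels[:-1], labels[1:]):
        if a == b:
            run += 1
        else:
            durations[a].append(run)
            run = 1
    durations[labels[-1]].append(run)   # close the final visit
    return [float(np.mean(d)) if d else 0.0 for d in durations]


labels = [0, 0, 1, 1, 1, 0, 2, 2]   # hypothetical cluster ids of consecutive states
print(cluster_transition_graph(labels, 3))
print(mean_holding_times(labels, 3))
```

The rows of the transition matrix give the edge weights of the directed graph drawn over the cluster centres on the t-SNE map.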


Graph based manifold regularized deep neural networks for automatic speech recognition

arXiv.org Machine Learning

ABSTRACT

Deep neural networks (DNNs) have been successfully applied to a wide variety of acoustic modeling tasks in recent years. These include applications of DNNs in both discriminative feature extraction and hybrid acoustic modeling scenarios. Despite the rapid progress in this area, a number of challenges remain in training DNNs. This paper presents an effective way of training DNNs using a manifold learning based regularization framework. In this framework, the parameters of the network are optimized to preserve underlying manifold based relationships between speech feature vectors while minimizing a measure of loss between network outputs and targets. This is achieved by incorporating manifold based locality constraints in the objective criterion of DNNs. Empirical evidence is provided to demonstrate that training a network with manifold constraints preserves structural compactness in the hidden layers of the network. Manifold regularization is applied to train bottleneck DNNs for feature extraction in hidden Markov model (HMM) based speech recognition. The experiments in this work are conducted on the Aurora-2 spoken digits task and the Aurora-4 read news large vocabulary continuous speech recognition task. Performance is measured in terms of word error rate (WER) on these tasks. It is shown that the manifold regularized DNNs result in up to 37% reduction in WER relative to standard DNNs.

Index Terms: manifold learning, deep neural networks, manifold regularization, manifold regularized deep neural networks, speech recognition

1. INTRODUCTION

Recently there has been a resurgence of research in the area of deep neural networks (DNNs) for acoustic modeling in automatic speech recognition (ASR) [1-6]. Much of this research has been concentrated on techniques for regularization of the algorithms used for DNN parameter estimation [7-9]. At the same time, there has also been a great deal of research on graph based techniques that facilitate the preservation of local neighborhood relationships among feature vectors for parameter estimation in a number of application areas [10-13]. Algorithms that preserve these local relationships are often referred to as having the effect of applying manifold based constraints.
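The locality constraint described above can be sketched as a standard graph-Laplacian manifold regularization term. We assume this form here; the paper's exact weighting may differ. A k-nearest-neighbour graph with heat-kernel weights is built over the input feature vectors. The penalty then sums w_ij ||y_i - y_j||^2 over network outputs (or hidden activations), which pulls neighbouring inputs towards nearby representations. The graph construction, `k`, `sigma`, `gamma` and all function names are illustrative assumptions. In actual training, the penalty would be added to the loss and differentiated by the framework's autodiff.

```python
import numpy as np


def knn_heat_affinity(X, k, sigma):
    """Symmetric kNN graph with heat-kernel weights w_ij = exp(-||x_i - x_j||^2 / sigma^2); needs k < n."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        d = sq[i].copy()
        d[i] = np.inf                          # exclude the point itself
        nbrs = np.argsort(d)[:k]
        W[i, nbrs] = np.exp(-sq[i, nbrs] / sigma ** 2)
    return np.maximum(W, W.T)                  # keep an edge if either point selects the other


def manifold_penalty(Y, W):
    """sum_ij w_ij * ||y_i - y_j||^2 over representations Y (one row per feature vector)."""
    diff = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return float((W * diff).sum())


def regularized_loss(task_loss, Y, W, gamma):
    """Task loss (e.g. cross-entropy against targets) plus the weighted locality penalty."""
    return task_loss + gamma * manifold_penalty(Y, W)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))   # stand-in for speech feature vectors
Y = rng.standard_normal((20, 3))   # stand-in for network outputs
W = knn_heat_affinity(X, k=4, sigma=1.0)
print(regularized_loss(1.0, Y, W, gamma=0.01))
```

For a symmetric W, the penalty equals 2·tr(Yᵀ L Y) with the graph Laplacian L = D − W, which is the usual matrix form of this regularizer.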



Unsupervised Risk Estimation Using Only Conditional Independence Structure

arXiv.org Machine Learning

We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test. We do not need to assume that the optimal predictor is the same between train and test, or that the true distribution lies in any parametric family. We can also efficiently differentiate the error estimate to perform unsupervised discriminative learning. Our technical tool is the method of moments, which allows us to exploit conditional independencies in the absence of a fully-specified model. Our framework encompasses a large family of losses including the log and exponential loss, and extends to structured output settings such as hidden Markov models.
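The idea of estimating error from unlabeled data via moments under conditional independence can be illustrated in its simplest setting. This is not the paper's general framework, which handles broader losses and structured outputs. The illustration assumes three binary classifiers whose errors are independent given the true label and do not depend on the class, each better than chance. Their pairwise disagreement rates then satisfy 1 − 2d_ij = (1 − 2e_i)(1 − 2e_j), and these three moment equations can be solved for the three error rates e_i. The function names are hypothetical.

```python
import numpy as np


def estimate_errors(d12, d13, d23):
    """Recover each classifier's error rate from pairwise disagreement rates.

    Assumes binary labels, errors conditionally independent given the label,
    class-independent error rates, and every classifier better than chance.
    """
    a12, a13, a23 = 1 - 2 * d12, 1 - 2 * d13, 1 - 2 * d23   # a_ij = (1 - 2 e_i)(1 - 2 e_j)
    a1 = np.sqrt(a12 * a13 / a23)
    a2 = np.sqrt(a12 * a23 / a13)
    a3 = np.sqrt(a13 * a23 / a12)
    return tuple((1 - a) / 2 for a in (a1, a2, a3))


def disagreement_rates(p1, p2, p3):
    """Pairwise disagreement of three prediction vectors; needs no labels."""
    return np.mean(p1 != p2), np.mean(p1 != p3), np.mean(p2 != p3)


# Simulation: labels are used only to generate predictions, never by the estimator.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200_000)
preds = [np.where(rng.random(y.size) < e, 1 - y, y) for e in (0.1, 0.2, 0.3)]
print(estimate_errors(*disagreement_rates(*preds)))
```

With this many samples, the printed estimates should land close to the true error rates (0.1, 0.2, 0.3), even though the estimator only sees agreement statistics.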


Spectral decomposition method of dialog state tracking via collective matrix factorization

arXiv.org Machine Learning

The task of dialog management is commonly decomposed into two sequential subtasks: dialog state tracking and dialog policy learning. In an end-to-end dialog system, the aim of dialog state tracking is to accurately estimate the true dialog state from noisy observations produced by the speech recognition and natural language understanding modules. The state tracking task is primarily meant to support a dialog policy. From a probabilistic perspective, this is achieved by maintaining a posterior distribution over hidden dialog states composed of a set of context-dependent variables. Once a dialog policy is learned, it strives to select an optimal dialog act given the estimated dialog state and a defined reward function. This paper introduces a novel method of dialog state tracking based on a bilinear algebraic decomposition model that provides an efficient inference scheme through collective matrix factorization. We evaluate the proposed approach on the second Dialog State Tracking Challenge (DSTC-2) dataset and show that the proposed tracker gives encouraging results compared to the state-of-the-art trackers that participated in this standard benchmark. Finally, we show that the prediction scheme is computationally efficient in comparison to previous approaches.
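Collective matrix factorization couples several matrices through a shared latent factor. The abstract does not give the paper's decomposition, so the following is a generic sketch under assumptions. Two matrices share a row factor U: X1 (for example, dialog turns × observation features) and X2 (turns × dialog-state indicators). They are fitted with ridge-regularised alternating least squares. A new turn observed only in X1 can then be projected into the shared space to predict its state scores. The matrix roles, the rank, λ, the ALS solver, and all function names are our assumptions.

```python
import numpy as np


def collective_mf(X1, X2, rank, lam=1e-3, iters=200, seed=0):
    """Fit X1 ~ U @ V1.T and X2 ~ U @ V2.T, sharing the row factor U, by ridge-regularised ALS."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((X1.shape[0], rank))
    I = np.eye(rank)
    for _ in range(iters):
        # With U fixed, each view's column factors are an independent ridge regression.
        G = U.T @ U + lam * I
        V1 = np.linalg.solve(G, U.T @ X1).T
        V2 = np.linalg.solve(G, U.T @ X2).T
        # With V1 and V2 fixed, the shared factor is fitted to both views jointly.
        H = V1.T @ V1 + V2.T @ V2 + lam * I
        U = np.linalg.solve(H, (X1 @ V1 + X2 @ V2).T).T
    return U, V1, V2


def infer_second_view(x1, V1, V2, lam=1e-3):
    """Embed a new row seen only in view 1, then predict its view-2 entries."""
    G = V1.T @ V1 + lam * np.eye(V1.shape[1])
    u = np.linalg.solve(G, V1.T @ x1)
    return V2 @ u


# Synthetic rank-2 data sharing the same row factor.
rng = np.random.default_rng(1)
U0 = rng.standard_normal((30, 2))
X1 = U0 @ rng.standard_normal((8, 2)).T
X2 = U0 @ rng.standard_normal((6, 2)).T
U, V1, V2 = collective_mf(X1, X2, rank=2)
print(np.linalg.norm(X1 - U @ V1.T) / np.linalg.norm(X1))
```

Inference for a new row costs only one small rank × rank linear solve, which is the kind of property that makes factorization-based trackers cheap at prediction time.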