 Markov Models


Beyond similarity assessment: Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

arXiv.org Machine Learning

Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects models with the highest posterior probability using Factorized Information Criteria (FIC), which is widely utilised in model selection for probabilistic models with hidden variables. Our simulations indicated this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies.


On better training the infinite restricted Boltzmann machines

arXiv.org Machine Learning

The infinite restricted Boltzmann machine (iRBM) is an extension of the classic RBM. It has the useful property of automatically deciding the size of the hidden layer according to the specific training data. With sufficient training, the iRBM can achieve performance competitive with that of the classic RBM. However, learning in the iRBM converges slowly because the iRBM is sensitive to the ordering of its hidden units: the learned filters change slowly from the left-most hidden unit to the right. To break this dependency between neighboring hidden units and speed up the convergence of training, a novel training strategy is proposed. The key idea of the proposed training strategy is randomly regrouping the hidden units before each gradient descent step. Potentially, a mixture of infinitely many iRBMs with different permutations of the hidden units can be achieved by this learning method, which prevents the model from over-fitting in a way similar to dropout. The original iRBM is also modified to be capable of carrying out discriminative training. To evaluate the impact of our method on the convergence speed of learning and on the model's generalization ability, several experiments have been performed on the binarized MNIST and CalTech101 Silhouettes datasets. Experimental results indicate that the proposed training strategy can greatly accelerate learning and enhance the generalization ability of iRBMs.


Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning

arXiv.org Machine Learning

Markov decision processes (MDPs) have been widely used as a mathematical framework to solve stochastic sequential decision problems, such as autonomous driving [1], path planning [2], and quadrotor control [3]. In general, the goal of an MDP is to find the optimal policy function which maximizes the expected return. The expected return is a performance measure of a policy function and it is often defined as the expected sum of discounted rewards. An MDP is often used to formulate reinforcement learning (RL) [4], which aims to find the optimal policy without an explicit specification of the stochasticity of the environment, and inverse reinforcement learning (IRL) [5], whose goal is to search for a proper reward function that can explain the behavior of an expert who follows the underlying optimal policy. Because the optimal solution of an MDP is a deterministic policy, it is not desirable to apply an MDP to problems with multiple optimal actions. From the perspective of RL, the knowledge of multiple optimal actions makes it possible to cope with unexpected situations. For example, suppose that an autonomous vehicle has multiple optimal routes to reach a given goal. If a traffic accident occurs on the currently selected optimal route, it is possible to avoid the accident by choosing another safe optimal route without additional computation of a new optimal route.
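
The "expected sum of discounted rewards" mentioned above can be made concrete with a small sketch. The function names, the discount factor `gamma`, and the toy reward sequences are illustrative assumptions, not taken from the paper; the expectation is approximated here by a plain average over sampled trajectories.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for one trajectory."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def expected_return(trajectories, gamma):
    """Monte Carlo estimate: average discounted return over trajectories."""
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)


# Hypothetical reward sequences collected by following some policy.
toy_trajectories = [[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
estimate = expected_return(toy_trajectories, gamma=0.5)
```

A policy with a higher estimate is preferred under this performance measure; two different policies can reach the same value, which is the multiple-optimal-actions situation the excerpt describes.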


A Dynamic Edge Exchangeable Model for Sparse Temporal Networks

arXiv.org Machine Learning

We propose a dynamic edge exchangeable network model that can capture sparse connections observed in real temporal networks, in contrast to existing models which are dense. The model achieved superior link prediction accuracy on multiple data sets when compared to a dynamic variant of the blockmodel, and is able to extract interpretable time-varying community structures from the data. In addition to sparsity, the model accounts for the effect of social influence on vertices' future behaviours. Compared to the dynamic blockmodels, our model has a smaller latent space. The compact latent space requires a smaller number of parameters to be estimated in variational inference and results in a computationally friendly inference algorithm.


Deep learning for speech processing

#artificialintelligence

[Slide diagram, modified from 16: supervised and unsupervised model families, including Deep Neural Net, RNN, DBN, DBM, D-AE, AE, Perceptron, RBM, GMM, BayesNP, SVM, Sparse Coding, Boosting, Decision Tree, and Bayes Nets.]

Signal Processing / Information Processing, by signal type (Audio/Music; Speech; Image/Animation/Graphics; Video; Text/Language):

Coding/Compression: Audio Coding; Speech Coding; Image Coding; Video Coding; Document Compression/Summary
Communication: Voice over IP, DAB, etc.; 4G/5G Networks, DVB, Home Networking, etc.
Security: Multimedia watermarking, encryption, etc.
Enhancement/Analysis: De-noising/Source separation; Speech Enhancement/Feature extraction; Image/video enhancement (Clear Type), Segmentation, feature extraction; Grammar checking, Text Parsing
Synthesis/Rendering: Computer Music; Speech Synthesis (text-to-speech); Computer Graphics/Video Synthesis; Natural Language Generation
User-Interface: Multi-Modal Human Computer Interaction (HCI --- Input Methods)
Recognition: Auditory Scene Analysis (Computer audition; e.g.


A Guide For Time Series Prediction Using Recurrent Neural Networks (LSTMs)

@machinelearnbot

Note: The Statsbot team has already published an article about using time series analysis for anomaly detection. Today, we'd like to discuss time series prediction with long short-term memory (LSTM) models. We asked a data scientist, Neelabh Pant, to tell you about his experience of forecasting exchange rates using recurrent neural networks. As an Indian guy living in the US, I have a constant flow of money from home to me and vice versa. If the USD is stronger in the market, then the Indian rupee (INR) goes down, so a person from India pays more rupees for a dollar.


The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes

Journal of Artificial Intelligence Research

One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications. We consider the stochastic reach-avoid problem, in which the objective is to synthesize a control policy to maximize the probability of reaching a target set at a given time, while staying in a safe set at all prior times. We characterize the solution to this problem through an infinite dimensional linear program. We then develop a tractable approximation to the infinite dimensional linear program through finite dimensional approximations of the decision space and constraints. For a large class of Markov decision processes modeled by Gaussian mixture kernels, we show that through a proper selection of the finite dimensional space, one can further reduce the computational complexity of the resulting linear program. We validate the proposed method and analyze its potential with numerical case studies.
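
To make the reach-avoid objective concrete, the following sketch computes the maximal probability of being in the target set at the final time while staying in the safe set at all earlier times. It is not the linear programming method of the paper: it assumes a small finite-state, finite-action MDP and uses a standard backward dynamic programming recursion instead. The transition table `P` and the state sets are hypothetical.

```python
def reach_avoid_values(P, safe, target, horizon):
    """Backward recursion for a finite MDP.

    P[x][u][y] is the probability of moving from state x to state y
    under action u. Returns V[x], the maximal probability of being in
    `target` at time `horizon` while being in `safe` at all prior times.
    """
    n = len(P)
    # Terminal condition: indicator of the target set.
    V = [1.0 if x in target else 0.0 for x in range(n)]
    for _ in range(horizon):
        V = [
            # Best action by expected value of the next-step function.
            max(sum(p * v for p, v in zip(P[x][u], V))
                for u in range(len(P[x])))
            if x in safe else 0.0  # leaving the safe set early means failure
            for x in range(n)
        ]
    return V


# Hypothetical 3-state example: state 0 is the start, 1 the target, 2 unsafe.
P = [
    [[0.5, 0.5, 0.0], [0.0, 0.75, 0.25]],  # state 0: cautious vs. risky action
    [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],    # state 1: stays in the target
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],    # state 2: absorbing and unsafe
]
values_one_step = reach_avoid_values(P, safe={0, 1}, target={1}, horizon=1)
values_two_steps = reach_avoid_values(P, safe={0, 1}, target={1}, horizon=2)
```

With one step the risky action is better from state 0, while with two steps the cautious action wins because it can retry. For continuous state spaces this table-based recursion is not available, which is the setting the abstract's linear program addresses.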


Duality of Graphical Models and Tensor Networks

arXiv.org Machine Learning

In this article we show the duality between tensor networks and undirected graphical models with discrete variables. We study tensor networks on hypergraphs, which we call tensor hypernetworks. We show that the tensor hypernetwork on a hypergraph exactly corresponds to the graphical model given by the dual hypergraph. We translate various notions under duality. For example, marginalization in a graphical model is dual to contraction in the tensor network. Algorithms also translate under duality. We show that belief propagation corresponds to a known algorithm for tensor network contraction. This article is a reminder that the research areas of graphical models and tensor networks can benefit from interaction.
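
The statement that marginalization is dual to contraction can be checked on a toy case. The sketch below assumes a three-variable chain x, y, z with two pairwise factors `A[x][y]` and `B[y][z]` (hypothetical values). Summing the variable y out of the unnormalized joint gives the same table as contracting the shared index of the two tensors, which here is an ordinary matrix product.

```python
from itertools import product


def contract(A, B):
    """Tensor contraction over the shared index y: C[x][z] = sum_y A[x][y] * B[y][z]."""
    return [
        [sum(A[x][y] * B[y][z] for y in range(len(B))) for z in range(len(B[0]))]
        for x in range(len(A))
    ]


def marginalize_y(A, B):
    """Graphical-model view: enumerate the joint A[x][y] * B[y][z] and sum out y."""
    nx, ny, nz = len(A), len(B), len(B[0])
    marginal = [[0] * nz for _ in range(nx)]
    for x, y, z in product(range(nx), range(ny), range(nz)):
        marginal[x][z] += A[x][y] * B[y][z]
    return marginal


# Hypothetical nonnegative factor tables.
A = [[1, 2], [3, 4]]
B = [[1, 0], [2, 1]]
contracted = contract(A, B)
marginal = marginalize_y(A, B)
# Contracting every remaining index yields the partition function.
partition_function = sum(sum(row) for row in contracted)
```

Both routes produce the same unnormalized marginal over (x, z); dividing by `partition_function` normalizes it.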


A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines

arXiv.org Machine Learning

Restricted Boltzmann machines (RBMs) are energy-based neural networks which are commonly used as the building blocks for deep neural architectures. In this work, we derive a deterministic framework for the training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer (TAP) mean-field approximation of widely-connected systems with weak interactions coming from spin-glass theory. While the TAP approach has been extensively studied for fully-visible binary spin systems, our construction is generalized to latent-variable models, as well as to arbitrarily distributed real-valued spin systems with bounded support. In our numerical experiments, we demonstrate the effective deterministic training of our proposed models and are able to show interesting features of unsupervised learning which could not be directly observed with sampling. Additionally, we demonstrate how to utilize our TAP-based framework for leveraging trained RBMs as joint priors in denoising problems.


Toward Automated Story Generation with Markov Chain Monte Carlo Methods and Deep Neural Networks

AAAI Conferences

In this paper, we introduce an approach to automated story generation using Markov Chain Monte Carlo (MCMC) sampling. This approach uses a sampling algorithm based on Metropolis-Hastings to generate a probability distribution which can be used to generate stories via random sampling that adhere to criteria learned by recurrent neural networks. We show the applicability of our technique through a case study where we generate novel stories using acceptance criteria learned from a set of movie plots taken from Wikipedia. This study shows that stories generated using this approach adhere to these criteria 85%-86% of the time.
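
For readers unfamiliar with Metropolis-Hastings, a minimal generic sketch follows. It is not the authors' story generator: the state space is four integers, the `weight` function is a hypothetical stand-in for a learned acceptance score, and the proposal is assumed symmetric so that the acceptance ratio reduces to a ratio of weights.

```python
import random


def metropolis_hastings(weight, propose, start, steps, rng):
    """Sample states with probability proportional to weight(state).

    Assumes `propose` is symmetric: proposing b from a is as likely
    as proposing a from b.
    """
    state = start
    samples = []
    for _ in range(steps):
        candidate = propose(state, rng)
        # Accept with probability min(1, weight(candidate) / weight(state)),
        # written without division so a zero weight cannot raise an error.
        if rng.random() * weight(state) < weight(candidate):
            state = candidate
        samples.append(state)
    return samples


# Hypothetical target: states 0..3 with unnormalized weights 1, 2, 3, 4.
WEIGHTS = [1.0, 2.0, 3.0, 4.0]


def ring_step(state, rng):
    """Symmetric proposal: move one step left or right on a ring of 4 states."""
    return (state + rng.choice((-1, 1))) % len(WEIGHTS)


rng = random.Random(0)  # fixed seed for reproducibility
samples = metropolis_hastings(lambda s: WEIGHTS[s], ring_step, 0, 20000, rng)
freq_state_3 = samples.count(3) / len(samples)
```

Over a long run the fraction of time spent in state 3 should approach 4/10, its share of the total weight. In the setting described above, the states would be candidate stories and the weight would come from the learned criteria.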