Undirected Networks
Towards reduction of autocorrelation in HMC by machine learning
In this paper we propose new algorithm to reduce autocorrelation in Markov chain Monte-Carlo algorithms for euclidean field theories on the lattice. Our proposing algorithm is the Hybrid Monte-Carlo algorithm (HMC) with restricted Boltzmann machine. We examine the validity of the algorithm by employing the phi-fourth theory in three dimension. We observe reduction of the autocorrelation both in symmetric and broken phase as well. Our proposing algorithm provides consistent central values of expectation values of the action density and one-point Green's function with ones from the original HMC in both the symmetric phase and broken phase within the statistical error. On the other hand, two-point Green's functions have slight difference between one calculated by the HMC and one by our proposing algorithm in the symmetric phase. Furthermore, near the criticality, the distribution of the one-point Green's function differs from the one from HMC. We discuss the origin of discrepancies and its improvement.
Variational approach for learning Markov processes from time series data
Inference, prediction and control of complex dynamical systems from time series is important in many areas, including financial markets, power grid management, climate and weather modeling, or molecular dynamics. The analysis of such highly nonlinear dynamical systems is facilitated by the fact that we can often find a (generally nonlinear) transformation of the system coordinates to features in which the dynamics can be excellently approximated by a linear Markovian model. Moreover, the large number of system variables often change collectively on large time- and length-scales, facilitating a low-dimensional analysis in feature space. In this paper, we introduce a variational approach for Markov processes (VAMP) that allows us to find optimal feature mappings and optimal Markovian models of the dynamics from given time series data. The key insight is that the best linear model can be obtained from the top singular components of the Koopman operator. This leads to the definition of a family of score functions called VAMP-r which can be calculated from data, and can be employed to optimize a Markovian model. In addition, based on the relationship between the variational scores and approximation errors of Koopman operators, we propose a new VAMP-E score, which can be applied to cross-validation for hyper-parameter optimization and model selection in VAMP. VAMP is valid for both reversible and nonreversible processes and for stationary and non-stationary processes or realizations.
Online Factorization and Partition of Complex Networks From Random Walks
Yang, Lin F., Braverman, Vladimir, Zhao, Tuo, Wang, Mengdi
Finding the reduced-dimensional structure is critical to understanding complex networks. Existing approaches such as spectral clustering are applicable only when the full network is explicitly observed. In this paper, we focus on the online factorization and partition of implicit large-scale networks based on observations from an associated random walk. We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm. The algorithm is able to process dependent state-transition data dynamically generated by the underlying network and learn a low-dimensional representation for each vertex. By applying a diffusion approximation analysis, we show that the continuous-time limiting process of the stochastic algorithm converges globally to the "principal components" of the Markov chain and achieves a nearly optimal sample complexity. Once given the learned low-dimensional representations, we further apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability. We apply the proposed approach to model the traffic flow of Manhattan as city-wide random walks. By using our algorithm to analyze the taxi trip data, we discover a latent partition of the Manhattan city that closely matches the traffic dynamics.
Measuring Territorial Control in Civil Wars Using Hidden Markov Models: A Data Informatics-Based Approach
Anders, Therese, Xu, Hong, Cheng, Cheng, Kumar, T. K. Satish
Territorial control is a key aspect shaping the dynamics of civil war. Despite its importance, we lack data on territorial control that are fine-grained enough to account for subnational spatio-temporal variation and that cover a large set of conflicts. To resolve this issue, we propose a theoretical model of the relationship between territorial control and tactical choice in civil war and outline how Hidden Markov Models (HMMs) are suitable to capture theoretical intuitions and estimate levels of territorial control. We discuss challenges of using HMMs in this application and mitigation strategies for future work.
An investor's view of AI in 2018
Artificial Intelligence has become a buzzword for investors of late, many of whom recognize its enormous potential to become the most game-changing technology since the industrial revolution. Indeed, the projected impact of AI is likely to be greater than all prior tech trends combined, and savvy investors would be wise not to miss out. From an investor's point of view, you can divide the AI sector into a few major sub-sectors: infrastructure, algorithms, platforms, and applications. The infrastructure segment includes technologies and companies that provide the underpinnings enabling AI: machine learning, deep learning, natural language processing, and computer vision, including cloud infrastructure, specialized semiconductors, large-volume storage devices, low-latency databases, edge-based computing elements, and more. On the algorithm side, one would primarily count neural nets, classification, and clustering algorithms, good old Bayesian networks, and hidden Markov models.
Building competitive direct acoustics-to-word models for English conversational speech recognition
Audhkhasi, Kartik, Kingsbury, Brian, Ramabhadran, Bhuvana, Saon, George, Picheny, Michael
Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional sub-word based automatic speech recognition models using phones, characters, or context-dependent hidden Markov model states. This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple. Prior work has shown that A2W models require orders of magnitude more training data in order to perform comparably to conventional models. Our work also showed this accuracy gap when using the English Switchboard-Fisher data set. This paper describes a recipe to train an A2W model that closes this gap and is at-par with state-of-the-art sub-word based models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder or language model. We find that model initialization, training data order, and regularization have the most impact on the A2W model performance. Next, we present a joint word-character A2W model that learns to first spell the word and then recognize it. This model provides a rich output to the user instead of simple word hypotheses, making it especially useful in the case of words unseen or rarely-seen during training.
A trans-disciplinary review of deep learning research for water resources scientists
Deep learning (DL), a new-generation artificial neural network research, has made profound strides in recent years. This review paper is intended to provide water resources scientists with a simple technical overview, trans-disciplinary progress update, and potentially inspirations about DL. Effective architectures, more accessible data, advances in regularization, and new computing power enabled the success of DL. A trans-disciplinary review reveals that DL is rapidly transforming myriad scientific disciplines including high-energy physics, astronomy, chemistry, genomics and remote sensing, where systematic DL toolkits, innovative customizations, and sub-disciplines have emerged. However, with a few exceptions, its adoption in hydrology has so far been gradual. The literature suggests that novel regularization techniques can effectively prevent high-capacity deep networks from overfitting. As a result, in most scientific disciplines, DL models demonstrated superior predictive and generalization performance to conventional methods. Meanwhile, less noticed is that DL may also serve as a scientific exploratory tool. A new area termed "AI neuroscience", has been born. This budding sub-discipline is accumulating a significant body of work, e.g., distilling knowledge obtained in DL networks to interpretable models, attributing decisions to inputs via back-propagation of relevance, or visualization of activations. These methods are designed to interpret the decision process of deep networks and derive insights. While scientists so far have mostly been using customized, ad-hoc methods for interpretation, vast opportunities await for DL to propel advancement in water science.
Properties and Bayesian fitting of restricted Boltzmann machines
Kaplan, Andee, Nordman, Daniel, Vardeman, Stephen
A restricted Boltzmann machine (RBM) is an undirected graphical model constructed for discrete or continuous random variables, with two layers, one hidden and one visible, and no conditional dependency within a layer. In recent years, RBMs have risen to prominence due to their connection to deep learning. By treating a hidden layer of one RBM as the visible layer in a second RBM, a deep architecture can be created. RBMs are thought to thereby have the ability to encode very complex and rich structures in data, making them attractive for supervised learning. However, the generative behavior of RBMs is largely unexplored. In this paper, we discuss the relationship between RBM parameter specification in the binary case and model properties such as degeneracy, instability and uninterpretability. We also describe the difficulties that arise in likelihood-based and Bayes fitting of such (highly flexible) models, especially as Gibbs sampling (quasi-Bayes) methods are often advocated for the RBM model structure.
Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization
Golovin, Daniel, Krause, Andreas
Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Tang, Haoran, Houthooft, Rein, Foote, Davis, Stooke, Adam, Chen, Xi, Duan, Yan, Schulman, John, De Turck, Filip, Abbeel, Pieter
Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows to count their occurrences with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.