Undirected Networks
Trading USDCHF filtered by Gold dynamics via HMM coupling
We devise a USDCHF trading strategy using the dynamics of gold as a filter. Our strategy involves modelling both USDCHF and gold using a coupled hidden Markov model (CHMM). The observations will be indicators, RSI and CCI, which will be used as triggers for our trading signals. Upon decoding the model in each iteration, we can get the next most probable state and the next most probable observation. Hopefully by taking advantage of intermarket analysis and the Markov property implicit in the model, trading with these most probable values will produce profitable results.
A Tensor Approach to Learning Mixed Membership Community Models
Anandkumar, Anima, Ge, Rong, Hsu, Daniel, Kakade, Sham M.
Community detection is the task of detecting hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we remove this restriction, and provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed as the mixed membership Dirichlet model, first introduced by Airoldi et al. This model allows for nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case. We propose a unified approach to learning these models via a tensor spectral decomposition method. Our estimator is based on low-order moment tensor of the observed network, consisting of 3-star counts. Our learning method is fast and is based on simple linear algebraic operations, e.g. singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method. As an important special case, our results match the best known scaling requirements for the (homogeneous) stochastic block model.
Provable Bounds for Learning Some Deep Representations
Arora, Sanjeev, Bhaskara, Aditya, Ge, Rong, Ma, Tengyu
We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an $n$ node multilayer neural net that has degree at most $n^{\gamma}$ for some $\gamma <1$ and each edge has a random edge weight in $[-1,1]$. Our algorithm learns {\em almost all} networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.
Generic identification of binary-valued hidden Markov processes
The generic identification problem is to decide whether a stochastic process $(X_t)$ is a hidden Markov process and if yes to infer its parameters for all but a subset of parametrizations that form a lower-dimensional subvariety in parameter space. Partial answers so far available depend on extra assumptions on the processes, which are usually centered around stationarity. Here we present a general solution for binary-valued hidden Markov processes. Our approach is rooted in algebraic statistics hence it is geometric in nature. We find that the algebraic varieties associated with the probability distributions of binary-valued hidden Markov processes are zero sets of determinantal equations which draws a connection to well-studied objects from algebra. As a consequence, our solution allows for algorithmic implementation based on elementary (linear) algebraic routines.
A Survey of Multi-Objective Sequential Decision-Making
Roijers, D. M., Vamplew, P., Whiteson, S., Dazeley, R.
Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
Inference, Sampling, and Learning in Copula Cumulative Distribution Networks
The cumulative distribution network (CDN) is a recently developed class of probabilistic graphical models (PGMs) permitting a copula factorization, in which the CDF, rather than the density, is factored. Despite there being much recent interest within the machine learning community about copula representations, there has been scarce research into the CDN, its amalgamation with copula theory, and no evaluation of its performance. Algorithms for inference, sampling, and learning in these models are underdeveloped compared those of other PGMs, hindering widerspread use. One advantage of the CDN is that it allows the factors to be parameterized as copulae, combining the benefits of graphical models with those of copula theory. In brief, the use of a copula parameterization enables greater modelling flexibility by separating representation of the marginals from the dependence structure, permitting more efficient and robust learning. Another advantage is that the CDN permits the representation of implicit latent variables, whose parameterization and connectivity are not required to be specified. Unfortunately, that the model can encode only latent relationships between variables severely limits its utility. In this thesis, we present inference, learning, and sampling for CDNs, and further the state-of-the-art. First, we explain the basics of copula theory and the representation of copula CDNs. Then, we discuss inference in the models, and develop the first sampling algorithm. We explain standard learning methods, propose an algorithm for learning from data missing completely at random (MCAR), and develop a novel algorithm for learning models of arbitrary treewidth and size. Properties of the models and algorithms are investigated through Monte Carlo simulations. We conclude with further discussion of the advantages and limitations of CDNs, and suggest future work.
Variance Adjusted Actor Critic Algorithms
In Reinforcement Learning (RL; [1]) and planning in Markov Decision Processes (MDPs; [2]), the typical objective is to maximize the cumulative (possibly discounted) expected reward, denoted by J. When the model's parameters are known, several well-established and efficient optimization algorithms are known. When the model parameters are not known, learning is needed and there are several algorithmic frameworks that solve the learning problem effectively, at least when the model is finite. Among these, actor-critic methods [3] are known to be particularly efficient. In typical actor-critic algorithms, the critic maintains an estimate of the value function - the expected reward-to-go. This function is then used by the actor to estimate the gradient of the objective with respect to some policy parameters, and then improve the policy by modifying the parameters in the direction of the gradient. The theory that underlies actor-critic algorithms is the policy gradient theorem [4], which relates the value function with the policy gradient.
New Potentials for Data-Driven Intelligent Tutoring System Development and Optimization
Koedinger, Kenneth R. (Carnegie Mellon University) | Brunskill, Emma (Carnegie Mellon University) | Baker, Ryan S.J.d. (Columbia University) | McLaughlin, Elizabeth A. (Carnegie Mellon University) | Stamper, John (Carnegie Mellon University)
Increasing widespread use of educational technologies is producing vast amounts of data. Such data can be used to help advance our understanding of student learning and enable more intelligent, interactive, engaging, and effective education. In this article, we discuss the status and prospects of this new and powerful opportunity for data-driven development and optimization of educational technologies, focusing on intelligent tutoring systems We provide examples of use of a variety of techniques to develop or optimize the select, evaluate, suggest, and update functions of intelligent tutors, including probabilistic grammar learning, rule induction, Markov decision process, classification, and integrations of symbolic search and statistical inference.
Understanding Boltzmann Machine and Deep Learning via A Confident Information First Principle
Zhao, Xiaozhao, Hou, Yuexian, Yu, Qian, Song, Dawei, Li, Wenjie
Typical dimensionality reduction methods focus on directly reducing the number of random variables while retaining maximal variations in the data. In this paper, we consider the dimensionality reduction in parameter spaces of binary multivariate distributions. We propose a general Confident-Information-First (CIF) principle to maximally preserve parameters with confident estimates and rule out unreliable or noisy parameters. Formally, the confidence of a parameter can be assessed by its Fisher information, which establishes a connection with the inverse variance of any unbiased estimate for the parameter via the Cram\'{e}r-Rao bound. We then revisit Boltzmann machines (BM) and theoretically show that both single-layer BM without hidden units (SBM) and restricted BM (RBM) can be solidly derived using the CIF principle. This can not only help us uncover and formalize the essential parts of the target density that SBM and RBM capture, but also suggest that the deep neural network consisting of several layers of RBM can be seen as the layer-wise application of CIF. Guided by the theoretical analysis, we develop a sample-specific CIF-based contrastive divergence (CD-CIF) algorithm for SBM and a CIF-based iterative projection procedure (IP) for RBM. Both CD-CIF and IP are studied in a series of density estimation experiments.
Towards common-sense reasoning via conditional simulation: legacies of Turing in Artificial Intelligence
Freer, Cameron E., Roy, Daniel M., Tenenbaum, Joshua B.
The problem of replicating the flexibility of human common-sense reasoning has captured the imagination of computer scientists since the early days of Alan Turing's foundational work on computation and the philosophy of artificial intelligence. In the intervening years, the idea of cognition as computation has emerged as a fundamental tenet of Artificial Intelligence (AI) and cognitive science. But what kind of computation is cognition? We describe a computational formalism centered around a probabilistic Turing machine called QUERY, which captures the operation of probabilistic conditioning via conditional simulation. Through several examples and analyses, we demonstrate how the QUERY abstraction can be used to cast common-sense reasoning as probabilistic inference in a statistical model of our observations and the uncertain structure of the world that generated that experience. This formulation is a recent synthesis of several research programs in AI and cognitive science, but it also represents a surprising convergence of several of Turing's pioneering insights in AI, the foundations of computation, and statistics.