Learning Graphical Models
Fast Calculation of the Knowledge Gradient for Optimization of Deterministic Engineering Simulations
van der Herten, Joachim, Couckuyt, Ivo, Deschrijver, Dirk, Dhaene, Tom
A novel efficient method for computing the Knowledge-Gradient policy for Continuous Parameters (KGCP) for deterministic optimization is derived. The differences with Expected Improvement (EI), a popular choice for Bayesian optimization of deterministic engineering simulations, are explored. Both policies and the Upper Confidence Bound (UCB) policy are compared on a number of benchmark functions including a problem from structural dynamics. It is empirically shown that KGCP has similar performance as the EI policy for many problems, but has better convergence properties for complex (multi-modal) optimization problems as it emphasizes more on exploration when the model is confident about the shape of optimal regions. In addition, the relationship between Maximum Likelihood Estimation (MLE) and slice sampling for estimation of the hyperparameters of the underlying models, and the complexity of the problem at hand, is studied.
A Survey of Deep Learning Techniques Applied to Trading
This thesis uses deep learning algorithms to forecast financial data. The deep learning framework is used to train a neural network. The deep neural network is a Deep Belief Network (DBN) coupled to a Multilayer Perceptron (MLP). It is used to choose stocks to form portfolios. The portfolios have better returns than the median of the stocks forming the list. The stocks forming the S&P 500 are included in the study. The results obtained from the deep neural network are compared to benchmarks from a logistic regression network, a multilayer perceptron and a naive benchmark. The results obtained from the deep neural network are better and more stable than the benchmarks. The findings support that deep learning methods will find their way in finance due to their reliability and good performance.
Datasets VS Algorithms - A Breakthrough in AI 6x Faster -
The past years have witnessed strong emergence for different datasets and algorithms repositories. Some inquiries accompanied this emergence. An increasing amount of market research started to investigate which is more important for the development of Artificial Intelligence (AI) sciences, which segments are of highest demand and can have greater market share in the future. By reviewing the artificial intelligence (AI) breakthroughs timeline over 30 years, Wissner-Gross found that the availability of high-quality datasets was the key limiting factor for AI advances and not algorithms. He also found that high-quality dataset availability can cause a breakthrough in the field of AI six times faster than Algorithms.
An Adaptive Resample-Move Algorithm for Estimating Normalizing Constants
Fraccaro, Marco, Paquet, Ulrich, Winther, Ole
The estimation of normalizing constants is a fundamental step in probabilistic model comparison. Sequential Monte Carlo methods may be used for this task and have the advantage of being inherently parallelizable. However, the standard choice of using a fixed number of particles at each iteration is suboptimal because some steps will contribute disproportionately to the variance of the estimate. We introduce an adaptive version of the Resample-Move algorithm, in which the particle set is adaptively expanded whenever a better approximation of an intermediate distribution is needed. The algorithm builds on the expression for the optimal number of particles and the corresponding minimum variance found under ideal conditions. Benchmark results on challenging Gaussian Process Classification and Restricted Boltzmann Machine applications show that Adaptive Resample-Move (ARM) estimates the normalizing constant with a smaller variance, using less computational resources, than either Resample-Move with a fixed number of particles or Annealed Importance Sampling. A further advantage over Annealed Importance Sampling is that ARM is easier to tune.
The Mathematics of Machine Learning
In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I have observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, Tensorflow, R-caret etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results. Selecting the right algorithm which includes giving considerations to accuracy, training time, model complexity, number of parameters and number of features.
Maximum Likelihood Estimate and Logistic Regression simplified
Least squares regression can cause impossible estimates such as probabilities that are less than zero and greater than 1.So, when the predicted value is measured as a probability, use Logistic Regression We use the log of the odds rather than the odds directly because an odds ratio cannot be a negative number--but its log can be negative. Notice that we have randomly initialized our coefficients for income and other predictors. These will be adjusted by Solver based on a likelihood function.We will cover them later Column H tells us the predicted probability of the borrower's actual behavior, whether that behavior is repayment or default--not simply, as in Column G, the predicted probability of defaulting on the loan. One property of logarithms is that their sum equals the logarithm of the product of the numbers on which they're based The logarithms of probabilities are always negative numbers, but the closer a probability is to 1.0, the closer its logarithm is to 0.0. I haven't covered cross-validation, which is commonly used to validate a logistic regression equation.If you don't always have a large number of cases to work with, a different approach is to use statistical inference.
Viewpoint and Topic Modeling of Current Events
Zhang, Kerry, Karlgren, Jussi, Zhang, Cheng, Lagergren, Jens
There are multiple sides to every story, and while statistical topic models have been highly successful at topically summarizing the stories in corpora of text documents, they do not explicitly address the issue of learning the different sides, the viewpoints, expressed in the documents. In this paper, we show how these viewpoints can be learned completely unsupervised and represented in a human interpretable form. We use a novel approach of applying CorrLDA2 for this purpose, which learns topic-viewpoint relations that can be used to form groups of topics, where each group represents a viewpoint. A corpus of documents about the Israeli-Palestinian conflict is then used to demonstrate how a Palestinian and an Israeli viewpoint can be learned. By leveraging the magnitudes and signs of the feature weights of a linear SVM, we introduce a principled method to evaluate associations between topics and viewpoints. With this, we demonstrate, both quantitatively and qualitatively, that the learned topic groups are contextually coherent, and form consistently correct topic-viewpoint associations.
Bayesian Model Selection Methods for Mutual and Symmetric $k$-Nearest Neighbor Classification
The $k$-nearest neighbor classification method ($k$-NNC) is one of the simplest nonparametric classification methods. The mutual $k$-NN classification method (M$k$NNC) is a variant of $k$-NNC based on mutual neighborship. We propose another variant of $k$-NNC, the symmetric $k$-NN classification method (S$k$NNC) based on both mutual neighborship and one-sided neighborship. The performance of M$k$NNC and S$k$NNC depends on the parameter $k$ as the one of $k$-NNC does. We propose the ways how M$k$NN and S$k$NN classification can be performed based on Bayesian mutual and symmetric $k$-NN regression methods with the selection schemes for the parameter $k$. Bayesian mutual and symmetric $k$-NN regression methods are based on Gaussian process models, and it turns out that they can do M$k$NN and S$k$NN classification with new encodings of target values (class labels). The simulation results show that the proposed methods are better than or comparable to $k$-NNC, M$k$NNC and S$k$NNC with the parameter $k$ selected by the leave-one-out cross validation method not only for an artificial data set but also for real world data sets.
Tutorial on Variational Autoencoders
In just three years, Variational Autoencoders (VAEs) have emerged as one of the most popular approaches to unsupervised learning of complicated distributions. VAEs are appealing because they are built on top of standard function approximators (neural networks), and can be trained with stochastic gradient descent. VAEs have already shown promise in generating many kinds of complicated data, including handwritten digits, faces, house numbers, CIFAR images, physical models of scenes, segmentation, and predicting the future from static images. This tutorial introduces the intuitions behind VAEs, explains the mathematics behind them, and describes some empirical behavior. No prior knowledge of variational Bayesian methods is assumed.
A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks
Chen, Yetian, Tian, Jin, Nikolova, Olga, Aluru, Srinivas
Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in $O(2(d+1)n2^n)$ time and space, if the number of nodes (variables) in the Bayesian network is $n$ and the in-degree (the number of parents) per node is bounded by a constant $d$. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all $n(n-1)$ edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if $p=2^k$ processors are used, the run-time reduces to $O(5(d+1)n2^{n-k}+k(n-k)^d)$ and the space usage becomes $O(n2^{n-k})$ per processor. Our algorithm is based the observation that the subproblems in the sequential DP algorithm constitute a $n$-$D$ hypercube. We take a delicate way to coordinate the computation of correlated DP procedures such that large amount of data exchange is suppressed. Further, we develop parallel techniques for two variants of the well-known \emph{zeta transform}, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways.