Uncertainty
Fast Calculation of the Knowledge Gradient for Optimization of Deterministic Engineering Simulations
van der Herten, Joachim, Couckuyt, Ivo, Deschrijver, Dirk, Dhaene, Tom
A novel, efficient method for computing the Knowledge-Gradient policy for Continuous Parameters (KGCP) for deterministic optimization is derived. The differences with Expected Improvement (EI), a popular choice for Bayesian optimization of deterministic engineering simulations, are explored. Both policies and the Upper Confidence Bound (UCB) policy are compared on a number of benchmark functions, including a problem from structural dynamics. It is empirically shown that KGCP performs similarly to the EI policy on many problems, but has better convergence properties for complex (multi-modal) optimization problems because it places more emphasis on exploration when the model is confident about the shape of optimal regions. In addition, the relationship between Maximum Likelihood Estimation (MLE) and slice sampling for estimation of the hyperparameters of the underlying models, and the complexity of the problem at hand, is studied.
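As an illustration only (not taken from the paper), the two baseline policies named above can be sketched for a minimization problem, assuming the surrogate model supplies a posterior mean `mu` and standard deviation `sigma` at a candidate point. The function names and the default `beta` are hypothetical, and the KGCP computation itself is not reproduced here.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_best):
    """Expected Improvement at one candidate point (minimization).

    mu, sigma: posterior mean and standard deviation of the surrogate.
    f_best: lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty (e.g. at an already evaluated point of a
        # deterministic simulation): the improvement is known exactly.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * normal_cdf(z) + sigma * normal_pdf(z)


def lower_confidence_bound(mu, sigma, beta=2.0):
    """Confidence-bound score for minimization (smaller is more promising).

    This is the minimization counterpart of UCB; beta is an assumed
    exploration weight, not a value from the paper.
    """
    return mu - beta * sigma
```

A candidate whose mean equals the current best but whose uncertainty is nonzero still receives positive EI, which is how EI trades off exploitation against exploration.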
Minds and machines: The art of forecasting in the age of artificial intelligence
Two of today's major business and intellectual trends offer complementary insights about the challenge of making forecasts in a complex and rapidly changing world. Forty years of behavioral science research into the psychology of probabilistic reasoning have revealed the surprising extent to which people routinely base judgments and forecasts on systematically biased mental heuristics rather than careful assessments of evidence. These findings have fundamental implications for decision making, ranging from the quotidian (scouting baseball players and underwriting insurance contracts) to the strategic (estimating the time, expense, and likely success of a project or business initiative) to the existential (estimating security and terrorism risks). The bottom line: Unaided judgment is an unreliable guide to action. Consider psychologist Philip Tetlock's celebrated multiyear study concluding that even top journalists, historians, and political experts do little better than random chance at forecasting such political events as revolutions and regime changes. The second trend is the increasing ubiquity of data-driven decision making and artificial intelligence applications. Once again, an important lesson comes from behavioral science: A body of research dating back to the 1950s has established that even simple predictive models outperform human experts' ability to make predictions and forecasts. This implies that judiciously constructed predictive models can augment human intelligence by helping humans avoid common cognitive traps.
An Adaptive Resample-Move Algorithm for Estimating Normalizing Constants
Fraccaro, Marco, Paquet, Ulrich, Winther, Ole
The estimation of normalizing constants is a fundamental step in probabilistic model comparison. Sequential Monte Carlo methods may be used for this task and have the advantage of being inherently parallelizable. However, the standard choice of using a fixed number of particles at each iteration is suboptimal because some steps will contribute disproportionately to the variance of the estimate. We introduce an adaptive version of the Resample-Move algorithm, in which the particle set is adaptively expanded whenever a better approximation of an intermediate distribution is needed. The algorithm builds on the expression for the optimal number of particles and the corresponding minimum variance found under ideal conditions. Benchmark results on challenging Gaussian Process Classification and Restricted Boltzmann Machine applications show that Adaptive Resample-Move (ARM) estimates the normalizing constant with a smaller variance, using fewer computational resources, than either Resample-Move with a fixed number of particles or Annealed Importance Sampling. A further advantage over Annealed Importance Sampling is that ARM is easier to tune.
The Mathematics of Machine Learning
In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I have observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, TensorFlow, R-caret, etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and for getting good results. One example is selecting the right algorithm, which involves weighing accuracy, training time, model complexity, number of parameters and number of features.
The Spectral Condition Number Plot for Regularization Parameter Determination
Peeters, Carel F. W., van de Wiel, Mark A., van Wieringen, Wessel N.
Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employs regularization to cope with the subsequent singularity of the sample covariance matrix. These estimators depend on a penalty parameter and choosing its value can be hard, in terms of being computationally unfeasible or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter selection. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators.
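A minimal sketch of the quantity behind such a plot, assuming the simplest ridge-type estimator S + λI (the paper covers a broader class, which is not reproduced here). The function name `condition_number_path` and the simulated data are hypothetical. Because the eigenvalues of S + λI are the eigenvalues of S shifted by λ, the spectral condition number follows directly from the largest and smallest eigenvalue of S.

```python
import numpy as np


def condition_number_path(S, penalties):
    """Spectral condition number of S + lam * I for each positive penalty lam.

    S: symmetric sample covariance matrix (may be singular when p > n).
    penalties: iterable of strictly positive penalty values.
    """
    eigvals = np.linalg.eigvalsh(S)      # ascending order
    d_min = max(float(eigvals[0]), 0.0)  # clip tiny negative round-off
    d_max = float(eigvals[-1])
    return [(d_max + lam) / (d_min + lam) for lam in penalties]


# Simulated example with more variables (20) than observations (5).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
S = np.cov(X, rowvar=False)
penalties = np.logspace(-4, 2, 7)
path = condition_number_path(S, penalties)
for lam, cond in zip(penalties, path):
    print(f"penalty={lam:10.4f}  condition number={cond:14.2f}")
```

Plotting the condition number against the penalty, typically on log scales, gives a curve that falls towards 1 as the penalty grows; a heuristic choice picks a penalty beyond which the estimate is acceptably well-conditioned.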
Viewpoint and Topic Modeling of Current Events
Zhang, Kerry, Karlgren, Jussi, Zhang, Cheng, Lagergren, Jens
There are multiple sides to every story, and while statistical topic models have been highly successful at topically summarizing the stories in corpora of text documents, they do not explicitly address the issue of learning the different sides, the viewpoints, expressed in the documents. In this paper, we show how these viewpoints can be learned completely unsupervised and represented in a human interpretable form. We use a novel approach of applying CorrLDA2 for this purpose, which learns topic-viewpoint relations that can be used to form groups of topics, where each group represents a viewpoint. A corpus of documents about the Israeli-Palestinian conflict is then used to demonstrate how a Palestinian and an Israeli viewpoint can be learned. By leveraging the magnitudes and signs of the feature weights of a linear SVM, we introduce a principled method to evaluate associations between topics and viewpoints. With this, we demonstrate, both quantitatively and qualitatively, that the learned topic groups are contextually coherent, and form consistently correct topic-viewpoint associations.
Bayesian Model Selection Methods for Mutual and Symmetric $k$-Nearest Neighbor Classification
The $k$-nearest neighbor classification method ($k$-NNC) is one of the simplest nonparametric classification methods. The mutual $k$-NN classification method (M$k$NNC) is a variant of $k$-NNC based on mutual neighborship. We propose another variant of $k$-NNC, the symmetric $k$-NN classification method (S$k$NNC), based on both mutual neighborship and one-sided neighborship. The performance of M$k$NNC and S$k$NNC depends on the parameter $k$, as does that of $k$-NNC. We propose ways in which M$k$NN and S$k$NN classification can be performed based on Bayesian mutual and symmetric $k$-NN regression methods with selection schemes for the parameter $k$. Bayesian mutual and symmetric $k$-NN regression methods are based on Gaussian process models, and it turns out that they can perform M$k$NN and S$k$NN classification with new encodings of target values (class labels). The simulation results show that the proposed methods are better than or comparable to $k$-NNC, M$k$NNC and S$k$NNC with the parameter $k$ selected by leave-one-out cross validation, not only for an artificial data set but also for real world data sets.
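To make the two neighborhood notions concrete, here is a small sketch that builds mutual and symmetric neighbor sets from ordinary $k$-NN sets. It assumes the common definitions: two points are mutual neighbors when each is among the other's $k$ nearest, and symmetric neighbors when at least one is among the other's $k$ nearest. The helper names are hypothetical, and the Bayesian regression and selection of $k$ described in the abstract are not shown.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_sets(points, k):
    """For each point index, the set of indices of its k nearest other points."""
    n = len(points)
    sets = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sq_dist(points[i], points[j]),
        )
        sets.append(set(others[:k]))
    return sets


def mutual_and_symmetric(points, k):
    """Return (mutual, symmetric) neighbor sets for every point."""
    nn = knn_sets(points, k)
    n = len(points)
    # Mutual: j is near i AND i is near j.
    mutual = [{j for j in nn[i] if i in nn[j]} for i in range(n)]
    # Symmetric: j is near i OR i is near j (one-sided neighbors included).
    symmetric = [nn[i] | {j for j in range(n) if i in nn[j]} for i in range(n)]
    return mutual, symmetric


points = [(0.0,), (1.0,), (2.5,), (10.0,)]
mutual, symmetric = mutual_and_symmetric(points, k=1)
print(mutual)
print(symmetric)
```

In this toy example only the first two points are mutual neighbors, while the outlier at 10.0 still gains a symmetric neighbor because its own nearest point is counted one-sidedly.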
Credibilistic TOPSIS Model for Evaluation and Selection of Municipal Solid Waste Disposal Methods
Roy, Jagannath, Adhikary, Krishnendu, Kar, Samarjit
Municipal solid waste management (MSWM) is a challenging issue of urban development in developing countries. Because each country has a different socio-economic and environmental background, no single disposal method may be accepted as the optimal choice everywhere. Selecting a suitable disposal method in MSWM under vague and imprecise information can be treated as a multi-criteria decision making (MCDM) problem. In the present paper, the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) methodology is extended based on credibility theory for evaluating the performance of MSW disposal methods under criteria fixed by experts. The proposed model helps decision makers to choose a preferable alternative for their municipal area. A sensitivity analysis by our proposed model confirms this fact.
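For orientation, the classical (crisp) TOPSIS procedure that the paper extends can be sketched as follows. This is not the credibilistic version from the paper: it assumes exact numeric scores, vector normalization, and Euclidean distances, and the function name and example numbers are hypothetical.

```python
import math


def topsis(matrix, weights, is_benefit):
    """Closeness coefficient of each alternative under classical TOPSIS.

    matrix: rows are alternatives, columns are criteria (crisp scores).
    weights: one weight per criterion.
    is_benefit: True where larger is better, False for cost criteria.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [
        [weights[j] * row[j] / norms[j] if norms[j] > 0 else 0.0
         for j in range(n_crit)]
        for row in matrix
    ]
    columns = list(zip(*v))
    ideal = [max(c) if is_benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if is_benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in v:
        d_plus = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, ideal)))
        d_minus = math.sqrt(sum((x - y) ** 2 for x, y in zip(row, anti)))
        total = d_plus + d_minus
        # If all alternatives are identical, total is 0; return a neutral 0.5.
        scores.append(d_minus / total if total > 0 else 0.5)
    return scores


# Two hypothetical alternatives, one benefit criterion and one cost criterion.
print(topsis([[3.0, 1.0], [1.0, 3.0]], [0.5, 0.5], [True, False]))
```

Alternatives are ranked by descending closeness coefficient; an alternative that is best on every criterion scores 1 and one that is worst on every criterion scores 0.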
PAC-Bayesian Theorems for Domain Adaptation with Specialization to Linear Classifiers
Germain, Pascal, Habrard, Amaury, Laviolette, François, Morvant, Emilie
In this paper, we provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different target distribution. On the one hand, we propose an improvement of the previous approach proposed by Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new tighter PAC-Bayesian domain adaptation bound for the stochastic Gibbs classifier. We specialize it to linear classifiers, and design a learning algorithm which shows interesting results on a synthetic problem and on a popular sentiment annotation task. On the other hand, we generalize these results to multisource domain adaptation allowing us to take into account different source domains. This study opens the door to tackling domain adaptation tasks by making use of all the PAC-Bayesian tools.
Classification with the pot-pot plot
Pokotylo, Oleksii, Mosler, Karl
We propose a procedure for supervised classification that is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class's prior probability. The method transforms the data to a potential-potential (pot-pot) plot, where each data point is mapped to a vector of potentials. Separation of the classes, as well as classification of new data points, is performed on this plot. For this, either the $\alpha$-procedure ($\alpha$-P) or $k$-nearest neighbors ($k$-NN) is employed. For data that are generated from continuous distributions, these classifiers prove to be strongly Bayes-consistent. The potentials depend on the kernel and its bandwidth used in the density estimate. We investigate several variants of bandwidth selection, including joint and separate pre-scaling and a bandwidth regression approach. The new method is applied to benchmark data from the literature, including simulated data sets as well as 50 sets of real data. It compares favorably to known classification methods such as LDA, QDA, max kernel density estimates, $k$-NN, and $DD$-plot classification using depth functions.
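The mapping to potentials described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: one-dimensional data, a Gaussian kernel with a fixed bandwidth `h`, and priors estimated by class proportions. The function names are hypothetical, and the paper's bandwidth selection and the $\alpha$-procedure or $k$-NN step on the plot are not shown.

```python
import math


def gaussian_kde(x, sample, h):
    """Gaussian kernel density estimate at x from a 1-D sample, bandwidth h."""
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return total / (len(sample) * h * math.sqrt(2.0 * math.pi))


def potentials(x, classes, h=1.0):
    """Map a point x to its vector of class potentials.

    classes: list of 1-D training samples, one list per class.
    Each potential is (estimated prior) * (kernel density estimate).
    """
    n_total = sum(len(c) for c in classes)
    return [len(c) / n_total * gaussian_kde(x, c, h) for c in classes]


# Two hypothetical classes centered near 0.5 and 5.5.
classes = [[0.0, 0.5, 1.0], [5.0, 5.5, 6.0]]
for x in (0.5, 3.0, 5.5):
    print(x, potentials(x, classes))
```

Each training point becomes one coordinate pair in the pot-pot plot; a separating rule is then learned in that plane rather than in the original feature space.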