Uncertainty
FML-based Dynamic Assessment Agent for Human-Machine Cooperative System on Game of Go
Lee, Chang-Shing, Wang, Mei-Hui, Yang, Sheng-Chi, Hung, Pi-Hsia, Lin, Su-Wei, Shuo, Nan, Kubota, Naoyuki, Chou, Chun-Hsun, Chou, Ping-Chiang, Kao, Chia-Hsiu
In this paper, we demonstrate the application of Fuzzy Markup Language (FML) to construct an FML-based Dynamic Assessment Agent (FDAA), and we present an FML-based Human-Machine Cooperative System (FHMCS) for the game of Go. The proposed FDAA comprises an intelligent decision-making and learning mechanism, an intelligent game bot, a proximal development agent, and an intelligent agent. The intelligent game bot is based on the open-source code of Facebook Darkforest, and it features a representational state transfer application programming interface mechanism. The proximal development agent contains a dynamic assessment mechanism, a GoSocket mechanism, and an FML engine with a fuzzy knowledge base and rule base. The intelligent agent contains a GoSocket engine and a summarization agent that is based on the estimated win rate, real-time simulation number, and matching degree of predicted moves. Additionally, the FML for player performance evaluation and linguistic descriptions for game results commentary are presented. We experimentally verify and validate the performance of the FDAA and variants of the FHMCS by testing five games in 2016 and 60 games of Google Master Go, a new version of the AlphaGo program, in January 2017. The experimental results demonstrate that the proposed FDAA can work effectively for Go applications.
Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis
Benavoli, Alessio, Corani, Giorgio, Demsar, Janez, Zaffalon, Marco
The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields however realized the shortcomings of frequentist reasoning and in the most radical cases even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for abandonment of NHST by exposing its fallacies and, more importantly, offer better - more sound and useful - alternatives for it.
On the Performance of Forecasting Models in the Presence of Input Uncertainty
Sangrody, Hossein, Sarailoo, Morteza, Zhou, Ning, Shokrollahi, Ahmad, Foruzan, Elham
Nowadays, with the unprecedented penetration of renewable distributed energy resources (DERs), the necessity of an efficient energy forecasting model is more demanding than before. Generally, forecasting models are trained using observed weather data while the trained models are applied for energy forecasting using forecasted weather data. In this study, the performance of several commonly used forecasting methods in the presence of weather predictors with uncertainty is assessed and compared. Accordingly, both observed and forecasted weather data are collected, then the influential predictors for solar PV generation forecasting model are selected using several measures. Using observed and forecasted weather data, an analysis on the uncertainty of weather variables is represented by MAE and bootstrapping. The energy forecasting model is trained using observed weather data, and finally, the performance of several commonly used forecasting methods in solar energy forecasting is simulated and compared for a real case study.
End-to-End Learning for Structured Prediction Energy Networks
Belanger, David, Yang, Bishan, McCallum, Andrew
Structured Prediction Energy Networks (SPENs) are a simple, yet expressive family of structured prediction models (Belanger and McCallum, 2016). An energy function over candidate structured outputs is given by a deep network, and predictions are formed by gradient-based optimization. This paper presents end-to-end learning for SPENs, where the energy function is discriminatively trained by back-propagating through gradient-based prediction. In our experience, the approach is substantially more accurate than the structured SVM method of Belanger and McCallum (2016), as it allows us to use more sophisticated non-convex energies. We provide a collection of techniques for improving the speed, accuracy, and memory requirements of end-to-end SPENs, and demonstrate the power of our method on 7-Scenes image denoising and CoNLL-2005 semantic role labeling tasks. In both, inexact minimization of non-convex SPEN energies is superior to baseline methods that use simplistic energy functions that can be minimized exactly.
Learning linear structural equation models in polynomial time and sample complexity
The problem of learning structural equation models (SEMs) from data is a fundamental problem in causal inference. We develop a new algorithm --- which is computationally and statistically efficient and works in the high-dimensional regime --- for learning linear SEMs from purely observational data with arbitrary noise distribution. We consider three aspects of the problem: identifiability, computational efficiency, and statistical efficiency. We show that when data is generated from a linear SEM over $p$ nodes and maximum degree $d$, our algorithm recovers the directed acyclic graph (DAG) structure of the SEM under an identifiability condition that is more general than those considered in the literature, and without faithfulness assumptions. In the population setting, our algorithm recovers the DAG structure in $\mathcal{O}(p(d^2 + \log p))$ operations. In the finite sample setting, if the estimated precision matrix is sparse, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^3 + pd^7)$, while if the estimated precision matrix is dense, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^5)$. For sub-Gaussian noise, we show that our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} \log (\frac{p}{\sqrt{\delta}}))$ to achieve $\varepsilon$ element-wise additive error with respect to the true autoregression matrix with probability at most $1 - \delta$, while for noise with bounded $(4m)$-th moment, with $m$ being a positive integer, our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} (\frac{p^2}{\delta})^{1/m})$.
Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation
Brouwer, Thomas, Frellsen, Jes, Liรณ, Pietro
In this paper, we study the trade-offs of different inference approaches for Bayesian matrix factorisation methods, which are commonly used for predicting missing values, and for finding patterns in the data. In particular, we consider Bayesian nonnegative variants of matrix factorisation and tri-factorisation, and compare non-probabilistic inference, Gibbs sampling, variational Bayesian inference, and a maximum-a-posteriori approach. The variational approach is new for the Bayesian nonnegative models. We compare their convergence, and robustness to noise and sparsity of the data, on both synthetic and real-world datasets. Furthermore, we extend the models with the Bayesian automatic relevance determination prior, allowing the models to perform automatic model selection, and demonstrate its efficiency.
Bayesian Optimization for Probabilistic Programs
Rainforth, Tom, Le, Tuan Anh, van de Meent, Jan-Willem, Osborne, Michael A., Wood, Frank
We present the first general purpose framework for marginal maximum a posteriori estimation of probabilistic program variables. By using a series of code transformations, the evidence of any probabilistic program, and therefore of any graphical model, can be optimized with respect to an arbitrary subset of its sampled variables. To carry out this optimization, we develop the first Bayesian optimization package to directly exploit the source code of its target, leading to innovations in problem-independent hyperpriors, unbounded optimization, and implicit constraint satisfaction; delivering significant performance improvements over prominent existing packages.
Learning Features from Co-occurrences: A Theoretical Analysis
Representing a word by its co-occurrences with other words in context is an effective way to capture the meaning of the word. However, the theory behind remains a challenge. In this work, taking the example of a word classification task, we give a theoretical analysis of the approaches that represent a word X by a function f(P(C X)), where C is a context feature, P(C X) is the conditional probability estimated from a text corpus, and the function f maps the co-occurrence measure to a prediction score. We investigate the impact of context feature C and the function f. We also explain the reasons why using the co-occurrences with multiple context features may be better than just using a single one. In addition, some of the results shed light on the theory of feature learning and machine learning in general.
PAC-Bayesian Analysis for a two-step Hierarchical Multiview Learning Approach
Goyal, Anil, Morvant, Emilie, Germain, Pascal, Amini, Massih-Reza
We study a two-level multiview learning with more than two views under the PAC-Bayesian framework. This approach, sometimes referred as late fusion, consists in learning sequentially multiple view-specific classifiers at the first level, and then combining these view-specific classifiers at the second level. Our main theoretical result is a generalization bound on the risk of the majority vote which exhibits a term of diversity in the predictions of the view-specific classifiers. From this result it comes out that controlling the trade-off between diversity and accuracy is a key element for multiview learning, which complements other results in multiview learning. Finally, we experiment our principle on multiview datasets extracted from the Reuters RCV1/RCV2 collection.
Knowledge Elicitation via Sequential Probabilistic Inference for High-Dimensional Prediction
Daee, Pedram, Peltola, Tomi, Soare, Marta, Kaski, Samuel
Prediction in a small-sized sample with a large number of covariates, the "small n, large p" problem, is challenging. This setting is encountered in multiple applications, such as precision medicine, where obtaining additional samples can be extremely costly or even impossible, and extensive research effort has recently been dedicated to finding principled solutions for accurate prediction. However, a valuable source of additional information, domain experts, has not yet been efficiently exploited. We formulate knowledge elicitation generally as a probabilistic inference process, where expert knowledge is sequentially queried to improve predictions. In the specific case of sparse linear regression, where we assume the expert has knowledge about the values of the regression coefficients or about the relevance of the features, we propose an algorithm and computational approximation for fast and efficient interaction, which sequentially identifies the most informative features on which to query expert knowledge. Evaluations of our method in experiments with simulated and real users show improved prediction accuracy already with a small effort from the expert.