Uncertainty
A U-statistic Approach to Hypothesis Testing for Structure Discovery in Undirected Graphical Models
Bounliphone, Wacha, Blaschko, Matthew
Structure discovery in graphical models is the determination of the topology of a graph that encodes conditional independence properties of the joint distribution of all variables in the model. For some class of probability distributions, an edge between two variables is present if and only if the corresponding entry in the precision matrix is non-zero. For a finite sample estimate of the precision matrix, entries close to zero may be due to low sample effects, or due to an actual association between variables; these two cases are not readily distinguishable. %Fisher provided a hypothesis test based on a parametric approximation to the distribution of an entry in the precision matrix of a Gaussian distribution, but this may not provide valid upper bounds on $p$-values for non-Gaussian distributions. Many related works on this topic consider potentially restrictive distributional or sparsity assumptions that may not apply to a data sample of interest, and direct estimation of the uncertainty of an estimate of the precision matrix for general distributions remains challenging. Consequently, we make use of results for $U$-statistics and apply them to the covariance matrix. By probabilistically bounding the distortion of the covariance matrix, we can apply Weyl's theorem to bound the distortion of the precision matrix, yielding a conservative, but sound test threshold for a much wider class of distributions than considered in previous works. The resulting test enables one to answer with statistical significance whether an edge is present in the graph, and convergence results are known for a wide range of distributions. The computational complexities is linear in the sample size enabling the application of the test to large data samples for which computation time becomes a limiting factor. We experimentally validate the correctness and scalability of the test on multivariate distributions for which the distributional assumptions of competing tests result in underestimates of the false positive ratio. By contrast, the proposed test remains sound, promising to be a useful tool for hypothesis testing for diverse real-world problems.
Feature extraction using Latent Dirichlet Allocation and Neural Networks: A case study on movie synopses
Feature extraction has gained increasing attention in the field of machine learning, as in order to detect patterns, extract information, or predict future observations from big data, the urge of informative features is crucial. The process of extracting features is highly linked to dimensionality reduction as it implies the transformation of the data from a sparse high-dimensional space, to higher level meaningful abstractions. This dissertation employs Neural Networks for distributed paragraph representations, and Latent Dirichlet Allocation to capture higher level features of paragraph vectors. Although Neural Networks for distributed paragraph representations are considered the state of the art for extracting paragraph vectors, we show that a quick topic analysis model such as Latent Dirichlet Allocation can provide meaningful features too. We evaluate the two methods on the CMU Movie Summary Corpus, a collection of 25,203 movie plot summaries extracted from Wikipedia. Finally, for both approaches, we use K-Nearest Neighbors to discover similar movies, and plot the projected representations using T-Distributed Stochastic Neighbor Embedding to depict the context similarities. These similarities, expressed as movie distances, can be used for movies recommendation. The recommended movies of this approach are compared with the recommended movies from IMDB, which use a collaborative filtering recommendation approach, to show that our two models could constitute either an alternative or a supplementary recommendation approach.
Learning to Generate Posters of Scientific Papers
Qiang, Yuting, Fu, Yanwei, Guo, Yanwen, Zhou, Zhi-Hua, Sigal, Leonid
Researchers often summarize their work in the form of posters. Posters provide a coherent and efficient way to convey core ideas from scientific papers. Generating a good scientific poster, however, is a complex and time consuming cognitive task, since such posters need to be readable, informative, and visually aesthetic. In this paper, for the first time, we study the challenging problem of learning to generate posters from scientific papers. To this end, a data-driven framework, that utilizes graphical models, is proposed. Specifically, given content to display, the key elements of a good poster, including panel layout and attributes of each panel, are learned and inferred from data. Then, given inferred layout and attributes, composition of graphical elements within each panel is synthesized. To learn and validate our model, we collect and make public a Poster-Paper dataset, which consists of scientific papers and corresponding posters with exhaustively labelled panels and attributes. Qualitative and quantitative results indicate the effectiveness of our approach.
R and Stan: introduction to Bayesian modeling
I wrote a series of blog posts on Bayesian modeling with R and Stan. Stan is a growing platform for MC(MC) computing implemented with C . Compared to WinBUGS or OpenBUGS, it is very fast and programmable intuitively. This series of the posts show how to install Stan on R, how to run it, and how to apply it to actual datasets. I hope you'll find it to practice Bayesian modeling easier than ever.
Bayesian machine learning
So you know the Bayes rule. How does it relate to machine learning? It can be quite difficult to grasp how the puzzle pieces fit together โ we know it took us a while. This article is an introduction we wish we had back then. While we have some grasp on the matter, we're not experts, so the following might contain inaccuracies or even outright errors.
Variance, Clustering, and Density Estimation Revisited
We propose here a simple, robust and scalable technique to perform supervised clustering on numerical data. It can also be used for density estimation, and even to define a concept of variance that is scale-invariant. This is part of our general statistical framework for data science. Here we discuss clustering and density estimation on the grid. The grid can be seen as an 2-dimensional or 3-dimensional array.
Robustness of Bayesian Pool-based Active Learning Against Prior Misspecification
Cuong, Nguyen Viet, Ye, Nan, Lee, Wee Sun
We study the robustness of active learning (AL) algorithms against prior misspecification: whether an algorithm achieves similar performance using a perturbed prior as compared to using the true prior. In both the average and worst cases of the maximum coverage setting, we prove that all $\alpha$-approximate algorithms are robust (i.e., near $\alpha$-approximate) if the utility is Lipschitz continuous in the prior. We further show that robustness may not be achieved if the utility is non-Lipschitz. This suggests we should use a Lipschitz utility for AL if robustness is required. For the minimum cost setting, we can also obtain a robustness result for approximate AL algorithms. Our results imply that many commonly used AL algorithms are robust against perturbed priors. We then propose the use of a mixture prior to alleviate the problem of prior misspecification. We analyze the robustness of the uniform mixture prior and show experimentally that it performs reasonably well in practice.
3 Must-Ask Questions Before Choosing That Machine Learning Algorithm!
You know that you want to build a predictive model. You've framed your problem in terms of classification or regression. You've prepared some training data (which took an age). You've heard or experienced first hand that Random Forests, Elastic Net Regression or Deep Belief Networks are "the business" and so you're going to use one of these (you've probably already verified that these algorithms are appropriate to your problem based on their general capabilities: whether it be their ability to deal with real valued data, "big" streaming data, multiple classes and so on). However, no two algorithms are the same (if they were we'd simply have fewer to choose from). As such there are a host of questions that you may not have even thought to ask which could make or break your choice.
What are some recent advances in non-convex optimization research?
What are some recent advances in non-convex optimization research? Non-convex optimization is now ubiquitous in machine learning. While previously, the focus was on convex relaxation methods, now the emphasis is on being able to solve non-convex problems directly. It is not possible to find the global optimum of every non-convex problem due to NP-hardness barrier. An alternate approach is: when can it be solved efficiently (preferably in low order polynomial time).
Towards Practical Bayesian Parameter and State Estimation
Erol, Yusuf Bugra, Wu, Yi, Li, Lei, Russell, Stuart
Joint state and parameter estimation is a core problem for dynamic Bayesian networks. Although modern probabilistic inference toolkits make it relatively easy to specify large and practically relevant probabilistic models, the silver bullet---an efficient and general online inference algorithm for such problems---remains elusive, forcing users to write special-purpose code for each application. We propose a novel blackbox algorithm -- a hybrid of particle filtering for state variables and assumed density filtering for parameter variables. It has following advantages: (a) it is efficient due to its online nature, and (b) it is applicable to both discrete and continuous parameter spaces . On a variety of toy and real models, our system is able to generate more accurate results within a fixed computation budget. This preliminary evidence indicates that the proposed approach is likely to be of practical use.