Statistical Learning
Analysis of Watson's Strategies for Playing Jeopardy!
Tesauro, G., Gondek, D. C., Lenchner, J., Fan, J., Prager, J. M.
Major advances in Question Answering technology were needed for IBM Watson to play Jeopardy! at championship level -- the show requires rapid-fire answers to challenging natural language questions, broad general knowledge, high precision, and accurate confidence estimates. In addition, Jeopardy! features four types of decision making carrying great strategic importance: (1) Daily Double wagering; (2) Final Jeopardy wagering; (3) selecting the next square when in control of the board; (4) deciding whether to attempt to answer, i.e., "buzz in." Using sophisticated strategies for these decisions, that properly account for the game state and future event probabilities, can significantly boost a player's overall chances to win, when compared with simple "rule of thumb" strategies. This article presents our approach to developing Watson's game-playing strategies, comprising development of a faithful simulation model, and then using learning and Monte-Carlo methods within the simulator to optimize Watson's strategic decision-making. After giving a detailed description of each of our game-strategy algorithms, we then focus in particular on validating the accuracy of the simulator's predictions, and documenting performance improvements using our methods. Quantitative performance benefits are shown with respect to both simple heuristic strategies, and actual human contestant performance in historical episodes. We further extend our analysis of human play to derive a number of valuable and counterintuitive examples illustrating how human contestants may improve their performance on the show.
Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data
Pyne, Saumyadipta, Wang, Kui, Irish, Jonathan, Tamayo, Pablo, Nazaire, Marc-Danie, Duong, Tarn, Lee, Sharon, Ng, Shu-Kay, Hafler, David, Levy, Ronald, Nolan, Garry, Mesirov, Jill, McLachlan, Geoffrey J.
In systems biomedicine, an experimenter encounters different potential sources of variation in data such as individual samples, multiple experimental conditions, and multi-variable network-level responses. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. While computational methods can identify cell populations in individual samples, without the ability to automatically match them across samples, it is difficult to compare and characterize the populations in typical experiments, such as those responding to various stimulations or distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template -- used for registering populations across samples, and classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts.
Expectation-maximization for logistic regression
We present a family of expectation-maximization (EM) algorithms for binary and negative-binomial logistic regression, drawing a sharp connection with the variational-Bayes algorithm of Jaakkola and Jordan (2000). Indeed, our results allow a version of this variational-Bayes approach to be re-interpreted as a true EM algorithm. We study several interesting features of the algorithm, and of this previously unrecognized connection with variational Bayes. We also generalize the approach to sparsity-promoting priors, and to an online method whose convergence properties are easily established. This latter method compares favorably with stochastic-gradient descent in situations with marked collinearity.
Privileged Information for Data Clustering
Many machine learning algorithms assume that all input samples are independently and identically distributed from some common distribution on either the input space X, in the case of unsupervised learning, or the input and output space X x Y in the case of supervised and semi-supervised learning. In the last number of years the relaxation of this assumption has been explored and the importance of incorporation of additional information within machine learning algorithms became more apparent. Traditionally such fusion of information was the domain of semi-supervised learning. More recently the inclusion of knowledge from separate hypothetical spaces has been proposed by Vapnik as part of the supervised setting. In this work we are interested in exploring Vapnik's idea of master-class learning and the associated learning using privileged information, however within the unsupervised setting. Adoption of the advanced supervised learning paradigm for the unsupervised setting instigates investigation into the difference between privileged and technical data. By means of our proposed aRi-MAX method stability of the KMeans algorithm is improved and identification of the best clustering solution is achieved on an artificial dataset. Subsequently an information theoretic dot product based algorithm called P-Dot is proposed. This method has the ability to utilize a wide variety of clustering techniques, individually or in combination, while fusing privileged and technical data for improved clustering. Application of the P-Dot method to the task of digit recognition confirms our findings in a real-world scenario.
A Survey on Latent Tree Models and Applications
Mourad, R., Sinoquet, C., Zhang, N. L., Liu, T., Leray, P.
In data analysis, latent variables play a central role because they help provide powerful insights into a wide variety of phenomena, ranging from biological to human sciences. The latent tree model, a particular type of probabilistic graphical models, deserves attention. Its simple structure - a tree - allows simple and efficient inference, while its latent variables capture complex relationships. In the past decade, the latent tree model has been subject to significant theoretical and methodological developments. In this review, we propose a comprehensive study of this model. First we summarize key ideas underlying the model. Second we explain how it can be efficiently learned from data. Third we illustrate its use within three types of applications: latent structure discovery, multidimensional clustering, and probabilistic inference. Finally, we conclude and give promising directions for future researches in this field.
Predicting the Severity of Breast Masses with Data Mining Methods
Mokhtar, Sahar A., Elsayad, Alaa. M.
Mammography is the most effective and available tool for breast cancer screening. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. Data mining algorithms could be used to help physicians in their decisions to perform a breast biopsy on a suspicious lesion seen in a mammogram image or to perform a short term follow-up examination instead. In this research paper data mining classification algorithms; Decision Tree (DT), Artificial Neural Network (ANN), and Support Vector Machine (SVM) are analyzed on mammographic masses data set. The purpose of this study is to increase the ability of physicians to determine the severity (benign or malignant) of a mammographic mass lesion from BI-RADS attributes and the patient,s age. The whole data set is divided for training the models and test them by the ratio of 70:30% respectively and the performances of classification algorithms are compared through three statistical measures; sensitivity, specificity, and classification accuracy. Accuracy of DT, ANN and SVM are 78.12%, 80.56% and 81.25% of test samples respectively. Our analysis shows that out of these three classification models SVM predicts severity of breast cancer with least error rate and highest accuracy.
Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery
Perraul-Joncas, Dominique, Meila, Marina
In recent years, manifold learning has become increasingly popular as a tool for performing non-linear dimensionality reduction. This has led to the development of numerous algorithms of varying degrees of complexity that aim to recover man ifold geometry using either local or global features of the data. Building on the Laplacian Eigenmap and Diffusionmaps framework, we propose a new paradigm that offers a guarantee, under reasonable assumptions, that any manifo ld learning algorithm will preserve the geometry of a data set. Our approach is based on augmenting the output of embedding algorithms with geometric informatio n embodied in the Riemannian metric of the manifold. We provide an algorithm for estimating the Riemannian metric from data and demonstrate possible application s of our approach in a variety of examples.
Structural and Functional Discovery in Dynamic Networks with Non-negative Matrix Factorization
Mankad, Shawn, Michailidis, George
Due to advances in data collection technologies, it is becoming increasingly common to study time series of networks. An important research question is how to discover the underlying structure and dynamics in time-varying networked systems. In this work, we propose a new matrix factorization-based approach for community discovery and visual exploration within potentially weighted and directed network time-series. Next, we review and discuss this work in relation to popular approaches for addressing the key problems of community detection and visualization of time series of networks. There have been many important contributions for community detection in network time-series, extensively reviewed in [1, 2], from the fields of physics, computer science and statistics. The basic goal of community detection is to extract groups of nodes that feature relatively dense within group connectivity and sparser between group connections [3, 4]. A common strategy is to embed the graphs in low-dimensional latent spaces. For instance, [5] use latent variables to capture groups of papers that evolve similarly in citation network data.
The Expressive Power of Word Embeddings
Chen, Yanqing, Perozzi, Bryan, Al-Rfou, Rami, Skiena, Steven
We seek to better understand the difference in quality of the several publicly released embeddings. We propose several tasks that help to distinguish the characteristics of different embeddings. Our evaluation of sentiment polarity and synonym/antonym relations shows that embeddings are able to capture surprisingly nuanced semantics even in the absence of sentence structure. Moreover, benchmarking the embeddings shows great variance in quality and characteristics of the semantics captured by the tested embeddings. Finally, we show the impact of varying the number of dimensions and the resolution of each dimension on the effective useful features captured by the embedding space. Our contributions highlight the importance of embeddings for NLP tasks and the effect of their quality on the final results.
Normalized Online Learning
Ross, Stephane, Mineiro, Paul, Langford, John
We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.