Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems), Ian H. Witten and Eibe Frank, ISBN 9780120884070


This book is very easy to read and understand. Unlike Hastie's Statistical Learning book, it is not geared towards those with an expert-level knowledge of statistics, and instead takes time to explain functions and formulas for the person with a decent but not extraordinary understanding of statistical/math concepts. For example, their description of a Gaussian was the clearest I've seen. On the other hand, if your math/statistics background is considerable, you may find this book somewhat simplistic or tedious. The book has good coverage of techniques and algorithms, although I was somewhat disappointed that they do not mention Influence Diagrams, considering the amount of coverage of both decision trees and Bayesian techniques.

Multi-way Interacting Regression via Factorization Machines Machine Learning

We propose a Bayesian regression method that accounts for multi-way interactions of arbitrary orders among the predictor variables. Our model makes use of a factorization mechanism for representing the regression coefficients of interactions among the predictors, while the interaction selection is guided by a prior distribution on random hypergraphs, a construction which generalizes the Finite Feature Model. We present a posterior inference algorithm based on Gibbs sampling, and establish posterior consistency of our regression model. Our method is evaluated with extensive experiments on simulated data and demonstrated to be able to identify meaningful interactions in applications in genetics and retail demand forecasting.
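As a point of reference for the factorization idea, the sketch below computes the prediction of a standard second-order factorization machine, in which the coefficient of each pairwise interaction is the inner product of two latent vectors. It is not the Bayesian multi-way method of the abstract: it covers only pairwise interactions, includes no hypergraph prior and no Gibbs sampling, and the names `fm_predict`, `fm_predict_naive`, `w0`, `w` and `V` are assumptions made for illustration.

```python
from itertools import combinations


def fm_predict_naive(x, w0, w, V):
    """Direct sum over all feature pairs; the coefficient of x[i]*x[j] is <V[i], V[j]>."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        coef = sum(a * b for a, b in zip(V[i], V[j]))  # factorized interaction weight
        pairwise += coef * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict(x, w0, w, V):
    """Same prediction using the identity
    sum_{i<j} <V[i], V[j]> x[i] x[j]
        = 0.5 * sum_f [ (sum_i V[i][f] x[i])^2 - sum_i (V[i][f] x[i])^2 ],
    which avoids the explicit loop over pairs."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])  # number of latent factors per feature
    pairwise = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)
        s_sq = sum(t * t for t in terms)
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical toy parameters: three features, two latent factors each.
x = [1.0, 2.0, 0.0]
w0 = 0.5
w = [0.1, -0.2, 0.3]
V = [[1.0, 0.0], [0.5, 1.0], [2.0, -1.0]]

y_fast = fm_predict(x, w0, w, V)
y_naive = fm_predict_naive(x, w0, w, V)
```

Both functions return the same value. The factorized form keeps the number of parameters linear in the number of features, which is what makes interaction terms affordable at all.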

Effective Bayesian Modeling of Groups of Related Count Time Series Machine Learning

Time series of counts arise in a variety of forecasting applications, for which traditional models are generally inappropriate. This paper introduces a hierarchical Bayesian formulation applicable to count time series that can easily account for explanatory variables and share statistical strength across groups of related time series. We derive an efficient approximate inference technique, and illustrate its performance on a number of datasets from supply chain planning.

Bayesian Predictive Profiles With Applications to Retail Transaction Data

Neural Information Processing Systems

Massive transaction data sets are recorded in a routine manner in telecommunications, retail commerce, and Web site management. In this paper we address the problem of inferring predictive individual profiles from such historical transaction data. We describe a generative mixture model for count data and use an approximate Bayesian estimation framework that effectively combines an individual's specific history with more general population patterns. We use a large real-world retail transaction data set to illustrate how these profiles consistently outperform non-mixture and non-Bayesian techniques in predicting customer behavior in out-of-sample data.
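The idea of blending an individual's history with population patterns can be illustrated with a much simpler stand-in than the paper's mixture model: a Dirichlet-smoothed multinomial in which the population distribution acts as a prior with pseudo-count weight `alpha`. This is a minimal sketch under that assumption, not the authors' method; `personal_profile`, `alpha`, and the toy categories are hypothetical.

```python
def personal_profile(counts, population, alpha):
    """Posterior-mean purchase probabilities for one customer.

    counts     -- dict mapping category -> number of past purchases by this customer
    population -- dict mapping category -> population-level probability (sums to 1)
    alpha      -- prior strength: how many pseudo-purchases the population pattern is worth
    """
    total = sum(counts.values())
    return {
        item: (counts.get(item, 0) + alpha * p_pop) / (total + alpha)
        for item, p_pop in population.items()
    }


# Hypothetical toy data: a customer who has only ever bought category "a".
population = {"a": 0.5, "b": 0.5}
history = {"a": 3}

profile = personal_profile(history, population, alpha=2.0)
no_history = personal_profile({}, population, alpha=2.0)
```

A customer with little history stays close to the population pattern, while a long history dominates the prior. Category "b" keeps a nonzero probability even though it was never purchased, which is what matters for out-of-sample prediction.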

This Week's Top Stocks FB, DDD, AMZN, & TWTR Stock Forecasts Quantifying Uncertainty and Bayesian Inference


The U.S. cotton market has remained stable since its spike in 2011, when China executed its cotton reserving and fiber hoarding plan. It is believed that U.S. cotton demand and price have been kept artificially low by persistent worries that China would unexpectedly release its cotton stockpile, which amounts to about half of global storage. However, the U.S. cotton price finally showed a revival in recent days. The ICE July cotton futures closed at 95.21 cents a pound on Tuesday, June 12, the highest level for a front-month futures contract in the last 6 years. The revival could be attributed to multiple factors, chiefly worries about insufficient rain in the cotton-growing areas and the newly issued import quotas from China.