Performance Analysis
WWE Battleground 2017: Live Stream Info, Start Time, For PPV Before SummerSlam
Some of the final pieces for SummerSlam 2017 will fall into place Sunday night with the last "SmackDown Live" pay-per-view prior to the "Big Four" event. WWE Battleground 2017 will feature seven matches on the card with three championships on the line. Battleground 2017 is scheduled to start at 8 p.m. EDT, and the pre-show gets underway at 7:30 p.m. EDT. Ordering the event on PPV costs $54.99, but fans can also watch the event with a live stream on the WWE Network. A subscription to the network costs $9.99 per month, though new subscribers get the first month free.
On kernel methods for covariates that are rankings
Mania, Horia, Ramdas, Aaditya, Wainwright, Martin J., Jordan, Michael I., Recht, Benjamin
Permutation-valued features arise in a variety of applications, either directly, when preferences are elicited over a collection of items, or indirectly, when numerical ratings are converted to a ranking. To date, there has been relatively limited study of regression, classification, and testing problems based on permutation-valued features, as opposed to permutation-valued responses. This paper studies the use of reproducing kernel Hilbert space methods for learning from permutation-valued features. These methods embed the rankings into an implicitly defined function space, and allow for efficient estimation of regression and test functions in this richer space. Our first contribution is to characterize both the feature spaces and spectral properties associated with two kernels for rankings, the Kendall and Mallows kernels. Using tools from representation theory, we explain the limited expressive power of the Kendall kernel by characterizing its degenerate spectrum, and in sharp contrast, we prove that the Mallows kernel is universal and characteristic. We also introduce families of polynomial kernels that interpolate between the Kendall (degree one) and Mallows (infinite degree) kernels. We show the practical effectiveness of our methods via applications to Eurobarometer survey data as well as a Movielens ratings dataset.
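To make the two kernels named above concrete, here is a minimal sketch under common textbook definitions, which are an assumption since the abstract does not spell them out: the Kendall kernel is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, and the Mallows kernel is exp(-lam * n_d), where n_d is the number of discordant pairs. The function names and the bandwidth parameter lam are illustrative, not taken from the paper.

```python
import math
from itertools import combinations


def discordant_pairs(sigma, tau):
    """Count item pairs that the two rankings order differently.

    sigma[i] and tau[i] are the ranks each ranking assigns to item i.
    """
    if len(sigma) != len(tau):
        raise ValueError("rankings must cover the same items")
    return sum(
        1
        for i, j in combinations(range(len(sigma)), 2)
        if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0
    )


def kendall_kernel(sigma, tau):
    """(concordant - discordant) / total pairs, assuming no ties."""
    n_pairs = math.comb(len(sigma), 2)
    n_d = discordant_pairs(sigma, tau)
    return (n_pairs - 2 * n_d) / n_pairs


def mallows_kernel(sigma, tau, lam=1.0):
    """exp(-lam * discordant pairs); lam is an assumed bandwidth parameter."""
    return math.exp(-lam * discordant_pairs(sigma, tau))
```

For example, kendall_kernel([0, 1, 2], [2, 1, 0]) evaluates to -1.0 because every pair is reversed, while the Mallows kernel of any ranking with itself is 1.0.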
WWE Battleground 2017: Predictions, Match Card For 'SmackDown Live' PPV Before SummerSlam
The final WWE pay-per-view before SummerSlam is set for Sunday night in Philadelphia with WWE Battleground 2017. Jinder Mahal will defend his WWE Championship against Randy Orton in the main event, and two more championships will be on the line. Below are predictions for every match on the card, which will feature only members of the "SmackDown Live" roster. It was shocking for Jinder to win the WWE Championship shortly after WrestleMania 33, but it would make little sense for him to lose the title at this point. He's already beaten Orton twice, and their feud should come to an end at Battleground.
Boolean kernels for collaborative filtering in top-N item recommendation
In many personalized recommendation problems, the available data consists only of positive interactions (implicit feedback) between users and items. This problem is also known as One-Class Collaborative Filtering (OC-CF). Linear models usually achieve state-of-the-art performance on OC-CF problems, and many efforts have been devoted to building more expressive and complex representations able to improve the recommendations. Recent analyses show that collaborative filtering (CF) datasets have peculiar characteristics, such as high sparsity and a long-tailed distribution of the ratings. In this paper we propose a boolean kernel, called the Disjunctive kernel, which is less expressive than the linear one but is able to alleviate the sparsity issue in CF contexts. The embedding of this kernel is composed of all the combinations of a certain arity d of the input variables, and these combined features are semantically interpreted as disjunctions of the input variables. Experiments on several CF datasets show the effectiveness and the efficiency of the proposed kernel.
Keywords: Boolean kernel, Kernel methods, Recommender systems, Collaborative filtering, Implicit feedback
1. Introduction
Collaborative Filtering (CF) is the de facto approach for making personalized recommendations. CF techniques exploit historical information about the user-item interactions in order to improve future recommendations to users. User-item interactions can be of two types: explicit or implicit.
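The embedding described in the abstract can be illustrated with a short sketch. It assumes binary input vectors and an embedding with one feature per d-subset of input variables, equal to 1 when at least one variable in the subset is 1. Under that assumption, inclusion-exclusion gives a closed form for the kernel; the paper's exact formulation may differ. The function names are hypothetical, and the brute-force version exists only to check the closed form.

```python
from itertools import combinations
from math import comb


def disjunctive_kernel(x, z, d):
    """Count d-subsets of variables whose disjunction is true in both x and z.

    By inclusion-exclusion: all subsets, minus those all-zero in x,
    minus those all-zero in z, plus those all-zero in both.
    """
    n = len(x)
    ones_x, ones_z = sum(x), sum(z)
    ones_both = sum(a & b for a, b in zip(x, z))
    zero_in_both = n - ones_x - ones_z + ones_both
    return comb(n, d) - comb(n - ones_x, d) - comb(n - ones_z, d) + comb(zero_in_both, d)


def disjunctive_kernel_bruteforce(x, z, d):
    """Same quantity via the explicit embedding; exponential cost, for checking only."""
    return sum(
        1
        for subset in combinations(range(len(x)), d)
        if any(x[i] for i in subset) and any(z[i] for i in subset)
    )
```

With d = 1 the closed form reduces to the ordinary dot product of the two binary vectors, that is, the linear kernel. For x = [1, 0, 0], z = [1, 1, 0], and d = 2, both functions return 2.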
Estimation of Large Covariance and Precision Matrices from Temporally Dependent Observations
We consider the estimation of large covariance and precision matrices from high-dimensional sub-Gaussian or heavier-tailed observations with slowly decaying temporal dependence. The temporal dependence is allowed to be long-range, with longer memory than that considered in the current literature. We show that several commonly used methods for independent observations can be applied to the temporally dependent data. In particular, the rates of convergence are obtained for the generalized thresholding estimation of covariance and correlation matrices, and for the constrained $\ell_1$ minimization and the $\ell_1$ penalized likelihood estimation of the precision matrix. Properties of sparsistency and sign-consistency are also established. A gap-block cross-validation method is proposed for the tuning parameter selection, which performs well in simulations. As a motivating example, we study the brain functional connectivity using resting-state fMRI time series data with long-range temporal dependence.
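As a rough illustration of thresholding estimation, the sketch below applies soft thresholding, one member of the generalized thresholding family, to the off-diagonal entries of a sample covariance matrix. Leaving the diagonal untouched, the fixed threshold tau, and the function names are all assumptions made for illustration; the paper's estimator and its gap-block cross-validation for choosing the tuning parameter are not reproduced here.

```python
def sample_covariance(rows):
    """Sample covariance (divisor n) of observations given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n for j in range(p)]
        for i in range(p)
    ]


def soft_threshold(value, tau):
    """Shrink value toward zero by tau; entries within [-tau, tau] become 0."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def threshold_covariance(cov, tau):
    """Soft-threshold off-diagonal entries; the diagonal is kept as is (assumed convention)."""
    p = len(cov)
    return [
        [cov[i][j] if i == j else soft_threshold(cov[i][j], tau) for j in range(p)]
        for i in range(p)
    ]
```

Small off-diagonal entries are set exactly to zero, which is what produces a sparse estimate; larger entries are shrunk by tau.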
Data Science Has Been Using Rebel Statistics for a Long Time
Many of those who call themselves statisticians just won't admit that data science heavily relies on and uses (heretical, rule-breaking) statistical science, or they don't recognize the true statistical nature of these data science techniques (some are 15 years old), or they are opposed to the modernization of their statistical arsenal. They already missed the train when machine learning became a popular discipline (also heavily based on statistics) more than 15 years ago. Now machine learning professionals, who are statistical practitioners working on problems such as clustering, far outnumber statisticians. Many times, I have interacted with statisticians who think that anyone not calling themselves a statistician knows little or nothing about statistics; see my recent bio published here, or visit the LinkedIn profiles of many data scientists, to debunk this myth. Any statistical technique that is not in their old books is considered heretical at best, or non-statistical at worst, or, most of the time, not understood.
Sparse Probit Linear Mixed Model
Mandt, Stephan, Wenzel, Florian, Nakajima, Shinichi, Cunningham, John P., Lippert, Christoph, Kloft, Marius
Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they make it possible to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features which show less correlation with the confounding factors.
Predicting Flights Delay Using Supervised Learning, Logistic Regression
In this post, we'll use a supervised machine learning technique called logistic regression to predict delayed flights. But before we proceed, I would like to offer my condolences to the families of the victims of the Germanwings tragedy. Note: This is a common data set in the machine learning community for testing algorithms and models, given that it is publicly available and sizable. In this post, we will look at a small sample snapshot (2201 flights in January 2004). In another post, we can explore using Big Data technologies such as Hadoop MapReduce or Spark machine learning libraries to do large-scale predictive analytics and data mining.
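For readers who want to see the mechanics of logistic regression before the flight data comes in, here is a minimal sketch that fits the model by batch gradient descent. The data is a made-up toy set with a single standardized feature, not the flight data set, and the function names, learning rate, and epoch count are arbitrary choices for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Estimated probability that an example with features x has label 1 (delayed)."""
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, p = len(X), len(X[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log loss with respect to the logit.
            err = predict_proba(weights, bias, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up toy data: one standardized feature, label 1 = delayed.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = fit_logistic(X, y)
```

After fitting, predict_proba(weights, bias, [1.0]) is above 0.5 and predict_proba(weights, bias, [-1.0]) is below it. In practice a library implementation with regularization would normally replace this hand-written loop.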
An Ensemble Boosting Model for Predicting Transfer to the Pediatric Intensive Care Unit
Rubin, Jonathan, Potes, Cristhian, Xu-Wilson, Minnan, Dong, Junzi, Rahman, Asif, Nguyen, Hiep, Moromisato, David
Our work focuses on the problem of predicting the transfer of pediatric patients from the general ward of a hospital to the pediatric intensive care unit. Using data collected over 5.5 years from the electronic health records of two medical facilities, we develop classifiers based on adaptive boosting and gradient tree boosting. We further combine these learned classifiers into an ensemble model and compare its performance to a modified pediatric early warning score (PEWS) baseline that relies on expert-defined guidelines. To gauge model generalizability, we perform an inter-facility evaluation where we train our algorithm on data from one facility and perform evaluation on a hidden test dataset from a separate facility. We observe improvements over the PEWS baseline in accuracy (0.77 vs. 0.69), sensitivity (0.80 vs. 0.68), specificity (0.74 vs. 0.70) and AUROC (0.85 vs. 0.73).
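Three of the metrics reported above follow directly from the counts in a binary confusion matrix. The sketch below shows the standard definitions, assuming labels coded as 1 (transferred) and 0 (not transferred). The function name and the example labels are illustrative and unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined when a class is absent from y_true.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

Sensitivity is computed only over the true positives' class and specificity only over the true negatives' class, which is why the two can move in opposite directions when a decision threshold changes. AUROC is not covered here because it requires predicted scores rather than hard labels.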
Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis
Benavoli, Alessio, Corani, Giorgio, Demsar, Janez, Zaffalon, Marco
The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields, however, have realized the shortcomings of frequentist reasoning and, in the most radical cases, have even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for abandonment of NHST by exposing its fallacies and, more importantly, offer better (more sound and useful) alternatives to it.