Accuracy
WWE Money In The Bank 2017: Live Stream Info, Start Time, Match Card For 'SmackDown Live' PPV
At the very least, a new No. 1 contender for each of the top two titles on "SmackDown Live" will be named Sunday night at WWE Money in the Bank 2017 in St. Louis. The pay-per-view could even see three different superstars hold the same belt within just a few minutes, as happened a year ago. Money in the Bank 2017 is scheduled to start at 8 p.m. EDT. Ordering the event on PPV costs $54.99, but fans can also watch MITB via live stream on the WWE Network. A subscription to the network costs $9.99 per month, though new subscribers get the first month free. The match card is highlighted by the two main events, which could take up more than half of the PPV.
[P] Low loss but large amount of false positives? • r/MachineLearning
I'm trying to classify data into two classes, and my loss is less than 0.01 under both MSE and BCE. It seems contradictory to me that my performance on the training set is still so poor: the ratio of true positives to false positives is at least 1:5, even when sweeping the threshold. Does this behavior mean my net is still not learning?
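One explanation worth checking is class imbalance. When positives are rare, a model can reach a very small average loss by scoring almost everything near zero, so the loss says little about precision. The sketch below uses made-up toy scores (not the poster's data) to show a mean BCE under 0.01 alongside a threshold sweep that never does better than 1 true positive per 5 false positives. The function names are illustrative.

```python
import math

def mean_bce(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

def sweep_threshold(labels, probs, thresholds):
    """Count true and false positives when predicting 1 for prob >= threshold."""
    rows = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 0)
        rows.append((t, tp, fp))
    return rows

# Toy, made-up scores: 1000 negatives and only 2 positives (heavy imbalance).
labels = [0] * 1000 + [1] * 2
probs = [0.001] * 990 + [0.05] * 10 + [0.04] * 2

print(f"mean BCE = {mean_bce(labels, probs):.4f}")
for t, tp, fp in sweep_threshold(labels, probs, [0.001, 0.04, 0.05, 0.5]):
    print(f"threshold={t}: TP={tp}, FP={fp}")
```

With these toy values the mean BCE is roughly 0.008. Yet the best threshold (0.04) catches both positives only together with 10 false positives. A precision-recall curve on held-out data is a more informative diagnostic here than the loss value.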
Accelerating Innovation Through Analogy Mining
Hope, Tom, Chan, Joel, Kittur, Aniket, Shahaf, Dafna
The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery by providing people with inspiration from solutions to analogous problems. However, finding useful analogies in these large, messy, real-world repositories remains a persistent challenge for both human and automated methods. Previous approaches include costly hand-created databases that have high relational structure (e.g., predicate calculus representations) but are very sparse. Simpler machine-learning/information-retrieval similarity metrics can scale to large, natural-language datasets, but struggle to account for structural similarity, which is central to analogy. In this paper we explore the viability and value of learning simpler structural representations, specifically "problem schemas", which specify the purpose of a product and the mechanisms by which it achieves that purpose. Our approach combines crowdsourcing and recurrent neural networks to extract purpose and mechanism vector representations from product descriptions. We demonstrate that these learned vectors allow us to find analogies with higher precision and recall than traditional information-retrieval methods. In an ideation experiment, analogies retrieved by our models significantly increased people's likelihood of generating creative ideas compared to analogies retrieved by traditional methods. Our results suggest that a promising approach to enabling computational analogy at scale is to learn and leverage weaker structural representations.
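The abstract does not say how the learned purpose and mechanism vectors are combined at query time. As a hypothetical illustration only, the sketch below ranks candidates by the cosine similarity of their purpose vectors minus a weighted cosine similarity of their mechanism vectors. This favours items that solve a similar problem in a different way. The two-dimensional toy vectors, the `mech_weight` parameter, and the scoring rule itself are assumptions and are not the paper's method. In the paper the vectors are learned by an RNN from product descriptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_analogies(query, corpus, mech_weight=0.5):
    """Rank corpus items by a hypothetical 'similar purpose, different mechanism' score.

    query:  (purpose_vector, mechanism_vector)
    corpus: list of (name, purpose_vector, mechanism_vector)
    """
    q_purpose, q_mechanism = query
    scored = []
    for name, purpose, mechanism in corpus:
        score = cosine(q_purpose, purpose) - mech_weight * cosine(q_mechanism, mechanism)
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored]

# Toy vectors, invented for illustration.
query = ([1.0, 0.0], [1.0, 0.0])
corpus = [
    ("same_purpose_same_mech", [1.0, 0.0], [1.0, 0.0]),
    ("same_purpose_diff_mech", [1.0, 0.0], [0.0, 1.0]),
    ("diff_purpose", [0.0, 1.0], [0.0, 1.0]),
]
print(rank_analogies(query, corpus))
```

Setting `mech_weight` to 0 reduces the sketch to plain purpose-similarity search. Larger values push "far" analogies (different mechanisms) toward the top.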
WWE Money In The Bank 2017: Predictions, Match Card For 'SmackDown Live' PPV
Money in the Bank 2017 isn't considered to be among WWE's "Big 4" pay-per-views, though it probably should be. It's leaped ahead of Survivor Series as one of the most important events each year, and it's set for Sunday night in St. Louis. The PPV will feature members of the "SmackDown Live" roster, and there are only five matches scheduled because of the two big co-main events. Below are predictions for the entire Money in the Bank card. The argument can be made for a few wrestlers to win this match.
A Practical Method for Solving Contextual Bandit Problems Using Decision Trees
Elmachtoub, Adam N., McNellis, Ryan, Oh, Sechan, Petrik, Marek
Many efficient algorithms with strong theoretical guarantees have been proposed for the contextual multi-armed bandit problem. However, applying these algorithms in practice can be difficult because they require domain expertise to build appropriate features and to tune their parameters. We propose a new method for the contextual bandit problem that is simple, practical, and can be applied with little or no domain expertise. Our algorithm relies on decision trees to model the context-reward relationship. Decision trees are non-parametric, interpretable, and work well without hand-crafted features. To guide the exploration-exploitation trade-off, we use a bootstrapping approach which abstracts Thompson sampling to non-Bayesian settings. We also discuss several computational heuristics and demonstrate the performance of our method on several datasets.
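A minimal sketch of the general idea, under stated assumptions: keep one tree per arm, fit it on a bootstrap resample of that arm's (context, reward) history, and play the arm whose tree predicts the highest reward. The resampling stands in for a posterior draw in Thompson sampling. For brevity the "tree" here is a depth-1 regression stump. The class name, the `min_obs` warm-up rule, and the stump are assumptions, not the authors' implementation or their heuristics.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree by minimizing squared error.

    Returns a function mapping a feature vector to a predicted reward.
    """
    mean_all = sum(ys) / len(ys)
    best_sse, best_split = sum((y - mean_all) ** 2 for y in ys), None
    for f in range(len(xs[0])):
        for thr in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= thr]
            right = [y for x, y in zip(xs, ys) if x[f] > thr]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if sse < best_sse:
                best_sse, best_split = sse, (f, thr, ml, mr)
    if best_split is None:
        return lambda x: mean_all  # no useful split: predict the mean
    f, thr, ml, mr = best_split
    return lambda x: ml if x[f] <= thr else mr

class BootstrapTreeBandit:
    """Hypothetical contextual bandit: bootstrap-resampled stumps, one per arm."""

    def __init__(self, n_arms, min_obs=2, seed=0):
        self.history = [[] for _ in range(n_arms)]  # (context, reward) pairs per arm
        self.min_obs = min_obs
        self.rng = random.Random(seed)

    def select(self, context):
        # Warm-up: play each arm a few times before trusting its model.
        for arm, hist in enumerate(self.history):
            if len(hist) < self.min_obs:
                return arm
        estimates = []
        for hist in self.history:
            # A bootstrap resample plays the role of a posterior sample.
            sample = [self.rng.choice(hist) for _ in hist]
            model = fit_stump([c for c, _ in sample], [r for _, r in sample])
            estimates.append(model(context))
        return max(range(len(estimates)), key=estimates.__getitem__)

    def update(self, arm, context, reward):
        self.history[arm].append((context, reward))

# Usage with synthetic rewards: arm 0 pays off for small x, arm 1 for large x.
bandit = BootstrapTreeBandit(n_arms=2)
env = random.Random(1)
for _ in range(100):
    x = [env.random()]
    arm = bandit.select(x)
    reward = 1.0 if (arm == 0) == (x[0] < 0.5) else 0.0
    bandit.update(arm, x, reward)
```

A full decision tree (for example, a depth-limited CART learner) would replace `fit_stump` without changing the selection logic.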
Performance Modelling of Planners from Homogeneous Problem Sets
Rosa, Tomás de la (Universidad Carlos III de Madrid) | Cenamor, Isabel (Universidad Carlos III de Madrid) | Fernández, Fernando (Universidad Carlos III de Madrid)
Empirical performance models play an important role in the development of planning portfolios that make a per-domain or per-problem configuration of their search components. Even though such portfolios have shown their power when compared to other systems in current benchmarks, there is no clear evidence that they are capable of differentiating problems (instances) that have similar input properties (in terms of objects, goals, etc.) but fairly different runtimes for a given planner. In this paper we present a study of empirical performance models trained on problems having the same configuration, with the objective of guiding the models to recognize the underlying differences among homogeneous problems. In addition, we propose a set of new features that boost prediction capabilities in such scenarios. The results show that the learned models clearly outperformed random classifiers, which reinforces the hypothesis that planners can be selected on a per-instance basis when configuring a portfolio.
The ratio of normalizing constants for Bayesian graphical Gaussian model selection
Mohammadi, A., Massam, H., Letac, G.
The ratio of normalizing constants for the G-Wishart distribution, for two graphs differing by an edge e, has long been a bottleneck in the search for efficient model selection in the class of graphical Gaussian models. We give an accurate approximation to this ratio under two assumptions: first, we assume that the scale of the prior is the identity; second, we assume that the paths between the two ends of e are disjoint. The first assumption does not represent a restriction, since this is what statisticians use. The second assumption is a real restriction, but we conjecture that similar results also hold without it. We shall prove it in subsequent work. This approximation is simply a ratio of Gamma functions and thus needs no simulation. We illustrate the efficiency and practical impact of our result by comparing model selection in the class of graphical Gaussian models using this approximation and using current Metropolis-Hastings methods. We work with both simulated data and a complex high-dimensional real data set. In the numerical examples, we do not assume that the paths between the two endpoints of edge e are disjoint.
Predictive modelling of training loads and injury in Australian football
Carey, David L., Ong, Kok-Leong, Whiteley, Rod, Crossley, Kay M., Crow, Justin, Morris, Meg E.
To investigate whether training load monitoring data could be used to predict injuries in elite Australian football players, data were collected from elite athletes over 3 seasons at an Australian football club. Loads were quantified using GPS devices, accelerometers and player perceived exertion ratings. Absolute and relative training load metrics were calculated for each player each day (rolling average, exponentially weighted moving average, acute:chronic workload ratio, monotony and strain). Injury prediction models (regularised logistic regression, generalised estimating equations, random forests and support vector machines) were built for non-contact, non-contact time-loss and hamstring-specific injuries using the first two seasons of data. Injury predictions were generated for the third season and evaluated using the area under the receiver operating characteristic curve (AUC). Predictive performance was only marginally better than chance for models of non-contact and non-contact time-loss injuries (AUC < 0.65). The best performing model was a multivariate logistic regression for hamstring injuries (best AUC = 0.76). Learning curves suggested logistic regression was underfitting the load-injury relationship and that using a more complex model or increasing the amount of model-building data may lead to future improvements. Injury prediction models built using training load data from a single club showed poor ability to predict injuries when tested on previously unseen data, suggesting they are limited as a daily decision tool for practitioners. Focusing the modelling approach on specific injury types and increasing the amount of training data may lead to the development of improved predictive models for injury prevention.
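The abstract lists the load metrics but not their exact windows or formulas. The sketch below assumes commonly used definitions:

- 7-day acute and 28-day chronic rolling averages for the acute:chronic workload ratio.
- An EWMA with decay λ = 2/(N + 1).
- Foster-style monotony (weekly mean load divided by the weekly standard deviation; population SD assumed).
- Strain (weekly total load multiplied by monotony).

The window lengths, function names, and sample loads are illustrative assumptions, not the study's settings.

```python
from statistics import mean, pstdev

def rolling_mean(loads, window):
    """Mean load over the last `window` days; None until enough days exist."""
    return [mean(loads[i - window + 1:i + 1]) if i + 1 >= window else None
            for i in range(len(loads))]

def ewma(loads, span):
    """Exponentially weighted moving average with decay lambda = 2 / (span + 1)."""
    lam = 2 / (span + 1)
    current, out = loads[0], []
    for load in loads:
        current = lam * load + (1 - lam) * current
        out.append(current)
    return out

def acwr(loads, acute=7, chronic=28):
    """Acute:chronic workload ratio from two rolling averages (None if undefined)."""
    return [a / c if a is not None and c else None
            for a, c in zip(rolling_mean(loads, acute), rolling_mean(loads, chronic))]

def monotony_and_strain(loads, window=7):
    """Monotony (weekly mean / SD) and strain (weekly total * monotony) per day."""
    out = []
    for i in range(len(loads)):
        if i + 1 < window:
            out.append((None, None))
            continue
        week = loads[i - window + 1:i + 1]
        sd = pstdev(week)
        mono = mean(week) / sd if sd > 0 else None  # undefined when every day is equal
        out.append((mono, sum(week) * mono if mono is not None else None))
    return out

# Made-up daily loads (arbitrary units): three steady weeks, then a doubled week.
loads = [100] * 21 + [200] * 7
print(acwr(loads)[-1])  # acute mean 200 / chronic mean 125
print(ewma(loads, span=7)[-1])
print(monotony_and_strain(loads)[24])
```

Per-player daily values like these would then be used as predictors in the injury models.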
Machine Learning for Everyone - Part 2: Spotting anomalous data
We're going to analyze data that contain cases flagged as abnormal. We'll build a predictive model to spot cases that are not currently flagged as abnormal but behave like ones that are. This post contains R code and some machine learning explanations, which can be extrapolated to other languages such as Python. The idea is to create a case study that gives the reader the opportunity to recreate the results. Note: some points in the analysis are oversimplified, but hopefully this will make you curious to learn more about the topic if you've never done a project like this.
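The post itself works in R. Here is the same idea sketched in Python under a few assumptions. The approach is to fit a classifier that uses the existing flag as the label, then rank the cases that are not flagged by their predicted probability of being abnormal. The choice of logistic regression, the single made-up feature, and the helper names are illustrative and are not the post's model.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression (weights + intercept) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank_unflagged(X, flagged, w, b, top_k=3):
    """Indices of unflagged rows with the highest predicted probability of being abnormal."""
    scored = [(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b), i)
              for i, (x, f) in enumerate(zip(X, flagged)) if not f]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]

# Toy data with one made-up feature; the last three rows are flagged as abnormal.
X = [[0.10], [0.20], [0.30], [0.15], [0.88], [0.90], [0.95], [0.85]]
flagged = [False, False, False, False, False, True, True, True]

w, b = train_logistic(X, [1 if f else 0 for f in flagged])
print(rank_unflagged(X, flagged, w, b, top_k=2))
```

Here the unflagged row with value 0.88 sits among the flagged ones, so it should surface first. On real data, scoring each row with out-of-fold (cross-validated) predictions avoids a row's own label inflating its score.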
Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier
Futoma, Joseph, Hariharan, Sanjay, Heller, Katherine
We present a scalable end-to-end classifier that uses streaming physiological and medication data to accurately predict the onset of sepsis, a life-threatening complication of infection with high mortality and morbidity. Our proposed framework models the multivariate trajectories of continuous-valued physiological time series using multitask Gaussian processes, seamlessly accounting for the high uncertainty, frequent missingness, and irregular sampling rates typically associated with real clinical data. The Gaussian process is directly connected to a black-box classifier that predicts whether a patient will become septic, chosen in our case to be a recurrent neural network to account for the extreme variability in the length of patient encounters. We show how to scale the computations associated with the Gaussian process in a way that allows the entire system to be discriminatively trained end-to-end using backpropagation. In a large cohort of heterogeneous inpatient encounters at our university health system, we find that it outperforms several baselines at predicting sepsis, and yields 19.4% and 55.5% improvements in the areas under the Receiver Operating Characteristic and Precision-Recall curves, respectively, compared to the NEWS score currently used by our hospital.
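To make the multitask Gaussian process idea concrete, here is a small sketch of its core. Each observation is a (time, variable index, value) triple. The covariance between two observations is the product of a between-variable (task) covariance and a kernel on time. The posterior mean can therefore be evaluated at any time for any variable, including variables that were never measured, which is how irregular sampling and missingness are handled. Several parts of this sketch are assumptions:

- A zero prior mean.
- A squared-exponential time kernel.
- A fixed, hand-chosen task covariance and noise variance.

The paper instead learns these parameters end-to-end and feeds the GP output to an RNN; both of those steps are omitted here.

```python
import math

def se_kernel(t1, t2, length_scale):
    """Squared-exponential covariance between two time points."""
    return math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L @ L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def mgp_posterior_mean(obs, task_cov, query, length_scale=1.0, noise=0.1):
    """Posterior mean of a zero-mean multitask GP.

    obs:   list of (time, task_index, value) observations
    query: list of (time, task_index) points to predict
    noise: observation noise variance added to the diagonal
    """
    def k(a, b):
        return task_cov[a[1]][b[1]] * se_kernel(a[0], b[0], length_scale)

    K = [[k(a, b) + (noise if i == j else 0.0) for j, b in enumerate(obs)]
         for i, a in enumerate(obs)]
    alpha = cho_solve(cholesky(K), [v for _, _, v in obs])
    return [sum(k(q, o) * a for o, a in zip(obs, alpha)) for q in query]

# A standardized vital sign (task 0) measured twice; task 1 is never measured.
task_cov = [[1.0, 0.8], [0.8, 1.0]]
obs = [(0.0, 0, 1.0), (2.0, 0, 0.5)]
print(mgp_posterior_mean(obs, task_cov, [(1.0, 0), (1.0, 1)]))
```

Because task 1 is correlated with task 0 through `task_cov`, its estimate borrows strength from task 0's measurements. In this sketch the cost grows cubically with the number of observations. Scaling that step is the issue the abstract says the authors address.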