fivethirtyeight
On the Edge by Nate Silver review – the art of risk-taking
Nothing is more interesting to poker players and less interesting to everyone else than a breathless recounting of who bet how much with a jack and six of clubs in some game years ago. There's an awful lot of that kind of thing in this book, which celebrates poker players as paradigmatic citizens of a global intellectual community it calls "the River", which also counts among its inhabitants venture capitalists, crypto traders, fashionable philosophers and mild-mannered statisticians. One such statistician, Nate Silver himself, came to public prominence as a data-driven analyst of political polls at his website FiveThirtyEight, which predicted the results of US elections in 2008 and 2012 with seemingly uncanny accuracy. But before that he was a poker player, making money especially in the nascent internet-casino business, until Congress banned online poker in 2006. That, he has said, was his political awakening.
- Banking & Finance (0.90)
- Leisure & Entertainment > Games (0.77)
- Government > Regional Government > North America Government > United States Government (0.35)
What Was Nate Silver's Data Revolution?
Political journalism suffers from a central contradiction: elections are finicky things, but the best way for a commentator to make a name for himself is to project as much confidence as he can. The collection of confidence can take many forms: journalists can position themselves as monarchs of gossip; they can embed with campaigns and provide a look from the inside; they can simply plug their ears and yell louder than the next guy. The key to staying in the game is to never allow the actual outcome of an election to change the way you go about your business. After the 2012 Presidential election, political media had a moment when it seemed like that confidence game might finally come to an end. If you worked in the news business in any capacity after Nate Silver correctly called all fifty states in 2012, you likely remember feeling desperate to catch up to the new paradigm.
- Media > News (1.00)
- Leisure & Entertainment > Sports > Baseball (1.00)
- Government (1.00)
Calibration Assessment and Boldness-Recalibration for Binary Events
Guthrie, Adeline P., Franck, Christopher T.
Probability predictions are essential to inform decision making in medicine, economics, image classification, sports analytics, entertainment, and many other fields. Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, i.e., far from the base rate of the event. Predictions that satisfy these three criteria are informative for decision making. However, there is a fundamental tension between calibration and boldness, since calibration metrics can be high when predictions are overly cautious, i.e., non-bold. The purpose of this work is to develop a hypothesis test and Bayesian model selection approach to assess calibration, and a strategy for boldness-recalibration that enables practitioners to responsibly embolden predictions subject to their required level of calibration. Specifically, we allow the user to pre-specify their desired posterior probability of calibration, then maximally embolden predictions subject to this constraint. We verify the performance of our procedures via simulation, then demonstrate the breadth of applicability by applying these methods to real world case studies in each of the fields mentioned above. We find that very slight relaxation of calibration probability (e.g., from 0.99 to 0.95) can often substantially embolden predictions (e.g., widening hockey predictions' range from 0.25-0.75 to 0.10-0.90).
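The tension between calibration and boldness described above can be illustrated with a toy example. The sketch below is not the authors' hypothesis test or recalibration procedure; it only contrasts two hypothetical forecasters on made-up outcomes. The helper names (`brier_score`, `boldness`, `calibration_table`) and the choice to measure boldness as the standard deviation of the predictions are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


def boldness(probs):
    """Spread of the predictions (assumption: population standard deviation)."""
    return pstdev(probs)


def calibration_table(probs, outcomes):
    """Observed event frequency for each distinct predicted probability."""
    groups = defaultdict(list)
    for p, y in zip(probs, outcomes):
        groups[p].append(y)
    return {p: mean(ys) for p, ys in sorted(groups.items())}


# Made-up data: 20 binary events, 10 of which occur (base rate 0.5).
outcomes = [1] * 9 + [0] + [1] + [0] * 9
base_rate = mean(outcomes)

# Cautious forecaster: always predicts the base rate.
cautious = [base_rate] * len(outcomes)
# Bold forecaster: predicts 0.9 for the first ten events and 0.1 for the rest.
bold = [0.9] * 10 + [0.1] * 10

for name, probs in [("cautious", cautious), ("bold", bold)]:
    print(name)
    print("  calibration:", calibration_table(probs, outcomes))
    print("  boldness   :", round(boldness(probs), 3))
    print("  Brier score:", round(brier_score(probs, outcomes), 3))
```

Both forecasters are calibrated on this data, since each predicted probability matches the observed frequency in its group. Only the bold one spreads its predictions away from the base rate, and it earns the lower Brier score (about 0.09 versus about 0.25), which is why calibration alone does not make predictions informative.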
- North America > United States > Virginia (0.04)
- North America > United States > New York (0.04)
- North America > United States > Michigan > Wayne County (0.04)
- Health & Medicine (1.00)
- Leisure & Entertainment > Sports > Hockey (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Exciting Data Science Project Ideas To Brush Up Your Skills
Projects have long been seen as a way to turn effort into measurable results, the icing on the cake for achieving personal or corporate goals. Speaking of individual projects, have you found it challenging to learn at home? Many of us are in the same boat -- there are far too many things to handle during these trying times, and learning has taken a back seat, contrary to our expectations. So, what are our options for getting back on track? How can we apply what we have learned about data science in the real world? Picking an open-source data science project and sticking with it is extremely beneficial.
Comparing Sequential Forecasters
Choe, Yo Joong, Ramdas, Aaditya
We consider two or more forecasters each making a sequence of predictions over time and tackle the problem of how to compare them -- either online or post-hoc. In fields ranging from meteorology to sports, forecasters make predictions on different events or quantities over time, and this work describes how to compare them in a statistically rigorous manner. Specifically, we design a nonasymptotic sequential inference procedure for estimating the time-varying difference in forecast quality when using a relatively large class of scoring rules (bounded scores with a linear equivalent). The resulting confidence intervals can be continuously monitored and yield statistically valid comparisons at arbitrary data-dependent stopping times ("anytime-valid"); this is enabled by adapting recent variance-adaptive confidence sequences (CS) to our setting. In the spirit of Shafer and Vovk's game-theoretic probability, the coverage guarantees for our CSs are also distribution-free, in the sense that they make no distributional assumptions whatsoever on the forecasts or outcomes. Additionally, in contrast to a recent preprint by Henzi and Ziegel, we show how to sequentially test a weak null hypothesis about whether one forecaster outperforms another on average over time, by designing different e-processes that quantify the evidence at any stopping time. We examine the validity of our methods over their fixed-time and asymptotic counterparts in synthetic experiments and demonstrate their effectiveness in real-data settings, including comparing probability forecasts on Major League Baseball (MLB) games and comparing statistical postprocessing methods for ensemble weather forecasts.
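As a rough illustration of the quantity being estimated, the sketch below tracks the running average difference in Brier scores between two forecasters over time. It is only the point estimate: the anytime-valid confidence sequences and e-processes described in the abstract are not implemented here. The function names and the data are hypothetical.

```python
from itertools import accumulate


def brier(p, y):
    """Brier score of one probability forecast p for a 0/1 outcome y (lower is better)."""
    return (p - y) ** 2


def running_score_difference(forecasts_a, forecasts_b, outcomes):
    """Running mean of brier(B) - brier(A) after each round.

    Positive values mean forecaster A has scored better on average so far.
    """
    diffs = [
        brier(b, y) - brier(a, y)
        for a, b, y in zip(forecasts_a, forecasts_b, outcomes)
    ]
    return [total / t for t, total in enumerate(accumulate(diffs), start=1)]


# Made-up sequence of five binary events and two forecasters.
outcomes = [1, 0, 1, 1, 0]
forecaster_a = [0.8, 0.3, 0.7, 0.6, 0.2]  # leans toward the eventual outcome
forecaster_b = [0.5, 0.5, 0.5, 0.5, 0.5]  # always hedges at 0.5

for t, delta in enumerate(
    running_score_difference(forecaster_a, forecaster_b, outcomes), start=1
):
    print(f"after round {t}: mean score difference = {delta:.3f}")
```

Watching this running mean and stopping when it looks favorable is exactly the kind of data-dependent stopping that invalidates fixed-time intervals, which is the motivation for the anytime-valid guarantees in the abstract.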
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.45)
Clustering Uber Rideshare Data - KDnuggets
According to Gartner, by 2020, a quarter billion connected vehicles will form a major element of the Internet of Things. Connected vehicles are projected to generate 25GB of data per hour, which can be analyzed to provide real-time monitoring and apps, and will lead to new concepts of mobility and vehicle usage. Uber Technologies Inc. is a peer-to-peer ride-sharing platform. The Uber platform connects customers with drivers who can travel to their location. Uber uses machine learning for tasks ranging from calculating pricing to finding the optimal positioning of cars to maximize profits.
- Transportation > Ground > Road (0.71)
- Information Technology (0.71)
- Transportation > Passenger (0.56)
Machine Learning 101: The What, Why, and How of Weighting - KDnuggets
One thing I get asked about a lot is weighting. What do I need to worry about? By popular demand, I recently put together a lunch-and-learn at my company to help address all these questions. The goal was to be applicable to a large audience (e.g., with a gentle introduction) while also offering some good technical advice and details to help practitioners. This blog post was adapted from that presentation. Before we talk about weighting, we should all get on the same page about what a model is, what models are used for, and some of the common issues that modelers run into.
Who Will Win It? An In-game Win Probability Model for Football
Robberechts, Pieter, Van Haaren, Jan, Davis, Jesse
In-game win probability is a statistical metric that provides a sports team's likelihood of winning at any given point in a game, based on the performance of historical teams in the same situation. In-game win-probability models have been extensively studied in baseball, basketball and American football. These models serve as a tool to enhance the fan experience, evaluate in-game decision making and measure the risk-reward balance for coaching decisions. In contrast, they have received less attention in association football, because its low-scoring nature makes it far more challenging to analyze. In this paper, we build an in-game win probability model for football. Specifically, we first show that porting existing approaches, both in terms of the predictive models employed and the features considered, does not yield good in-game win-probability estimates for football. Second, we introduce our own Bayesian statistical model that utilizes a set of eight variables to predict the running win, tie and loss probabilities for the home team. We train our model using event data from the last four seasons of the major European football competitions. Our results indicate that our model provides well-calibrated probabilities. Finally, we elaborate on two use cases for our win probability metric: enhancing the fan experience and evaluating performance in crucial situations.
- Asia > Japan (0.05)
- South America > Argentina (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Leisure & Entertainment > Sports > Hockey (1.00)
- Leisure & Entertainment > Sports > Football (1.00)
How to Improve Political Forecasts - Issue 70: Variables
The 2020 Democratic candidates are out of the gate and the pollsters have the call! Bernie Sanders is leading by two lengths with Kamala Harris and Elizabeth Warren right behind, but Cory Booker and Beto O'Rourke are coming on fast! The political horse-race season is upon us and I bet I know what you are thinking: "Stop!" Every election we complain about horse-race coverage and every election we stay glued to it all the same. The problem with this kind of coverage is not that it's unimportant.
- North America > United States > Wisconsin (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Ohio (0.04)
Your Data Literacy Depends on Understanding the Types of Data and How They're Captured
The ability to understand and communicate about data is an increasingly important skill for the 21st-century citizen, for three reasons. First, data science and AI are affecting many industries globally, from healthcare and government to agriculture and finance. Second, much of the news is reported through the lenses of data and predictive models. And third, so much of our personal data is being used to define how we interact with the world. When so much data is informing decisions across so many industries, you need to have a basic understanding of the data ecosystem in order to be part of the conversation.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.91)
- Information Technology > Data Science (1.00)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)