
Free Book: Probability and Statistics Cookbook


The format is very similar to a BIG cheat sheet. It is based on literature and in-class material from courses of the statistics department at the University of California, Berkeley, but is also influenced by other sources.

Probability and Statistics for Business and Data Science


Probability for improved business decisions: Introduction, Combinatorics, Bayesian Inference, Distributions. Welcome to Probability and Statistics for Business and Data Science! In this course we cover what you need to know about probability and statistics to succeed in business and the data science field! This practical course covers the theory of statistics and its application to real-world problems. Each section has example problems, in-course quizzes, and assessment tests.

Rodgers, Burrow, Mahomes lead new QB statistic from AWS and NFL


The NFL and Amazon Web Services (AWS) unveiled a new statistic this week to help rank the league's most prolific quarterbacks based on their decision-making. The team behind Next Gen Stats (NGS) created the "Passing Score" statistic to assess whether a quarterback made the optimal decision after the play. The NFL said dozens of quarterback evaluation metrics exist, but they all fail to isolate the specific variables a quarterback must evaluate before a passing play. Josh Helmrich, NFL director of strategy and business development, told ZDNet that AWS and the NFL worked together to combine seven different machine learning models and several play variables to create the Next Gen Stats Passing Score.

The Kernel Spatial Scan Statistic

Kulldorff's (1997) seminal paper on spatial scan statistics (SSS) has led to many methods considering different regions of interest, different statistical models, and different approximations, with numerous applications in epidemiology, environmental monitoring, and homeland security. SSS provides a way to rigorously test for the existence of an anomaly and provides statistical guarantees as to how "anomalous" that anomaly is. However, these methods rely on defining specific regions where the spatial information a point contributes is limited to a binary 0 or 1, for either outside or inside the region, while in reality anomalies tend to follow smooth distributions whose density decays with distance from an epicenter. In this work, we propose a method that addresses this shortcoming through a continuous scan statistic that generalizes SSS by allowing the point contribution to be defined by a kernel. We provide extensive experimental and theoretical results that show our methods can be computed efficiently while providing high statistical power for detecting anomalous regions.
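
The difference between a binary region membership and a kernel-defined point contribution can be illustrated with a small sketch. This is not the statistic from the paper: the Gaussian kernel, the bandwidth, the toy points, and the helper names (`binary_weight`, `gaussian_kernel_weight`, `weighted_fraction`) are all assumptions chosen for illustration only.

```python
import math

def binary_weight(point, center, radius):
    """Region-style membership: 1 inside the disk, 0 outside."""
    return 1.0 if math.dist(point, center) <= radius else 0.0

def gaussian_kernel_weight(point, center, bandwidth):
    """Smooth contribution that decays with distance from the center.
    A Gaussian kernel is an assumption here, not the paper's stated choice."""
    d = math.dist(point, center)
    return math.exp(-(d * d) / (2.0 * bandwidth * bandwidth))

def weighted_fraction(points, labels, weight_fn):
    """Weighted share of flagged (label == 1) points under weight_fn."""
    total = sum(weight_fn(p) for p in points)
    flagged = sum(weight_fn(p) for p, y in zip(points, labels) if y == 1)
    return flagged / total if total > 0 else 0.0

# Hypothetical toy data: two flagged points near the center, two unflagged farther out.
points = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (3.0, 0.0)]
labels = [1, 1, 0, 0]
center = (0.0, 0.0)

binary = weighted_fraction(points, labels, lambda p: binary_weight(p, center, 1.0))
smooth = weighted_fraction(points, labels, lambda p: gaussian_kernel_weight(p, center, 1.0))
print(f"binary region fraction: {binary:.3f}")
print(f"kernel-weighted fraction: {smooth:.3f}")
```

With the binary weight, points just outside the radius contribute nothing; with the kernel weight, they still contribute, but less the farther they lie from the center.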