statistical science
New $1 Million Biennial Prize to Revitalize Statistics - DataScienceCentral.com
While everyone nowadays talks about machine learning, data science, and AI, few mention statistical science. Its professional association, the American Statistical Association, founded in 1839, is the second-oldest continuously operating professional society in the US, according to Wikipedia. This community was still growing strongly a few decades ago. Of course, machine learning relies heavily on statistics. The disconnect possibly started about 20 years ago.
Revisiting Rashomon: A Comment on "The Two Cultures"
Here, I provide some reflections on Prof. Leo Breiman's "The Two Cultures" paper. I focus specifically on the phenomenon that Breiman dubbed the "Rashomon Effect", describing the situation in which there are many models that satisfy predictive accuracy criteria equally well, but process information in the data in substantially different ways. This phenomenon can make it difficult to draw conclusions or automate decisions based on a model fit to data. I make connections to recent work in the Machine Learning literature that explores the implications of this issue, and note that grappling with it can be a fruitful area of collaboration between the algorithmic and data modeling cultures.
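The Rashomon Effect described above can be illustrated with a minimal sketch. This is not taken from the comment or from Breiman's paper; the synthetic data, the function name `rashomon_demo`, and the choice of two single-predictor linear models are all assumptions made for illustration. Two nearly collinear predictors carry the same signal, so a model that uses only the first and a model that uses only the second reach almost the same test error while relying on entirely different inputs.

```python
import numpy as np


def rashomon_demo(n=2000, seed=0):
    """Fit two linear models that predict equally well but use different features.

    Hypothetical setup: x1 and x2 are noisy copies of one latent signal z,
    and the response y depends on z only.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # latent signal
    x1 = z + 0.05 * rng.normal(size=n)     # first noisy copy of z
    x2 = z + 0.05 * rng.normal(size=n)     # second noisy copy of z
    y = z + 0.5 * rng.normal(size=n)       # response
    X = np.column_stack([x1, x2])
    train, test = slice(0, n // 2), slice(n // 2, n)

    results = {}
    for name, cols in {"uses_x1_only": [0], "uses_x2_only": [1]}.items():
        # Design matrix: intercept plus the single chosen predictor.
        A = np.column_stack([np.ones(n), X[:, cols]])
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        pred = A[test] @ coef
        mse = float(np.mean((y[test] - pred) ** 2))
        # Record the weight each model places on x1 and x2 (zero if unused).
        weights = np.zeros(2)
        weights[cols] = coef[1:]
        results[name] = {"test_mse": mse, "weights": weights}
    return results


if __name__ == "__main__":
    for name, info in rashomon_demo().items():
        print(name, round(info["test_mse"], 4), info["weights"].round(3))
```

Both models score about the same on held-out data, yet an analyst reading the fitted weights would reach opposite conclusions about which variable matters, which is the interpretive difficulty the comment points to.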
Duke U. lands $3M training grant for artificial intelligence research
DURHAM - The National Science Foundation has awarded Duke University a $3 million, five-year Research Traineeship grant to establish a program in which graduate students develop expertise in using artificial intelligence (AI) for materials science research. The aiM (AI for Understanding and Designing Materials) program will fill a vital workforce gap by training the next generation in the new convergent field of materials and computer science research. "To achieve the promise of the U.S. Materials Genome Initiative of accelerated discovery, design and application of new materials, we must integrate the traditional tools of experimentation, theory and computation with the emerging tools of data science to transform the way we approach materials understanding and discovery," said Cate Brinson, chair of the Department of Mechanical Engineering & Materials Science and director of aiM. The Materials Genome Initiative (MGI), launched in 2011, is a multi-agency federal government effort to accelerate the development and deployment of new, advanced materials to address a host of challenges in clean energy, national security, health and welfare. "The MGI promoted a paradigm shift from slow individual experiments and computation to the beginnings of data-driven AI approaches in materials science research," added Brinson.
Amazon.com: Statistical Regression and Classification: From Linear Models to Machine Learning (Chapman & Hall/CRC Texts in Statistical Science) (9781498710916): Norman Matloff: Books
"Matloff delivers a well-balanced book for advanced beginners. Besides the mathematical formulas, he also presents many chunks of R code, and if the reader is able to read R code, the formulas and calculations become clearer. Due to the computational R code, the well-written Appendix, and an overall clear English, the book will help students and autodidacts. Matloff has written a textbook of the best kind for such a broad topic." ". . . the book is well suitable for a wide audience: For practitioners interested in applying the methodology, for students in statistics as well as economics/social sciences and computer science."
Free Book: Statistics -- New Foundations, Toolbox, and Machine Learning Recipes
This book is intended for busy professionals working with data of any kind: engineers, BI analysts, statisticians, operations research, AI and machine learning professionals, economists, data scientists, biologists, and quants, ranging from beginners to executives. In about 300 pages and 28 chapters it covers many new topics, offering a fresh perspective on the subject, including rules of thumb and recipes that are easy to automate or integrate into black-box systems, as well as new model-free, data-driven foundations for statistical science and predictive analytics. The approach focuses on robust techniques; it is bottom-up (from applications to theory), in contrast to the traditional top-down approach. The material is accessible to practitioners with a one-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications with numerous illustrations, is aimed at practitioners, researchers, and executives in various quantitative fields.
Statistical Regression and Classification: From Linear Models to Machine Learning (Chapman & Hall/CRC Texts in Statistical Science)
Statistical Regression and Classification: From Linear Models to Machine Learning takes an innovative look at the traditional statistical regression course, presenting a contemporary treatment in line with today's applications and users. The book treats classical regression methods in an innovative, contemporary manner. Though some statistical learning methods are introduced, the primary methodology used is linear and generalized linear parametric models, covering both the Description and Prediction goals of regression methods. The author is just as interested in Description applications of regression, such as measuring the gender wage gap in Silicon Valley, as in forecasting tomorrow's demand for bike rentals. An entire chapter is devoted to measuring such effects, including discussion of Simpson's Paradox, multiple inference, and causation issues. Similarly, there is an entire chapter on parametric model fit, making use of both residual analysis and assessment via nonparametric analysis.
Data Science Has Been Using Rebel Statistics for a Long Time
Many of those who call themselves statisticians just won't admit that data science heavily relies on and uses (heretical, rule-breaking) statistical science, or they don't recognize the true statistical nature of these data science techniques (some are 15 years old), or are opposed to the modernization of their statistical arsenal. They already missed the train when machine learning became a popular discipline (also heavily based on statistics) more than 15 years ago. Now machine learning professionals, who are statistical practitioners working on problems such as clustering, far outnumber statisticians. Many times, I have interacted with statisticians who think that anyone not calling themselves a statistician knows little or nothing about statistics; see my recent bio published here, or visit the LinkedIn profiles of many data scientists, to debunk this myth. Any statistical technique that is not in their old books is considered heretical at best, or non-statistical at worst, or most of the time, not understood.
The Death of the Statistical Tests of Hypotheses
Some foundations of statistical science have been questioned recently, especially the use and abuse of p-values. See also this article published in FiveThirtyEight.com. Statistical tests of hypotheses rely on p-values and other mysterious parameters and concepts that only the initiated can understand: power, type I error, type II error, or UMP tests, just to name a few. Pretty much all of us have had to learn this old stuff (pre-dating the existence of computers) in some college classes. Sometimes results from a statistical test will be published in a mainstream journal - for instance about whether or not global warming is accelerating - using the same jargon that few understand, and accompanied by misinterpretations and flaws in the use of the test itself.
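For readers unfamiliar with the terms above, type I error and power can be made concrete by simulation. The following is a minimal sketch under stated assumptions, not something taken from the article: it uses a two-sided one-sample z-test with known unit variance, a hypothetical helper `rejection_rate`, a sample size of 25, and an arbitrary alternative mean of 0.5. The type I error is how often the test rejects when the null hypothesis (mean 0) is true; power is how often it rejects when the alternative is true; the type II error is one minus the power.

```python
import numpy as np

# Two-sided critical value of the standard normal distribution for alpha = 0.05.
Z_CRIT = 1.959964


def rejection_rate(true_mean, n=25, reps=20000, seed=0):
    """Fraction of simulated samples in which H0: mean = 0 is rejected.

    Assumes normally distributed data with known standard deviation 1,
    so the test statistic is z = sample_mean * sqrt(n).
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=true_mean, scale=1.0, size=(reps, n))
    z = samples.mean(axis=1) * np.sqrt(n)
    return float(np.mean(np.abs(z) > Z_CRIT))


type_i_error = rejection_rate(true_mean=0.0)   # H0 true: should be near alpha
power = rejection_rate(true_mean=0.5)          # H0 false: probability of detecting it
type_ii_error = 1.0 - power

if __name__ == "__main__":
    print("type I error:", type_i_error)
    print("power:", power)
    print("type II error:", type_ii_error)
```

Under these assumptions the simulated type I error should land close to the nominal 0.05, while the power depends on the sample size and the assumed effect; changing `n` or `true_mean` shows how the two trade off.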