There is much debate among scholars and practitioners about what data science is, and what it isn't. Does it deal only with big data? Is data science really that new? How is it different from statistics and analytics? One way to consider data science is as an evolutionary step in interdisciplinary fields like business analysis that incorporate computer science, modeling, statistics, analytics, and mathematics.
This book is a type of "handbook" on data science and data scientists, containing information not found in traditional statistics, programming, or computer science textbooks. The author has compiled what he considers some of the most important information you will need for a career in data science, based on his 20 years as a leader in the field. Much of the text was initially published over the last three years on the Data Science Central website, which is read by millions of visitors. The book shows how data science differs from related fields and the value it brings to organizations working with big data.
What are the differences between data science, data mining, machine learning, statistics, operations research, and so on? Here I compare several overlapping analytic disciplines to explain their differences and common denominators. Sometimes the differences exist for nothing other than historical reasons; sometimes they are real and subtle. I also list the typical job titles, types of analyses, and industries traditionally attached to each discipline. Underlined domains are major sub-domains. It would be great if someone could add a historical perspective to my article. First, let's start with data science, the new discipline. Job titles include data scientist, chief scientist, senior analyst, director of analytics, and many more.
Some foundations of statistical science have been questioned recently, especially the use and abuse of p-values; see also this article published on FiveThirtyEight.com. Statistical tests of hypotheses rely on p-values and other mysterious parameters and concepts that only the initiated understand: power, type I error, type II error, and UMP (uniformly most powerful) tests, to name a few. Pretty much all of us have had to learn this old material (pre-dating the existence of computers) in college classes. Sometimes the results of a statistical test are published in a mainstream journal - for instance, about whether or not global warming is accelerating - using the same jargon that few understand, accompanied by misinterpretations and flaws in the use of the test itself.
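To make the type I error concept concrete, here is a minimal simulation sketch (the function names, sample sizes, and the 5% rejection threshold are my own illustrative choices, not from the text): when the null hypothesis is actually true, a test run at the 5% level should wrongly reject it roughly 5% of the time.

```python
import math
import random

def t_statistic(x, y):
    """Welch's t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

random.seed(42)
trials, n, rejections = 2000, 50, 0
for _ in range(trials):
    # Both samples come from the same distribution: the null hypothesis is true.
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    # Large-sample approximation: reject at the 5% level when |t| > 1.96.
    if abs(t_statistic(x, y)) > 1.96:
        rejections += 1

# The observed rejection rate is the simulated type I error, close to 0.05.
print(rejections / trials)
```

The same harness, with a real difference injected between the two groups, would estimate the test's power (one minus the type II error rate).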
There are many topics that you won't learn in statistics classes. Some, such as U-statistics, stochastic geometry, fractal compression, and stochastic differential equations, are post-graduate topics, so it is understandable that they are absent from statistics curricula. Others, like computational complexity and L1 metrics (to replace R-squared and other outlier-sensitive L2 metrics such as traditional variance), should be included, in my opinion. But the classic statistics curriculum is almost written in stone.
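To illustrate the L1-versus-L2 point, here is a minimal sketch (the function names `l2_variance` and `l1_deviation`, and the toy data, are my own illustrative choices) showing how a single outlier inflates an L2 spread measure far more than an L1 one:

```python
def l2_variance(xs):
    """Population variance: an L2 measure, built on squared deviations."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def l1_deviation(xs):
    """Mean absolute deviation around the median: an L1 spread measure."""
    s = sorted(xs)
    med = s[len(s) // 2]
    return sum(abs(x - med) for x in xs) / len(xs)

clean = [10, 11, 9, 10, 12, 10, 9, 11]
dirty = clean + [100]  # one extreme outlier

# The outlier multiplies the L2 measure by a much larger factor
# than the L1 measure, since squaring amplifies large deviations.
print(l2_variance(dirty) / l2_variance(clean))
print(l1_deviation(dirty) / l1_deviation(clean))
```

Squaring deviations amplifies the outlier's contribution quadratically, which is why L2 statistics such as variance and R-squared are so outlier-sensitive, while L1 alternatives degrade far more gracefully.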