The 1999 edition looks like an antique, since software and data have evolved so quickly in recent years. This 2013 edition is just about old statistics. Although many graduate students and researchers have had coursework in statistics, they sometimes find themselves stumped when trying to proceed with a particular data analysis question. In fact, statistics is often taught as a lesson in mathematics rather than as a strategy for answering questions about the real world, leaving beginning researchers at a loss for how to proceed. In these situations, it is common to turn to a statistical expert, the "go-to" person when questions about appropriate data analysis emerge.
Remember the ASA statement on p-values from last year? The profession is getting together today and tomorrow (Oct. John Ioannidis, who has published widely on the reproducibility crisis in research, said this morning that "we are drowning in a sea of statistical significance" and "p-values have become a boring nuisance." Too many researchers, under career pressure to produce publishable results, are chasing too much data with too much analysis in pursuit of significant results. The p-value has become a standard that can be gamed ("p-hacking"), opening the door to publication. P-hacking is quite common -- the increasing availability of datasets, including big data, means the number of potentially "significant" relationships that can be hunted is increasing exponentially. And researchers rarely report how many relationships they examined before finding one that rises to the level of (supposed) statistical significance. So more and more artefacts of random chance are getting passed off as something real.
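To make the point about hunting concrete, here is a minimal sketch of how quickly chance findings pile up. It assumes that every hypothesis tested is truly null, that the tests are independent, and that the conventional 0.05 threshold is used; the function names are hypothetical and chosen only for illustration. Under those assumptions, the chance of at least one "significant" result among m tests is 1 - (1 - 0.05)^m, and a small simulation can be checked against that formula.

```python
import random


def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests of a
    true null hypothesis yields p < alpha (an assumption-laden idealization)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_hunts(num_tests: int, alpha: float = 0.05,
                   trials: int = 10_000, seed: int = 0) -> float:
    """Estimate the same probability by simulation.

    Assumes each p-value is uniform on [0, 1), which holds for a
    continuous test statistic when the null hypothesis is true.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "study": look at num_tests relationships, keep any that pass.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, round(chance_of_false_positive(m), 3),
              round(simulate_hunts(m), 3))
```

With 20 relationships examined, the formula gives roughly a 64 percent chance of at least one spurious "finding", which is why the unreported number of looks matters so much. Real analyses are rarely independent, so this is an illustration of the mechanism rather than a correction method.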
Big data are part of a paradigm shift that is significantly transforming statistical agencies, processes, and data analysis. While administrative and satellite data are already well established, the statistical community is now experimenting with structured and unstructured human-sourced, process-mediated, and machine-generated big data. The proposed SDN sets out a typology of big data for statistics and highlights that opportunities to exploit big data for official statistics will vary across countries and statistical domains. To illustrate the former, examples from a diverse set of countries are presented. To provide a balanced assessment on big data, the proposed SDN also discusses the key challenges that come with proprietary data from the private sector with regard to accessibility, representativeness, and sustainability.