Allen, associate professor of statistics, computer science and electrical and computer engineering at Rice and of pediatrics-neurology at Baylor College of Medicine, will address the topic in both a press briefing and a general session today at the 2019 Annual Meeting of the American Association for the Advancement of Science (AAAS). "The question is, 'Can we really trust the discoveries that are currently being made using machine-learning techniques applied to large data sets?'" she said. "The answer in many situations is probably, 'Not without checking,' but work is underway on next-generation machine-learning systems that will assess the uncertainty and reproducibility of their predictions." Machine learning (ML) is a branch of statistics and computer science concerned with building computational systems that learn from data rather than following explicit instructions. Allen said much attention in the ML field has focused on developing predictive models, which make predictions about future data based on patterns learned from the data they have already studied. "A lot of these techniques are designed to always make a prediction," she said.
Machine learning is everywhere in science and technology: powering facial recognition, picking your recommendations on Netflix, and controlling self-driving cars. But how reliable are machine learning techniques really? A statistician says that the answer is "not very," arguing that questions of accuracy and reproducibility in machine learning have not been fully addressed. Dr Genevera Allen, associate professor of statistics, computer science, and electrical and computer engineering at Rice University in Houston, Texas, has discussed this topic at a press briefing and at a scientific conference, the 2019 Annual Meeting of the American Association for the Advancement of Science (AAAS). She warned that researchers in the field of machine learning have spent so much time developing predictive models that they have not devoted enough attention to checking the accuracy of their models, and that the field must develop systems which can assess the accuracy of their own findings.
The accuracy and reproducibility of scientific discoveries made with machine-learning techniques should be questioned by scientists until systems can be developed that effectively critique themselves, according to a researcher from Rice University. Allen says that discoveries currently being made by applying machine learning to large data sets can probably not be trusted without confirmation, "but work is underway on next-generation machine-learning systems that will assess the uncertainty and reproducibility of their predictions." Developing predictive models has been a major focus of the ML field, according to Allen. "A lot of these techniques are designed to always make a prediction," she notes. "They never come back with 'I don't know,' or 'I didn't discover anything,' because they aren't made to."
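Allen's point, that standard classifiers always return an answer, can be illustrated with a minimal sketch. The wrapper below is a hypothetical example (the function name and threshold are illustrative assumptions, not part of any system she described): it abstains with "I don't know" whenever the model's top probability falls below a confidence threshold, which is exactly what most off-the-shelf techniques do not do.

```python
def predict_or_abstain(class_probs, threshold=0.8):
    """Return the index of the most probable class, or None
    ("I don't know") when the top probability is below the threshold."""
    top = max(range(len(class_probs)), key=class_probs.__getitem__)
    return top if class_probs[top] >= threshold else None

# A confident prediction is returned as usual ...
print(predict_or_abstain([0.95, 0.05]))   # -> 0
# ... but an uncertain one is refused rather than guessed.
print(predict_or_abstain([0.55, 0.45]))   # -> None
```

A conventional classifier corresponds to `threshold=0.0`: it never abstains, which is the behavior Allen criticizes.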
Rice University statistician Genevera Allen knew she was raising an important issue when she spoke earlier this month at the American Association for the Advancement of Science (AAAS) annual meeting in Washington, but she was surprised by the magnitude of the response. Allen, associate professor of statistics and founding director of Rice's Center for Transforming Data to Knowledge (D2K Lab), used the forum to raise awareness about the potential lack of reproducibility of data-driven discoveries produced by machine learning (ML). She cautioned her audience not to assume that today's scientific discoveries made via ML are accurate or reproducible. She said that many commonly used ML techniques are designed to always make a prediction and are not designed to report on the uncertainty of the finding. Her comments garnered worldwide media attention, with some commentators questioning the value of ML in data science.
Statistician Dr. Genevera Allen of Rice University in Houston warned of a "crisis in science" as more scientists adopt machine-learning (ML) techniques to analyze their data. Speaking at the American Association for the Advancement of Science (AAAS) annual meeting in Washington earlier this month, Dr. Allen cautioned that ML is "wasting both time and money" of scientists because it often singles out noise within existing data patterns, noise that may not be representative of the real world or be reproduced in another experiment. Dr. Allen believes the problem of reproducibility is especially significant when scientists employ ML on genomic data to identify patients with similar genomic profiles, a common approach in precision medicine, which aims to develop drugs that target the specific genomic features of a disease. At present, however, ML often fails to yield consistent groupings from one study to the next.
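One simple way to probe the kind of instability Allen describes, sketched here with a toy k-means clustering in NumPy (the function names, the perturbation scheme, and the agreement score are illustrative assumptions, not her method), is to re-cluster a slightly perturbed copy of the data and measure how often pairs of points end up grouped the same way:

```python
import numpy as np

def kmeans(X, k, iters=50):
    # Deterministic farthest-first seeding, then standard Lloyd iterations.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def pairwise_agreement(a, b):
    # Fraction of point pairs that both clusterings group the same way.
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    off_diag = ~np.eye(len(a), dtype=bool)
    return float((same_a == same_b)[off_diag].mean())

# Two well-separated "patient groups": clustering a perturbed copy of the
# data groups the same pairs together, so agreement stays near 1.0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(8, 0.5, (30, 2))])
a = kmeans(X, 2)
b = kmeans(X + rng.normal(0, 0.1, X.shape), 2)
print(round(pairwise_agreement(a, b), 2))   # -> 1.0
```

A low agreement score under small perturbations would signal that the clusters may reflect noise rather than reproducible structure, which is the failure mode Allen warns about in genomic clustering.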