Rice University statistician Genevera Allen says scientists must keep questioning the accuracy and reproducibility of scientific discoveries made by machine-learning techniques until researchers develop new computational systems that can critique themselves. Allen, associate professor of statistics, computer science and electrical and computer engineering at Rice and of pediatrics-neurology at Baylor College of Medicine, will address the topic in both a press briefing and a general session today at the 2019 Annual Meeting of the American Association for the Advancement of Science (AAAS). "The question is, 'Can we really trust the discoveries that are currently being made using machine-learning techniques applied to large data sets?'" Allen said. "The answer in many situations is probably, 'Not without checking,' but work is underway on next-generation machine-learning systems that will assess the uncertainty and reproducibility of their predictions." Machine learning (ML) is a branch of statistics and computer science concerned with building computational systems that learn from data rather than following explicit instructions. Allen said much attention in the ML field has focused on developing predictive models that allow ML to make predictions about future data based on its understanding of data it has studied.
Machine learning is everywhere in science and technology: powering facial recognition, picking your recommendations on Netflix, and controlling self-driving cars. But how reliable are machine learning techniques really? A statistician says that the answer is "not very," arguing that questions about the accuracy and reproducibility of machine learning have not been fully addressed. Dr Genevera Allen, associate professor of statistics, computer science, and electrical and computer engineering at Rice University in Houston, Texas, discussed this topic at a press briefing and at a scientific conference, the 2019 Annual Meeting of the American Association for the Advancement of Science (AAAS). She warned that researchers in the field of machine learning have spent so much time developing predictive models that they have not devoted enough attention to checking the accuracy of those models, and that the field must develop systems that can assess the accuracy of their own findings.
Rice University statistician Genevera Allen knew she was raising an important issue when she spoke earlier this month at the American Association for the Advancement of Science (AAAS) annual meeting in Washington, but she was surprised by the magnitude of the response. Allen, associate professor of statistics and founding director of Rice's Center for Transforming Data to Knowledge (D2K Lab), used the forum to raise awareness about the potential lack of reproducibility of data-driven discoveries produced by machine learning (ML). She cautioned her audience not to assume that today's scientific discoveries made via ML are accurate or reproducible. She said that many commonly used ML techniques are designed to always make a prediction and are not designed to report on the uncertainty of the finding. Her comments garnered worldwide media attention, with some commentators questioning the value of ML in data science.
Artificial intelligence is being applied with undue haste to analyze data in some areas of biomedical research, leading to inaccurate findings, a leading US computer scientist and medical statistician warned on Friday. "I would not trust a very large fraction of the discoveries that are currently being made using machine learning techniques applied to large data sets," Genevera Allen of Baylor College of Medicine and Rice University warned at the American Association for the Advancement of Science annual meeting. Machine learning is a form of AI being applied widely to find patterns and associations within scientific and medical data, for example between genes and diseases. In precision medicine, researchers look for groups of patients with similar DNA profiles so that treatments can be targeted at their particular genetic form of disease.
The accuracy and reproducibility of scientific discoveries made with machine-learning techniques should be questioned by scientists until systems can be developed that effectively critique themselves, according to Genevera Allen, a researcher from Rice University. Allen says that discoveries currently being made by applying machine learning to large data sets probably cannot be trusted without confirmation, "but work is underway on next-generation machine-learning systems that will assess the uncertainty and reproducibility of their predictions." Developing predictive models has been one of the focuses of the ML field, according to Allen. "A lot of these techniques are designed to always make a prediction," she notes. "They never come back with 'I don't know,' or 'I didn't discover anything,' because they aren't made to."
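To make the idea of a method that can answer "I don't know" concrete, here is a minimal Python sketch. It is an illustration only, not Allen's method or any of the next-generation systems she describes. The toy data, the one-dimensional nearest-centroid classifier, the leave-one-out refitting scheme, and the names `nearest_centroid`, `predict_or_abstain` and `min_agreement` are all assumptions made for this example. The model is refit once for each training point left out, and it only reports a label when nearly all refits agree.

```python
from collections import Counter
from statistics import mean


def nearest_centroid(train, x):
    """Return the label whose class mean is closest to x (toy 1-D classifier)."""
    by_label = {}
    for value, label in train:
        by_label.setdefault(label, []).append(value)
    return min(by_label, key=lambda label: abs(x - mean(by_label[label])))


def predict_or_abstain(train, x, min_agreement=0.9):
    """Refit on every leave-one-out subset and answer only if the refits agree.

    Returns (label, agreement); label is None when the refits disagree too much,
    which is this sketch's way of saying "I don't know".
    """
    votes = Counter(
        nearest_centroid(train[:i] + train[i + 1:], x) for i in range(len(train))
    )
    label, count = votes.most_common(1)[0]
    agreement = count / len(train)
    return (label if agreement >= min_agreement else None), agreement


# Hypothetical toy data: two well-separated groups of measurements.
TRAIN = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (1.1, "a"),
         (3.0, "b"), (3.2, "b"), (2.8, "b"), (2.9, "b")]

if __name__ == "__main__":
    for query in (1.0, 2.0, 3.1):
        label, agreement = predict_or_abstain(TRAIN, query)
        answer = label if label is not None else "I don't know"
        print(f"query={query}: {answer} (agreement {agreement:.2f})")
```

A plain nearest-centroid classifier would always assign the query at 2.0 to one of the two groups. In this sketch the refits split their votes for that point, so the function returns `None` instead of a label, while the queries that sit inside a group are labeled with full agreement.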