scientific knowledge


Council Post: Integrating Science With Data For Reliable Machine Learning Models

#artificialintelligence

I consult and educate companies to transform technology and data into a valuable, measurable, and monetizable business asset. In my data analytics and machine learning (ML) consulting engagements, I often come across use cases aimed at solving scientific problems using data, such as predicting the failure of a turbine or forecasting the carbon footprint of our IT data center. But what exactly is a scientific problem, and how is it different from a data problem? Is it really necessary to validate a known scientific fact or model again with data? Before answering these questions, let's define some key terms and scientific laws.


Text Mining Machines Can Uncover Hidden Scientific Knowledge

#artificialintelligence

Berkeley Lab researchers Vahe Tshitoyan, Anubhav Jain, Leigh Weston, and John Dagdelen used machine learning to analyze 3.3 million abstracts from materials science papers. Sure, computers can be used to play grandmaster-level chess, but can they make scientific discoveries? Researchers at the U.S. Department of Energy's Lawrence Berkeley National Laboratory have shown that an algorithm with no training in materials science can scan the text of millions of papers and uncover new scientific knowledge. A team led by Anubhav Jain, a scientist in Berkeley Lab's Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec. By analyzing relationships between words, the algorithm was able to predict discoveries of new thermoelectric materials years in advance and suggest as-yet unknown materials as candidates for thermoelectric materials.


Thoughtfully Using Artificial Intelligence in Earth Science - Eos

#artificialintelligence

Deriving scientific insights from artificial intelligence methods requires adhering to best practices and moving beyond off-the-shelf approaches. Artificial intelligence (AI) methods have emerged as useful tools in many Earth science domains (e.g., climate models, weather prediction, hydrology, space weather, and solid Earth). AI methods are being used for tasks of prediction, anomaly detection, event classification, and onboard decision-making on satellites, and they could potentially provide high-speed alternatives for representing subgrid processes in climate models [Rasp et al., 2018; Brenowitz and Bretherton, 2019]. Although the use of AI methods has spiked dramatically in recent years, we caution that their use in Earth science should be approached with vigilance and accompanied by the development of best practices for their use. Without best practices, inappropriate use of these methods might lead to "bad science," which could create a general backlash in the Earth science community against the use of AI methods.


Apple, Alibaba, Amazon, and the gang promote state of the art in AI and Knowledge Discovery with Graphs - ZDNet

#artificialintelligence

Anchorage may not be the most well-connected location in the world. But as it turns out, when people and data are well-connected, location may follow. In 2019, Anchorage hosted SIGKDD's Conference on Knowledge Discovery and Data Mining, or KDD as it's commonly known. The conference is organized by the Association for Computing Machinery (ACM)'s Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). KDD is one of the most well-known and popular events for data science and AI, attracting around 3,500 researchers to London in 2018.


Douglas Adams was right – knowledge without understanding is meaningless - John Naughton

#artificialintelligence

Fans of Douglas Adams's Hitchhiker's Guide to the Galaxy treasure the bit where a group of hyper-dimensional beings demand that a supercomputer tells them the secret to life, the universe and everything. The machine, which has been constructed specifically for this purpose, takes 7.5m years to compute the answer, which famously comes out as 42. The computer helpfully points out that the answer seems meaningless because the beings who instructed it never knew what the question was. Machine-learning may soon enable us to accurately predict how a protein will fold. But it won't be scientific knowledge. It's years since I read Adams's wonderful novel, but an article published in Nature last month brought it vividly to mind.


With little training, machine-learning algorithms can uncover hidden scientific knowledge

#artificialintelligence

Researchers at the U.S. Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) have shown that an algorithm with no training in materials science can scan the text of millions of papers and uncover new scientific knowledge. A team led by Anubhav Jain, a scientist in Berkeley Lab's Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec. By analyzing relationships between words, the algorithm was able to predict discoveries of new thermoelectric materials years in advance and suggest as-yet unknown materials as candidates for thermoelectric materials. "Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals," said Jain. "That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far."


Text Mining of Scientific Literature Can Lead to New Discoveries

#artificialintelligence

Berkeley Lab researchers Vahe Tshitoyan, Anubhav Jain, Leigh Weston, and John Dagdelen used machine learning to analyze 3.3 million abstracts from materials science papers. Researchers at the U.S. Department of Energy's Lawrence Berkeley National Laboratory have shown that an algorithm with no training in materials science can scan the text of millions of papers and uncover new scientific knowledge. A team led by Anubhav Jain, a scientist in Berkeley Lab's Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec. By analyzing relationships between words, the algorithm was able to predict discoveries of new thermoelectric materials years in advance and suggest as-yet unknown materials as candidates for thermoelectric materials. "Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals," says Jain. "That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far."


Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data

arXiv.org Artificial Intelligence

Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.


The Moral Challenge of Modern Science

AITopics Original Links

A few years ago, in the course of a long speech about health policy, President George W. Bush spoke of the challenge confronting a society increasingly empowered by science. The powers of science are morally neutral -- as easily used for bad purposes as good ones. In the excitement of discovery, we must never forget that mankind is defined not by intelligence alone, but by conscience. Even the most noble ends do not justify every means. In the president's sensible formulation, the moral challenge posed for us by modern science is that our scientific tools simply give us raw power, and it is up to us to determine the right ways to use that power and to proscribe the wrong ways. The notion that science is morally neutral is also widely held and advanced by scientists. Indeed, many scientists wear their neutrality as a badge of honor, presenting themselves as disinterested servants of truth who merely supply society with facts and tools. They leave it up to others to decide how to use them. "Science can only ascertain what is, but not what should be," Albert Einstein said, "and outside of its domain value judgments of all kinds remain necessary." This proposition seems at first perfectly reasonable. The universe, in its benign indifference, is as it is regardless of what we think is right, and it would seem not to pick sides in moral disputes. Science uses knowledge of the natural world to inform us or empower us, but what we do with that knowledge and power remains up to us.


5 ways AI will disrupt science -- Future Earth Media Lab

#artificialintelligence

In 2011, Artificial Intelligence (AI) came of age when IBM's Watson computer beat two human contestants to win Jeopardy. These were not just any two contestants. Ken Jennings had won 74 times consecutively and Brad Rutter had pocketed the biggest pot in history -- $3.25 million. In the battle between men and machine, Watson's win was historic. Jennings was sanguine about losing: "I, for one, welcome our new computer overlords."