Text Mining of Scientific Literature Can Lead to New Discoveries

#artificialintelligence

Berkeley Lab researchers (from left) Vahe Tshitoyan, Anubhav Jain, Leigh Weston, and John Dagdelen used machine learning to analyze 3.3 million abstracts from materials science papers.

Researchers at the U.S. Department of Energy's Lawrence Berkeley National Laboratory have shown that an algorithm with no training in materials science can scan the text of millions of papers and uncover new scientific knowledge. A team led by Anubhav Jain, a scientist in Berkeley Lab's Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec. By analyzing relationships between words, the algorithm was able to predict discoveries of new thermoelectric materials years in advance and to suggest as-yet unknown materials as thermoelectric candidates. "Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals," says Jain. "That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far."


With little training, machine-learning algorithms can uncover hidden scientific knowledge

#artificialintelligence

Researchers at the U.S. Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) have shown that an algorithm with no training in materials science can scan the text of millions of papers and uncover new scientific knowledge. A team led by Anubhav Jain, a scientist in Berkeley Lab's Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec. By analyzing relationships between words, the algorithm was able to predict discoveries of new thermoelectric materials years in advance and to suggest as-yet unknown materials as thermoelectric candidates. "Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals," said Jain. "That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far."
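To give a rough sense of how "analyzing relationships between words" can surface a candidate material, here is a minimal Python sketch. It is not Word2vec and not the team's code: it swaps in plain co-occurrence counts for learned embeddings, and the three toy abstracts and all function names are invented for illustration. The point it shows is that a material never mentioned alongside "thermoelectric" can still rank as similar to it because the two appear in similar contexts.

```python
import math
from collections import Counter, defaultdict

# Toy "abstracts", invented for illustration (not from the study's corpus).
ABSTRACTS = [
    "bi2te3 is a thermoelectric with high seebeck coefficient",
    "pbte has high seebeck coefficient",
    "nacl dissolves readily in water",
]


def cooccurrence_vectors(docs):
    """Map each word to a Counter of the other words sharing an abstract with it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(vectors, target, candidates):
    """Sort candidate words by similarity to the target word, most similar first."""
    return sorted(
        candidates,
        key=lambda c: cosine(vectors[c], vectors[target]),
        reverse=True,
    )


if __name__ == "__main__":
    vecs = cooccurrence_vectors(ABSTRACTS)
    # "pbte" never appears in the same abstract as "thermoelectric",
    # but it shares context words such as "seebeck" and "coefficient".
    for name in rank_candidates(vecs, "thermoelectric", ["nacl", "pbte"]):
        print(name, round(cosine(vecs[name], vecs["thermoelectric"]), 3))
```

In this toy setup "pbte" ranks above "nacl" purely from shared context. Word2vec learns dense vectors from a far larger corpus instead of counting raw co-occurrences, but the ranking-by-similarity step works on the same principle.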


Artificial Intelligence Set Loose On Old Scientific Papers Discovers Something Humans Missed

#artificialintelligence

Researchers at Lawrence Berkeley National Laboratory have developed an artificial intelligence (AI) that, with very little training, has made discoveries in materials science. To spot what scientists had missed, all the AI had to do was read millions of previously published scientific papers. The approach is a form of machine learning, in which an algorithm is trained on a particular task until, after many iterations, it produces output that makes sense. Machine-learning approaches are being used to solve many problems, and this team used one to look for latent knowledge in the world of materials science.


AI Trained on Old Scientific Papers Makes Discoveries Humans Missed

#artificialintelligence

Using just the language in millions of old scientific papers, a machine learning algorithm was able to make completely new scientific discoveries. In a study published in Nature on July 3, researchers from the Lawrence Berkeley National Laboratory used an algorithm called Word2Vec to sift through scientific papers for connections humans had missed. Their algorithm then spit out predictions for possible thermoelectric materials, which convert heat to electricity and are used in many heating and cooling applications. The algorithm didn't know the definition of thermoelectric, though. It received no training in materials science.

