Text Mining of Scientific Literature Can Lead to New Discoveries

#artificialintelligence

Berkeley Lab researchers (from left) Vahe Tshitoyan, Anubhav Jain, Leigh Weston, and John Dagdelen used machine learning to analyze 3.3 million abstracts from materials science papers. Researchers at the U.S. Department of Energy's Lawrence Berkeley National Laboratory have shown that an algorithm with no training in materials science can scan the text of millions of papers and uncover new scientific knowledge. A team led by Anubhav Jain, a scientist in Berkeley Lab's Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec. By analyzing relationships between words, the algorithm was able to predict discoveries of new thermoelectric materials years in advance and suggest as-yet unknown materials as thermoelectric candidates. "Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals," says Jain. "That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far."


AI Trained on Old Scientific Papers Makes Discoveries Humans Missed

#artificialintelligence

Using just the language in millions of old scientific papers, a machine learning algorithm was able to make completely new scientific discoveries. In a study published in Nature on July 3, researchers from the Lawrence Berkeley National Laboratory used an algorithm called Word2Vec to sift through scientific papers for connections humans had missed. Their algorithm then spit out predictions for possible thermoelectric materials, which convert heat into electricity and are used in many heating and cooling applications. The algorithm didn't know the definition of thermoelectric, though. It received no training in materials science.


With little training, machine-learning algorithms can uncover hidden scientific knowledge

#artificialintelligence

Researchers at the U.S. Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) have shown that an algorithm with no training in materials science can scan the text of millions of papers and uncover new scientific knowledge. A team led by Anubhav Jain, a scientist in Berkeley Lab's Energy Storage & Distributed Resources Division, collected 3.3 million abstracts of published materials science papers and fed them into an algorithm called Word2vec. By analyzing relationships between words, the algorithm was able to predict discoveries of new thermoelectric materials years in advance and suggest as-yet unknown materials as thermoelectric candidates. "Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals," said Jain. "That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far."
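
To make "analyzing relationships between words" a little more concrete, here is a minimal sketch of one common way word vectors are used: rank candidate words by cosine similarity to a query word. The three-dimensional vectors, the word list, and the function names are invented for illustration. They are not the team's data or code, and the snippet above does not say exactly how the predictions were scored.

```python
import math

# Toy 3-dimensional "embeddings". These numbers are made up for illustration;
# real Word2vec vectors are learned from text and usually have hundreds of dimensions.
TOY_VECTORS = {
    "thermoelectric": [0.9, 0.1, 0.3],
    "Bi2Te3": [0.8, 0.2, 0.4],
    "PbTe": [0.7, 0.1, 0.5],
    "NaCl": [0.1, 0.9, 0.2],
    "steel": [0.2, 0.3, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query, vectors):
    """Return (word, similarity) pairs for every other word, most similar first."""
    query_vector = vectors[query]
    scores = [
        (word, cosine_similarity(query_vector, vector))
        for word, vector in vectors.items()
        if word != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for word, score in rank_candidates("thermoelectric", TOY_VECTORS):
        print(f"{word:8s} {score:.3f}")
```

With learned vectors in place of the toy ones, words that appear in similar contexts to "thermoelectric" would rise to the top of such a ranking even if no abstract ever described them that way.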


Tidying up the mess

Science

Thermoelectric materials are engines that convert heat into an electrical current. Intuitively, the efficiency of this process depends on how many electrons (charge carriers) can move and how easily they do so, how much energy those moving electrons transport, and how easily the temperature gradient is maintained. In terms of material properties, an excellent thermoelectric material requires a high electrical conductivity σ, a high Seebeck coefficient S (a measure of the induced thermoelectric voltage as a function of temperature gradient), and a low thermal conductivity κ. The challenge is that these three properties are strongly interrelated in a conflicting manner (1). On page 722 of this issue, Roychowdhury et al. (2) have found a way to partially break these ties in silver antimony telluride (AgSbTe₂) with the addition of cadmium (Cd) cations, which increase the ordering in this inherently disordered thermoelectric material. The thermoelectric effect was discovered more than 200 years ago when Volta was conducting experiments with metallic junctions, vessels of water at different temperatures, and dead frogs (3). Since then, physicists, chemists, engineers, and materials scientists have tried to identify the nature of the effect; understand its microscopic origin; and find, design, and optimize materials with the highest thermoelectric conversion efficiency. The quality of a thermoelectric material is evaluated by using the dimensionless thermoelectric figure of merit, ZT, which includes the relevant properties mentioned above: ZT = σS²Tκ⁻¹, where T is the absolute temperature. Two components contribute to κ: the lattice thermal conductivity κ_latt, which represents heat carried by atomic vibrations (phonons), and the electronic thermal conductivity κ_ele, which represents heat carried by electrons. As the number of free charge carriers increases, σ and κ_ele also increase, S decreases, and κ_latt is unaltered.
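
The figure of merit can be evaluated in a few lines. The sketch below computes ZT from σ, S, the lattice thermal conductivity, and T, in SI units. One modeling choice is an assumption that does not come from the article: the electronic thermal conductivity is estimated with the Wiedemann-Franz law (κ_ele ≈ LσT, with the degenerate-limit Lorenz number). The input values are illustrative and do not describe any material discussed here.

```python
# Lorenz number in the degenerate (metallic) limit, in W·Ω·K⁻².
# Using the Wiedemann-Franz law here is an assumption; the article does not
# specify how the electronic thermal conductivity should be modeled.
LORENZ_NUMBER = 2.44e-8


def electronic_thermal_conductivity(sigma, temperature):
    """Estimate the electronic thermal conductivity in W/(m·K) from sigma (S/m) and T (K)."""
    return LORENZ_NUMBER * sigma * temperature


def figure_of_merit(sigma, seebeck, kappa_latt, temperature):
    """Dimensionless ZT = sigma * S^2 * T / kappa, with kappa = kappa_latt + kappa_ele.

    sigma: electrical conductivity in S/m
    seebeck: Seebeck coefficient in V/K
    kappa_latt: lattice thermal conductivity in W/(m·K)
    temperature: absolute temperature in K
    """
    kappa = kappa_latt + electronic_thermal_conductivity(sigma, temperature)
    return sigma * seebeck**2 * temperature / kappa


if __name__ == "__main__":
    # Illustrative inputs only (not measured data).
    zt = figure_of_merit(sigma=1.0e5, seebeck=200e-6, kappa_latt=1.0, temperature=300.0)
    print(f"ZT = {zt:.2f}")
```

Because σ appears in both the numerator and, through the electronic term, the denominator, raising the conductivity alone gives diminishing returns. That is one face of the conflict among σ, S, and κ described above.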
The optimal charge-carrier concentration for most thermoelectric materials is between 10¹⁹ and 10²⁰ carriers per cubic centimeter (4). Once the carrier concentration is optimized, further improvements in ZT require modifying the material's electronic structure and its phononic interactions (5). In practice, all of these changes are achieved by slightly modifying the compound's chemical composition by using solid solutions or by introducing nanometric secondary phases (6). The carrier concentration is commonly tuned by substituting atoms in the crystal structure with other atoms with a different valence. These electronic impurities, or dopants (7), change the chemical potential of electrons (the Fermi level) and may also affect the material's electronic structure. The energy-band structure is linked with the overlapping of atomic orbitals and properties such as the ionization energy, bond energy, and bond length. Thus, the choice of an adequate substitutional atom to modify the electronic band becomes less intuitive (6). Introducing atomic defects in the host lattice will also directly affect κ by adding disorder and local strain, which alters the lattice vibrations. Last, given that all of the above is optimized, κ can be further reduced by introducing nano- or microscale features (ideally transparent to electrons) that act as barriers for the propagation of phonons (8). Usually, to maximize ZT in a given material, all the above strategies should be cumulatively integrated. In the well-studied material lead telluride (PbTe) (9), the optimal number of free carriers is obtained by replacing a small percentage of the doubly charged Pb atoms with a singly charged cation. Proper doping with Na creates a p-type material, Pb₀.₉₈Na₀.₀₂Te, which has a carrier concentration of around 10²⁰ carriers per cubic centimeter.
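
A back-of-the-envelope check shows why a 2% Na substitution lands in the quoted range. The sketch below counts cation sites per unit volume and assumes every Na atom on a Pb site contributes exactly one hole, which is an idealized upper bound. The PbTe lattice parameter (about 6.46 Å) and the four cation sites per rock-salt unit cell are values assumed here, not stated in the article.

```python
# Assumed structural data for PbTe (not given in the article):
# rock-salt structure with lattice parameter of about 6.46 Å and 4 cation sites per cell.
PBTE_LATTICE_PARAMETER_CM = 6.46e-8
CATION_SITES_PER_CELL = 4


def max_hole_concentration(dopant_fraction, holes_per_dopant=1):
    """Upper-bound hole concentration (per cm^3) if every dopant atom is electrically active."""
    cell_volume_cm3 = PBTE_LATTICE_PARAMETER_CM**3
    cation_site_density = CATION_SITES_PER_CELL / cell_volume_cm3
    return dopant_fraction * holes_per_dopant * cation_site_density


if __name__ == "__main__":
    # 2% of Pb sites replaced by Na, as in Pb0.98Na0.02Te.
    n_holes = max_hole_concentration(dopant_fraction=0.02)
    print(f"Upper-bound hole concentration: {n_holes:.2e} per cm^3")
```

The idealized count comes out at a few times 10²⁰ per cubic centimeter, the same order of magnitude as the value quoted above. Measured concentrations can be lower, since not every dopant atom is necessarily active.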
A solid solution of PbTe with SrTe increases the bandgap and shifts the light and heavy bands closer to each other. Such band convergence increases S. Last, secondary SrTe phases are formed in the PbTe matrix material by pushing the content of Sr beyond the solubility limit, which reduces κ_latt.

[Figure: Thermoelectricity comes to order. Roychowdhury et al. improved the thermoelectric performance of silver antimony telluride (AgSbTe₂) by partially substituting antimony (Sb) atoms with cadmium (Cd). Graphic: A. Kitterman/Science]

In short, to maximize ZT for PbTe, it was necessary to introduce lattice imperfections (Na and Sr at the Pb sites) and nanoprecipitates (SrTe), that is, to increase the material's structural disorder. Roychowdhury et al. embraced the challenge of optimizing the thermoelectric performance of an already highly disordered material, AgSbTe₂, a complex material system that can exhibit different structural properties depending on the synthetic conditions (10). In line with previous reports, Roychowdhury et al. showed that polycrystalline AgSbTe₂ is inherently disordered because of the random distribution of Ag and Sb over the cubic structure's cationic sites (see the figure) and the appearance of an Ag₂Te secondary phase (11, 12). To modify the material's properties, they replaced Sb³⁺ cations with Cd²⁺. This substitution has consequences beyond simply increasing the number of free charge carriers. Theoretical calculations showed that Cd preferentially occupies disordered Sb sites, which lowered the formation energy of ordered cationic configurations and favored their presence. Although the disordered phase is cubic, the ordered counterpart can be formed either in a cubic or rhombohedral crystal structure with different cationic configurations at a similar energy cost, which results in Cd-doped AgSbTe₂ having different ordered nanoscale domains (see the figure).
The strain between the two lattice configurations results in nanoscale superstructures (strain ripples). These changes in the atomic arrangement translate directly into an enhanced σ and a reduced κ. Roychowdhury et al. explained that the increase in σ is associated with the increase of carriers and the reduction of disorder-induced fluctuations in the lattice potential, which facilitates carrier movement through the material. Two main factors hinder heat transport in this Cd-doped higher-order configuration: the nanoscale domain ripples and the local strain around Cd sites. Overall, Cd's multifaceted role allowed the authors to achieve a ZT of up to 1.5 at room temperature, a maximum ZT of 2.6 at 573 K, and an average ZT of 1.8. These values are among the highest reported to date. The distinctive feature of the strategy presented by Roychowdhury et al. is that the introduction of a foreign atom into the crystal lattice yields a material with a more ordered state. Considering that most thermoelectric materials are optimized through added disorder (as in the archetypal PbTe case), this result suggests a new avenue for thermoelectric material design through the optimization of atomic ordering.

1. A. Zevalkink et al., Appl. Phys. Rev. 5, 021303 (2018).
2. S. Roychowdhury et al., Science 371, 722 (2021).
3. D. Beretta et al., Mater. Sci. Eng. Rep. 138, 100501 (2019).
4. G. J. Snyder, E. S. Toberer, Nat. Mater. 7, 105 (2008).
5. L.-D. Zhao, V. P. Dravid, M. G. Kanatzidis, Energy Environ. Sci. 7, 251 (2014).
6. W. G. Zeier et al., Angew. Chem. Int. Ed. 55, 6826 (2016).
7. M. Ibáñez et al., Chem. Mater. 29, 7093 (2017).
8. K. Biswas et al., Nature 489, 414 (2012).
9. G. Tan et al., Nat. Commun. 7, 12167 (2016).
10. S. V. Barabash, V. Ozolins, C. Wolverton, Phys. Rev. Lett. 101, 155704 (2008).
11. K. T. Wojciechowski, M. Schmidt, Phys. Rev. B Condens. Matter Mater. Phys. 79, 184202 (2009).
12. R. W. Armstrong, J. W. Faust Jr., W. A. Tiller, J. Appl. Phys. 31, 1954 (1960).
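
As a rough illustration of what the average ZT quoted in the article could mean for a device, the sketch below uses the standard textbook expression for the maximum efficiency of an idealized thermoelectric generator, η = (T_h − T_c)/T_h × (√(1 + ZT_avg) − 1)/(√(1 + ZT_avg) + T_c/T_h). The article does not give this formula or the temperature span behind the average ZT. The cold side of 300 K and hot side of 573 K are assumptions chosen to match the temperatures it mentions.

```python
import math


def max_conversion_efficiency(zt_avg, t_hot, t_cold):
    """Idealized maximum heat-to-electricity efficiency of a thermoelectric generator.

    zt_avg: average figure of merit over the temperature span (dimensionless)
    t_hot, t_cold: hot-side and cold-side absolute temperatures in K
    """
    if t_hot <= t_cold:
        raise ValueError("t_hot must be greater than t_cold")
    carnot = (t_hot - t_cold) / t_hot
    m = math.sqrt(1.0 + zt_avg)
    return carnot * (m - 1.0) / (m + t_cold / t_hot)


if __name__ == "__main__":
    # ZT_avg = 1.8 is quoted in the article; the 300 K to 573 K span is assumed here.
    eta = max_conversion_efficiency(zt_avg=1.8, t_hot=573.0, t_cold=300.0)
    print(f"Idealized maximum efficiency: {eta:.1%}")
```

The result is an upper bound for an idealized device. Contact resistances, heat losses, and the need to pair p-type and n-type legs all reduce the efficiency of a real module.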