OAG-BERT: Towards A Unified Backbone Language Model For Academic Knowledge Services
Liu, Xiao; Yin, Da; Zheng, Jingnan; Zhang, Xingjian; Zhang, Peng; Yang, Hongxia; Dong, Yuxiao; Tang, Jie
arXiv.org Artificial Intelligence
Academic knowledge services have substantially facilitated the development of the science enterprise by providing a wealth of efficient research tools. However, many applications depend heavily on ad-hoc models and expensive human labeling to understand scientific content, which hinders their deployment in real products. To build a unified backbone language model for different knowledge-intensive academic applications, we pre-train an academic language model, OAG-BERT, that integrates both the heterogeneous entity knowledge and the scientific corpora in the Open Academic Graph (OAG), the largest public academic graph to date. In OAG-BERT, we develop strategies for pre-training on text and entity data, along with zero-shot inference techniques. Its zero-shot capability paves the way toward reducing the need for expensive annotations. OAG-BERT has been deployed in real-world applications, such as the reviewer recommendation function of the National Natural Science Foundation of China (NSFC), one of the largest funding agencies in China, and paper tagging in AMiner. All code and pre-trained models are available via the CogDL toolkit.
Oct-3-2022
- Country:
  - Africa
    - Cameroon > Gulf of Guinea (0.04)
    - South Africa > Indian Ocean (0.04)
  - Asia
  - Europe
    - Belgium > Flanders
      - Flemish Brabant > Leuven (0.04)
    - France > Île-de-France
    - Germany > Bavaria
      - Upper Bavaria > Munich (0.04)
    - Russia > Central Federal District
      - Moscow Oblast > Moscow (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.14)
      - Oxfordshire > Oxford (0.04)
  - North America > United States
    - California > Alameda County
      - Berkeley (0.04)
    - District of Columbia > Washington (0.05)
    - New York > New York County
      - New York City (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Health & Medicine (1.00)
  - Information Technology > Services (0.68)
- Technology: