KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction

Zhang, Shao, Jia, Yuting, Xu, Hui, Wang, Dakuo, Li, Toby Jia-jun, Wen, Ying, Wang, Xinbing, Zhou, Chenghu

arXiv.org Artificial Intelligence 

Scientific knowledge bases [16, 23], collections of structured and verified research results that consist of numeric, text-based, or image-based data, emerge in this context and bring entirely new approaches and opportunities to scientific research. Researchers in many disciplines, such as Geoscience [10, 64], Medicine [9], Biology [3], and Chemistry [50], use AI techniques together with scientific knowledge bases, often constructed from the published literature, to drive scientific discoveries [38, 45, 46]. The rapid development of AI and data science has further promoted the development of scientific knowledge bases [26, 42]. For example, AlphaFold [27], which uses the Protein Data Bank [63] as input data, can accurately predict protein structures and has greatly advanced biological and medical research [12, 39]. Although such successes illustrate the importance of scientific knowledge bases for research in an age of data explosion, both their composition and their construction process still pose many challenges. In terms of composition, a scientific knowledge base is organized around one type of scientific entity; for example, "sample" is a general type of scientific entity. The data it contains are the values and sources of the relevant attributes of that entity. The current process of constructing a scientific knowledge base includes four main steps: literature collection, entity and attribute extraction, entity linking, and data storage (see Figure 2).
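
As a rough illustration of the four steps and of the entity-centered composition described above, the following minimal Python sketch may help. It is not the system's implementation: the class names (AttributeRecord, Entity), the in-memory corpus, the toy text pattern, and the name-based linking rule are all assumptions made only for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AttributeRecord:
    name: str    # attribute name, e.g. "age"
    value: str   # extracted value, kept as text
    source: str  # identifier of the document the value came from


@dataclass
class Entity:
    entity_type: str  # e.g. "sample"
    name: str
    attributes: list = field(default_factory=list)


def collect_literature():
    # Step 1, literature collection: a hypothetical in-memory corpus
    # standing in for a set of collected papers.
    return {
        "doc-1": "Sample A1: age=120; Sample B2: age=95",
        "doc-2": "sample a1: lithology=basalt",
    }


# Toy pattern (an assumption): "Sample <name>: <attribute>=<value>".
PATTERN = re.compile(r"sample\s+(\w+):\s*(\w+)=(\w+)", re.IGNORECASE)


def extract(doc_id, text):
    # Step 2, entity and attribute extraction: one mention per match,
    # each carrying the value and the source document.
    return [
        Entity("sample", name, [AttributeRecord(attr, value, doc_id)])
        for name, attr, value in PATTERN.findall(text)
    ]


def link(mentions):
    # Step 3, entity linking: mentions whose names match
    # case-insensitively are treated as the same entity.
    linked = {}
    for mention in mentions:
        key = (mention.entity_type, mention.name.lower())
        if key not in linked:
            linked[key] = Entity(mention.entity_type, mention.name)
        linked[key].attributes.extend(mention.attributes)
    return linked


def build_knowledge_base():
    mentions = []
    for doc_id, text in collect_literature().items():
        mentions.extend(extract(doc_id, text))
    # Step 4, data storage: a plain dict stands in for a real store.
    return link(mentions)


kb = build_knowledge_base()
```

In this sketch every stored attribute keeps a pointer to its source document, which mirrors the requirement that a knowledge base record both the values and the sources of an entity's attributes. Real extraction and linking are far harder than a regular expression and a lowercase comparison.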
