A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals
Santini, Cristian, Gesese, Genet Asefa, Peroni, Silvio, Gangemi, Aldo, Sack, Harald, Alam, Mehwish
arXiv.org Artificial Intelligence
The data available in scholarly knowledge graphs (SKGs), where a knowledge graph is "a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent potentially different relations between these entities" [14], grows continuously. This growth raises challenges in, for instance, article exploration and visualization [17], article recommendation [3], citation recommendation [11], and Author Name Disambiguation (AND) [24], which is the subject of the present article. AND is an entity resolution task that aims to resolve author mentions in bibliographic references to real-world people. Persistent author identifiers, such as ORCIDs and VIAFs, simplify AND because they can be used to reconcile entities that are defined as different objects but represent the same real-world person. However, such identifiers have very low coverage in SKGs such as OpenCitations (OC) [22], AMiner [27], and the Microsoft Academic Knowledge Graph (MAKG) [10], so additional computational techniques are needed to identify different author entities as the same person. Many automatic approaches to AND have been developed that extract features from publication metadata (e.g., title, abstract, keywords, venue, and affiliation) for use in the disambiguation task. These methods range from supervised to unsupervised learning, including recently developed deep neural network-based architectures [31]. However, existing SKGs do not provide all the contextual information needed to reuse such approaches effectively and efficiently, since those approaches often rely on purely textual data.
In contrast with the approaches mentioned above, this study performs AND on scholarly data represented as linked data or included in SKGs by considering the multi-modal information available in such collections: the structural information, consisting of entities and the relations between them, as well as the text or numeric values associated with authors and publications in the form of literals (family name, given name, publication title, venue title, year of publication, etc.). The proposed framework, named Literally Author Name Disambiguation (LAND), tackles the following research questions:
- Can Knowledge Graph Embeddings (KGEs), i.e., a technique that creates a "dense representation of the graph in a continuous, low-dimensional vector space that can then be used for machine learning tasks" [13], be used effectively for the downstream task of clustering, and more specifically for author name disambiguation?
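To make the research question concrete, the following minimal sketch shows what "clustering over embeddings" could look like for AND. It is not the LAND pipeline: the vectors are hand-written toy values rather than the output of a KGE model, and the function names (`cosine_distance`, `cluster_mentions`), the single-linkage strategy, and the threshold of 0.2 are all assumptions chosen for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cluster_mentions(embeddings, threshold):
    """Group author mentions by single-linkage clustering.

    Two mentions end up in the same cluster if they are connected by a
    chain of pairs whose cosine distance is below `threshold`.
    (Hypothetical helper, not taken from the paper.)
    """
    ids = list(embeddings)
    parent = {i: i for i in ids}  # union-find forest

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if cosine_distance(embeddings[a], embeddings[b]) < threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Toy vectors standing in for learned author-mention embeddings (assumed values).
toy_embeddings = {
    "mention_1": [0.9, 0.1, 0.0],
    "mention_2": [0.8, 0.2, 0.1],
    "mention_3": [0.0, 0.1, 0.9],
}

clusters = cluster_mentions(toy_embeddings, threshold=0.2)
print(clusters)
```

Here each resulting cluster is read as one hypothesized real-world author. In practice the threshold and the linkage criterion would need to be tuned on labeled data, and the pairwise comparison would typically be restricted to mentions that share a similar name to keep the cost manageable.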
Jan-24-2022