AITopics | author name disambiguation

Collaborating Authors

author name disambiguation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Revisiting gender bias research in bibliometrics: Standardizing methodological variability using Scholarly Data Analysis (SoDA) Cards

Lee, HaeJin, Mishra, Shubhanshu, Mishra, Apratim, You, Zhiwen, Kim, Jinseok, Diesner, Jana

arXiv.org Artificial IntelligenceJan-29-2025

Gender biases in scholarly metrics remain a persistent concern, despite numerous bibliometric studies exploring their presence and absence across productivity, impact, acknowledgment, and self-citations. However, methodological inconsistencies, particularly in author name disambiguation and gender identification, limit the reliability and comparability of these studies, potentially perpetuating misperceptions and hindering effective interventions. A review of 70 relevant publications over the past 12 years reveals a wide range of approaches, from name-based and manual searches to more algorithmic and gold-standard methods, with no clear consensus on best practices. This variability, compounded by challenges such as accurately disambiguating Asian names and managing unassigned gender labels, underscores the urgent need for standardized and robust methodologies. To address this critical gap, we propose the development and implementation of ``Scholarly Data Analysis (SoDA) Cards." These cards will provide a structured framework for documenting and reporting key methodological choices in scholarly data analysis, including author name disambiguation and gender identification procedures. By promoting transparency and reproducibility, SoDA Cards will facilitate more accurate comparisons and aggregations of research findings, ultimately supporting evidence-informed policymaking and enabling the longitudinal tracking of analytical approaches in the study of gender and other social biases in academia.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2501.18129

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > North Korea (0.14)
North America > Canada > Quebec (0.04)
(18 more...)

Genre:

Research Report > New Finding (0.93)
Overview (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government (0.93)
Law > Civil Rights & Constitutional Law (0.69)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Recent Developments in Deep Learning-based Author Name Disambiguation

Cappelli, Francesca, Colavizza, Giovanni, Peroni, Silvio

arXiv.org Artificial IntelligenceDec-23-2024

Author Name Disambiguation (AND) is a critical task for digital libraries aiming to link existing authors with their respective publications. Due to the lack of persistent identifiers used by researchers and the presence of intrinsic linguistic challenges, such as homonymy, the development of Deep Learning algorithms to address this issue has become widespread. Many AND deep learning methods have been developed, and surveys exist comparing the approaches in terms of techniques, complexity, performance. However, none explicitly addresses AND methods in the context of deep learning in the latest years (i.e. timeframe 2016-2024). In this paper, we provide a systematic review of state-of-the-art AND techniques based on deep learning, highlighting recent improvements, challenges, and open issues in the field. We find that DL methods have significantly impacted AND by enabling the integration of structured and unstructured data, and hybrid approaches effectively balance supervised and unsupervised learning.

author name disambiguation, dataset, name disambiguation, (12 more...)

arXiv.org Artificial Intelligence

2503.13448

Country:

Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring Graph Based Approaches for Author Name Disambiguation

Rastogi, Chetanya, Agarwal, Prabhat, Singh, Shreya

arXiv.org Artificial IntelligenceDec-12-2023

In many applications, such as scientific literature management, researcher search, social network analysis and etc, Name Disambiguation In our project, we aim to implement author name disambiguation (aiming at disambiguating WhoIsWho) has been a challenging techniques to disambiguate profiles of authors with similar names problem. In addition, the growth of scientific literature makes the and affiliations. We study the problem from a network perspective problem more difficult and urgent. Although name disambiguation where researchers communicate with one another by means of their has been extensively studied in academia and industry, the problem publication. The network is modeled as a bipartite graph containing has not been solved well due to the clutter of data and the complexity two types of nodes, viz.

author profile, node, publication, (13 more...)

arXiv.org Artificial Intelligence

2312.08388

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Information Technology (0.34)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback

PADME-SoSci: A Platform for Analytics and Distributed Machine Learning for the Social Sciences

Boukhers, Zeyd, Bleier, Arnim, Yediel, Yeliz Ucer, Hienstorfer-Heitmann, Mio, Jaberansary, Mehrshad, Koumpis, Adamantios, Beyan, Oya

arXiv.org Artificial IntelligenceApr-3-2023

Data privacy and ownership are significant in social data science, raising legal and ethical concerns. Sharing and analyzing data is difficult when different parties own different parts of it. An approach to this challenge is to apply de-identification or anonymization techniques to the data before collecting it for analysis. However, this can reduce data utility and increase the risk of re-identification. To address these limitations, we present PADME, a distributed analytics tool that federates model implementation and training. PADME uses a federated approach where the model is implemented and deployed by all parties and visits each data location incrementally for training. This enables the analysis of data across locations while still allowing the model to be trained as if all data were in a single location. Training the model on data in its original location preserves data ownership. Furthermore, the results are not provided until the analysis is completed on all data locations to ensure privacy and avoid bias in the results.

analytic model, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2303.182

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.06)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.05)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.85)
Information Technology > Data Science > Data Mining (0.51)

Add feedback

Deep Author Name Disambiguation using DBLP Data

Boukhers, Zeyd, Asundi, Nagaraj Bahubali

arXiv.org Artificial IntelligenceMar-17-2023

In the academic world, the number of scientists grows every year and so does the number of authors sharing the same names. Consequently, it challenging to assign newly published papers to their respective authors. Therefore, Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use data collected from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last names and same first name initials. The author within each group is identified by capturing the relation with his/her co-authors and area of research, represented by the titles of the validated publications of the corresponding author. To this end, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.10067

Country:

Europe > Germany (0.04)
Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)
Europe > Italy (0.04)

Genre:

Overview (0.93)
Research Report > New Finding (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives

Xie, Wenjin, Liu, Siyuan, Wang, Xiaomeng, Jia, Tao

arXiv.org Artificial IntelligenceDec-24-2022

Name ambiguity is common in academic digital libraries, such as multiple authors having the same name. This creates challenges for academic data management and analysis, thus name disambiguation becomes necessary. The procedure of name disambiguation is to divide publications with the same name into different groups, each group belonging to a unique author. A large amount of attribute information in publications makes traditional methods fall into the quagmire of feature selection. These methods always select attributes artificially and equally, which usually causes a negative impact on accuracy. The proposed method is mainly based on representation learning for heterogeneous networks and clustering and exploits the self-attention technology to solve the problem. The presentation of publications is a synthesis of structural and semantic representations. The structural representation is obtained by meta-path-based sampling and a skip-gram-based embedding method, and meta-path level attention is introduced to automatically learn the weight of each feature. The semantic representation is generated using NLP tools. Our proposal performs better in terms of name disambiguation accuracy compared with baselines and the ablation experiments demonstrate the improvement by feature selection and the meta-path level attention in our method. The experimental results show the superiority of our new method for capturing the most attributes from publications and reducing the impact of redundant information.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2212.12715

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
Asia > Middle East > Republic of Türkiye > Adana Province > Adana (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

Add feedback

A Bayesian Learning, Greedy agglomerative clustering approach and evaluation techniques for Author Name Disambiguation Problem

Sourav, Shashwat

arXiv.org Artificial IntelligenceNov-1-2022

Author names often suffer from ambiguity owing to the same author appearing under different names and multiple authors possessing similar names. It creates difficulty in associating a scholarly work with the person who wrote it, thereby introducing inaccuracy in credit attribution, bibliometric analysis, search-by-author in a digital library, and expert discovery. A plethora of techniques for disambiguation of author names have been proposed in the literature. I try to focus on the research efforts targeted to disambiguate author names. I first go through the conventional methods, then I discuss evaluation techniques and the clustering model which finally leads to the Bayesian learning and Greedy agglomerative approach. I believe this concentrated review will be useful for the research community because it discusses techniques applied to a very large real database that is actively used worldwide. The Bayesian and the greedy agglomerative approach used will help to tackle AND problems in a better way. Finally, I try to outline a few directions for future work.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2211.01303

Country: Asia > India > Madhya Pradesh > Bhopal (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.83)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

Add feedback

A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals

Santini, Cristian, Gesese, Genet Asefa, Peroni, Silvio, Gangemi, Aldo, Sack, Harald, Alam, Mehwish

arXiv.org Artificial IntelligenceJan-24-2022

Data available in scholarly knowledge graphs (SKGs) - i.e., "a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent potentially different relations between these entities" [14] - is growing continuously every day, leading to a plethora of challenges concerning, for instance, article exploration and visualization [17], article recommendation [3], citation recommendation [11], and Author Name Disambiguation (AND) [24], which is relevant for the purposes of the present article. In particular, AND refers to a specific task of entity resolution which aims at resolving author mentions in bibliographic references to real-world people. Author persistent identifiers, such as ORCIDs and VIAFs, simplify the AND activity since such identifiers can be used for reconciling entities defined as different objects and representing the same real-world person. However, the availability of such persistent identifiers in SKGs - such as OpenCitations (OC) [22], AMiner [27] and Microsoft Academic Knowledge Graph (MAKG) [10] - is characterized by very low coverage and, as such, additional and computationally-oriented techniques must be adopted to identify different authors as the same person. In the past, many automatic approaches have been developed to automatically address AND by using publications metadata (e.g., title, abstract, keywords, venue, affiliation, etc.) to extract some features which can be used in the disambiguation task. These methods vary widely from supervised learning methods to unsupervised learning including recently developed deep neural network-based architectures [31]. However, the existing SKGs do not provide all the relevant contextual information necessary to reuse effectively and efficiently such approaches, that often rely on pure textual data. In contrast with the approaches mentioned above, this study focuses on performing AND for scholarly data represented as linked data or included in SKGs by considering the multi-modal information available in such collections, i.e., the structural information consisting of entities and relations between them as well as text or numeric values associated with the authors and publications defined in the form of literals (family name, given name, publication title, venue title, year of publication, etc.). The proposed framework to address this task is named Literally Author Name Disambiguation (LAND), which focuses on tackling the following research questions: - Can Knowledge Graph Embeddings (KGEs) - i.e. a technique that enables the creation of a "dense representation of the graph in a continuous, low-dimensional vector space that can then be used for machine learning tasks"[13] - be used effectively for the downstream task of clustering, more specifically for author name disambiguation?

author name disambiguation, dataset, information, (10 more...)

arXiv.org Artificial Intelligence

2201.09555

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.06)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.05)
Europe > Netherlands > South Holland > Leiden (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Effective Unsupervised Author Disambiguation with Relative Frequencies

Backes, Tobias

arXiv.org Machine LearningAug-10-2018

This work addresses the problem of author name homonymy in the Web of Science. Aiming for an efficient, simple and straightforward solution, we introduce a novel probabilistic similarity measure for author name disambiguation based on feature overlap. Using the researcher-ID available for a subset of the Web of Science, we evaluate the application of this measure in the context of agglomeratively clustering author mentions. We focus on a concise evaluation that shows clearly for which problem setups and at which time during the clustering process our approach works best. In contrast to most other works in this field, we are skeptical towards the performance of author name disambiguation methods in general and compare our approach to the trivial single-cluster baseline. Our results are presented separately for each correct clustering size as we can explain that, when treating all cases together, the trivial baseline and more sophisticated approaches are hardly distinguishable in terms of evaluation results. Our model shows state-of-the-art performance for all correct clustering sizes without any discriminative training and with tuning only one convergence parameter.

artificial intelligence, disambiguation, machine learning, (15 more...)

arXiv.org Machine Learning

doi: 10.1145/3197026.3197036

1808.04216

Country:

North America > United States > Texas > Tarrant County > Fort Worth (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Data Science (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Add feedback

The impact of imbalanced training data on machine learning for author name disambiguation

Kim, Jinseok, Kim, Jenna

arXiv.org Machine LearningAug-2-2018

In supervised machine learning for author name disambiguation, negative training data are often dominantly larger than positive training data. This paper examines how the ratios of negative to positive training data can affect the performance of machine learning algorithms to disambiguate author names in bibliographic records. On multiple labeled datasets, three classifiers - Logistic Regression, Na\"ive Bayes, and Random Forest - are trained through representative features such as coauthor names, and title words extracted from the same training data but with various positive-negative training data ratios. Results show that increasing negative training data can improve disambiguation performance but with a few percent of performance gains and sometimes degrade it. Logistic Regression and Na\"ive Bayes learn optimal disambiguation models even with a base ratio (1:1) of positive and negative training data. Also, the performance improvement by Random Forest tends to quickly saturate roughly after 1:10 ~ 1:15. These findings imply that contrary to the common practice using all training data, name disambiguation algorithms can be trained using part of negative training data without degrading much disambiguation performance while increasing computational efficiency. This study calls for more attention from author name disambiguation scholars to methods for machine learning from imbalanced data.

artificial intelligence, machine learning, training data, (13 more...)

arXiv.org Machine Learning

doi: 10.1007/s11192-018-2865-9

1808.00525

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York > Onondaga County > Syracuse (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Add feedback