Fine-Grained Entity Typing for Domain Independent Entity Linking
–arXiv.org Artificial Intelligence
Neural entity linking models are very powerful, but run the risk of overfitting to the domain they are trained in. For this problem, a domain can be narrowly construed as a particular distribution of entities, as models can even overfit by memorizing properties of specific frequent entities in a dataset. We tackle the problem of building robust entity linking models that generalize effectively and do not rely on labeled entity linking data with a specific entity distribution. Rather than predicting entities directly, our approach models fine-grained entity properties, which can help disambiguate between even closely related entities. We derive a large inventory of types (tens of thousands) from Wikipedia categories, and use hyperlinked mentions in Wikipedia to distantly label data and train an entity typing model. At test time, we classify a mention with this typing model and use soft type predictions to link the mention to the most similar candidate entity. We evaluate our entity linking system on the CoNLL-YAGO (Hoffart et al., 2011) dataset and show that our approach outperforms prior domain-independent entity linking systems. We also test our approach in a harder setting derived from the WikilinksNED dataset (Eshel et al., 2017) where all the mention-entity pairs are unseen during test time. Results indicate that our approach generalizes better than a state-of-the-art neural model on the dataset.
1 Introduction
Historically, systems for entity linking to Wikipedia relied on heuristics such as anchor text distributions (Cucerzan, 2007; Milne and Witten, 2008), tf-idf (Ratinov et al., 2011), and Wikipedia relatedness of nearby entities (Hoffart et al., 2011). These systems have few parameters, making them relatively flexible across domains. More recent systems have typically been parameter-rich neural network models (Sun et al., 2015; Yamada et al., 2016; Francis-Landau et al., 2016; Eshel et al., 2017).
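To make the test-time procedure from the abstract concrete (classify the mention into types, then link to the most similar candidate), the following is a minimal sketch and not the paper's implementation. It assumes that similarity is the sum of the predicted probabilities over the types a candidate entity carries; the function name link_mention, the type names, and the probabilities are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def link_mention(type_probs: Dict[str, float],
                 candidates: Dict[str, Set[str]]) -> str:
    """Return the candidate entity whose types best match the soft predictions.

    type_probs: soft type predictions for one mention (type -> probability),
                as a typing model might produce them.
    candidates: candidate entity -> set of types, e.g. derived from categories.

    ASSUMPTION: a candidate's score is the sum of the predicted probabilities
    of its types. This is an illustrative similarity, not the paper's formula.
    """
    def score(entity: str) -> float:
        return sum(type_probs.get(t, 0.0) for t in candidates[entity])

    return max(candidates, key=score)


# Hypothetical toy data for a mention of "Paris" in a European context.
toy_type_probs = {
    "capitals in europe": 0.7,
    "cities in france": 0.8,
    "cities in texas": 0.1,
    "living people": 0.05,
}
toy_candidates = {
    "Paris": {"capitals in europe", "cities in france"},
    "Paris, Texas": {"cities in texas"},
    "Paris Hilton": {"living people"},
}

print(link_mention(toy_type_probs, toy_candidates))
```

Because the score depends only on types and not on entity identities, a sketch like this needs no labeled entity linking data for the target domain, which is the property the abstract emphasizes.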
Sep-12-2019