Discovering Types for Entity Disambiguation


Using the top solution from our type system optimization, we can now label Wikipedia data with the types that system generates. With this data (in our experiments, 400M tokens for each of English and French), we can train a bidirectional LSTM to independently predict all the type memberships for each word. The Wikipedia source text only provides supervision on intra-wiki links; however, this is sufficient to train a deep neural network to predict type membership with an F1 of over 0.91. One of our type systems, discovered by beam search, includes types such as Aviation, Clothing, and Games (as well as surprisingly specific ones like 1754 in Canada, which indicates that 1754 was an exciting year in the dataset of 1,000 Wikipedia articles it was trained on); you can also view the full type system. Predicting entities in a document usually relies on a "coherence" metric between different entities, e.g.
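
To make the labeling step more concrete, here is a minimal sketch of how link-only supervision could be turned into independent per-type targets, and how predictions could then be scored. Everything in it is an illustrative assumption rather than the code used in the experiments: the three-type `TYPE_SYSTEM`, the `ENTITY_TYPES` lookup, the helper names `label_tokens` and `micro_f1`, and the choice of micro-averaged F1 (the text above does not say which averaging was used). The LSTM itself is left out; only its inputs and its evaluation are shown.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical type system: each entry is one independent yes/no membership question.
TYPE_SYSTEM: List[str] = ["Aviation", "Clothing", "Games"]

# Hypothetical entity -> type memberships, made up for this illustration.
ENTITY_TYPES: Dict[str, Set[str]] = {
    "Boeing 747": {"Aviation"},
    "Chess": {"Games"},
    "Flight jacket": {"Aviation", "Clothing"},
}

# A token is (word, linked entity); the entity is None when the word is not inside a link.
Token = Tuple[str, Optional[str]]


def label_tokens(tokens: List[Token]) -> List[Optional[List[int]]]:
    """Return one binary vector per linked token, and None where there is no supervision."""
    labels: List[Optional[List[int]]] = []
    for _word, entity in tokens:
        if entity is None or entity not in ENTITY_TYPES:
            labels.append(None)  # no link (or unknown entity): excluded from loss and scoring
        else:
            members = ENTITY_TYPES[entity]
            labels.append([1 if t in members else 0 for t in TYPE_SYSTEM])
    return labels


def micro_f1(gold: List[Optional[List[int]]], predicted: List[List[int]]) -> float:
    """Micro-averaged F1 over all (token, type) decisions on supervised tokens only."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if g is None:
            continue  # skip tokens without link supervision
        for g_bit, p_bit in zip(g, p):
            if g_bit == 1 and p_bit == 1:
                tp += 1
            elif g_bit == 0 and p_bit == 1:
                fp += 1
            elif g_bit == 1 and p_bit == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


tokens: List[Token] = [
    ("The", None), ("747", "Boeing 747"), ("pilot", None),
    ("wore", None), ("a", None), ("jacket", "Flight jacket"),
]
gold = label_tokens(tokens)

# Stand-in for thresholded model outputs: one binary vector per token.
predicted = [[0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0]]
score = micro_f1(gold, predicted)
```

Because each type is a separate binary decision, a token can belong to several types at once (the "jacket" token above is both Aviation and Clothing), which is what predicting memberships independently allows.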
