Predicting Links on Wikipedia with Anchor Text Information
Brochier, Robin; Béchet, Frédéric
arXiv.org Artificial Intelligence
Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.
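The abstract does not spell out the baseline models. As a rough illustration only, the sketch below shows one simple way anchor text can drive link prediction: record the anchor texts used on existing links, then propose a link from an article wherever one of those anchors appears in its text, and score the proposals against held-out links. The input format, the function names (`build_anchor_index`, `predict_links`, `precision_recall`), the n-gram matching and the toy data are hypothetical assumptions, not the authors' method or evaluation protocol.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and split into word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def build_anchor_index(links):
    """Map each normalized anchor text to the set of targets it links to.

    `links` is an iterable of (source, anchor_text, target) triples taken
    from the training graph (hypothetical input format).
    """
    index = defaultdict(set)
    for _source, anchor, target in links:
        index[" ".join(normalize(anchor))].add(target)
    return dict(index)


def predict_links(source, text, index, max_ngram=3):
    """Propose a link to every target whose known anchor occurs in `text`."""
    tokens = normalize(text)
    predicted = set()
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            for target in index.get(phrase, ()):
                if target != source:  # never predict a self-link
                    predicted.add(target)
    return predicted


def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted links against held-out links."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy training links (made up for illustration): (source, anchor text, target).
train_links = [
    ("Paris", "France", "France"),
    ("Lyon", "river Rhône", "Rhône"),
    ("Berlin", "Germany", "Germany"),
    ("Provence", "Marseille", "Marseille"),
]
index = build_anchor_index(train_links)
text = "Marseille is a port city in France, near the mouth of the river Rhône."
predicted = predict_links("Marseille", text, index)
print(sorted(predicted))  # ['France', 'Rhône']
print(precision_recall(predicted, {"France", "Mediterranean Sea"}))  # (0.5, 0.5)
```

Because the index only contains targets already seen as link targets during training, this baseline can propose links out of a new (inductive) source article but never to an article with no recorded anchors. It also links every match, so it is prone to the over-linking the abstract mentions.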
May-25-2021
- Country:
  - North America
    - Canada (0.04)
    - United States > New York
      - New York County > New York City (0.04)
  - Europe
    - United Kingdom (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.05)
- Genre:
  - Research Report (0.64)
- Technology:
  - Information Technology
    - Information Management > Search (0.90)
    - Data Science > Data Mining (0.89)
    - Communications
      - Social Media (1.00)
      - Web (0.69)
    - Artificial Intelligence
      - Natural Language > Information Retrieval (0.69)
      - Machine Learning > Statistical Learning (0.47)