Predicting Long-Term Citations from Short-Term Linguistic Influence
Soni, Sandeep; Bamman, David; Eisenstein, Jacob
arXiv.org Artificial Intelligence
A standard measure of the influence of a research paper is the number of times it is cited. However, papers may be cited for many reasons, and citation count offers limited information about the extent to which a paper affected the content of subsequent publications. We therefore propose a novel method to quantify linguistic influence in timestamped document collections. There are two main steps: first, identify lexical and semantic changes using contextual embeddings and word frequencies; second, aggregate information about these changes into per-document influence scores by estimating a high-dimensional Hawkes process with a low-rank parameter matrix. We show that this measure of linguistic influence is predictive of *future* citations: the estimate of linguistic influence from the two years after a paper's publication is correlated with and predictive of its citation count in the following three years. This is demonstrated using an online evaluation with incremental temporal training/test splits, in comparison with a strong baseline that includes predictors for initial citation counts, topics, and lexical features.
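The second step, aggregating detected changes into per-document influence scores with a low-rank Hawkes process, can be illustrated with a minimal sketch. It assumes a parameterization that the abstract does not spell out: each event is a document using a detected innovation (identified by a hypothetical string id), the excitation from document i to document j is the dot product of hypothetical non-negative factor vectors `u[i]` and `v[j]`, the kernel is a normalized exponential with rate `decay`, and a document's influence score is taken to be its total outgoing excitation. The sketch only evaluates the model for given parameters; estimating the factors (for example by maximizing the Hawkes log-likelihood) is omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical event record: (document index, innovation id, publication time in years).
Event = Tuple[int, str, float]
Matrix = Sequence[Sequence[float]]


def excitation(u: Matrix, v: Matrix, i: int, j: int) -> float:
    """Low-rank excitation alpha[i][j] = u[i] . v[j] (assumed non-negative factors)."""
    return sum(a * b for a, b in zip(u[i], v[j]))


def intensity(j: int, word: str, t: float, events: Sequence[Event],
              mu: float, u: Matrix, v: Matrix, decay: float) -> float:
    """Hawkes rate at which document j adopts `word` at time t.

    Base rate mu plus, for every strictly earlier event (i, word, t_i) from
    another document, alpha[i][j] times the normalized exponential kernel
    decay * exp(-decay * (t - t_i)), which integrates to 1 over time.
    Self-excitation (i == j) is excluded here as an assumption.
    """
    rate = mu
    for i, w, t_i in events:
        if w == word and t_i < t and i != j:
            rate += excitation(u, v, i, j) * decay * math.exp(-decay * (t - t_i))
    return rate


def influence_scores(u: Matrix, v: Matrix) -> List[float]:
    """Per-document score: total outgoing excitation, sum over j != i of alpha[i][j].

    This scoring rule is an illustrative assumption, not the paper's definition.
    """
    n = len(u)
    return [sum(excitation(u, v, i, j) for j in range(n) if j != i) for i in range(n)]


if __name__ == "__main__":
    u = [[1.0], [0.5], [0.0]]  # hypothetical "sender" factors, rank K = 1
    v = [[0.2], [0.4], [0.6]]  # hypothetical "receiver" factors
    events = [(0, "innovation_a", 2018.0)]  # document 0 introduces an innovation
    print(influence_scores(u, v))
    print(intensity(1, "innovation_a", 2019.0, events, mu=0.1, u=u, v=v, decay=1.0))
```

With these rank-1 factors, document 0 receives the highest score (0.4 + 0.6 = 1.0) because its sender factor excites every other document, while document 2, whose sender factor is zero, scores 0. Factoring the excitation matrix as a product of n × K matrices keeps the number of parameters proportional to nK rather than n², which is what makes a high-dimensional Hawkes process over many documents tractable.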
Oct-24-2022
- Country:
  - North America > United States
    - Maryland > Baltimore (0.14)
    - Washington > King County > Seattle (0.04)
    - Minnesota > Hennepin County > Minneapolis (0.14)
    - Louisiana > Orleans Parish > New Orleans (0.04)
    - California > Alameda County > Berkeley (0.04)
  - Europe
  - Asia
    - South Korea (0.04)
    - China > Hong Kong (0.04)
    - Middle East
      - Jordan (0.04)
      - Republic of Türkiye (0.04)
- Genre:
  - Research Report > Promising Solution (0.34)