T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models
Hicke, Rebecca M. M., Mimno, David
arXiv.org Artificial Intelligence
Large language models have shown breakthrough potential in many NLP domains. Here we consider their use for stylometry, specifically authorship identification in Early Modern English drama. We find both promising and concerning results; LLMs are able to accurately predict the author of surprisingly short passages but are also prone to confidently misattribute texts to specific authors. A fine-tuned t5-large model outperforms all tested baselines, including logistic regression, SVM with a linear kernel, and cosine delta, at attributing small passages. However, we see indications that the presence of certain authors in the model's pre-training data affects predictive results in ways that are difficult to assess.
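Of the baselines named above, cosine delta is the least self-explanatory. The sketch below shows the technique as it is generally described in stylometry: relative frequencies of the most frequent words are z-scored across texts, and a query passage is attributed to the author whose reference text is nearest in cosine distance. It is a minimal illustration, not the authors' implementation; the function names, the `known` input format, the tokenizer, and the `n_words` default are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the paper's preprocessing may differ)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def zscore_columns(rows):
    """Standardize each word's frequency across all texts."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def cosine_distance(u, v):
    """1 minus cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def cosine_delta_attribute(known, query, n_words=100):
    """Return (nearest author, distances) for a query passage.

    known: dict mapping author name -> reference text (hypothetical input format).
    """
    authors = list(known)
    texts = [known[a] for a in authors] + [query]
    # Vocabulary: the n_words most frequent words across all texts.
    totals = Counter()
    for t in texts:
        totals.update(tokenize(t))
    vocab = [w for w, _ in totals.most_common(n_words)]
    z = zscore_columns([relative_freqs(t, vocab) for t in texts])
    query_vec = z[-1]
    distances = {a: cosine_distance(z[i], query_vec)
                 for i, a in enumerate(authors)}
    return min(distances, key=distances.get), distances
```

Calling `cosine_delta_attribute({"A": text_a, "B": text_b}, passage)` returns the nearest author together with the per-author distances. With very short passages most word frequencies are zero, so the distances become noisy; the abstract's comparison concerns exactly this small-passage setting.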
Oct-27-2023
- Country:
  - Africa > Middle East
    - Egypt (0.04)
  - Asia > China
    - Jilin Province > Changchun (0.04)
  - Europe
    - Belgium > Flanders
      - Antwerp Province > Antwerp (0.04)
    - France > Île-de-France
    - Ireland (0.04)
    - Italy
      - Emilia-Romagna > Metropolitan City of Bologna
        - Bologna (0.04)
      - Sicily (0.04)
    - Middle East > Malta (0.04)
    - Spain > Aragón (0.04)
    - United Kingdom > England
      - Oxfordshire > Oxford (0.14)
  - North America > United States
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - New York > New York County
      - New York City (0.04)
    - Virginia (0.04)
    - Washington > King County
      - Seattle (0.04)
  - Oceania > Australia
    - Victoria (0.04)
  - South America
    - Brazil > Bahia
      - Salvador (0.04)
    - Chile > Santiago Metropolitan Region
      - Santiago Province > Santiago (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.49)
    - New Finding (0.66)
- Industry:
  - Government (0.46)
  - Leisure & Entertainment (0.67)
- Technology: