Topics, Authors, and Networks in Large Language Model Research: Trends from a Survey of 17K arXiv Papers

Rajiv Movva, Sidhika Balachandar, Kenny Peng, Gabriel Agostini, Nikhil Garg, Emma Pierson

arXiv.org Artificial Intelligence 

Large language model (LLM) research is dramatically impacting society, making it essential to understand the topics and values it prioritizes, the authors and institutions driving it, and its networks of collaboration. Due to the recent growth of the field, many of these fundamental attributes lack systematic description. We gather, annotate, and analyze a new dataset of 16,979 LLM-related arXiv papers, focusing on changes in 2023 vs. 2018-2022. We show that LLM research increasingly focuses on societal impacts: the Computers and Society sub-arXiv has seen 20x growth in its proportion of LLM-related papers in 2023. This change is driven in part by an influx of new authors: a majority of 2023 papers are first-authored by researchers who have not previously written an LLM-related paper, and these papers focus particularly on applications and societal considerations. While a handful of companies hold outsize influence, academia publishes a much larger fraction of papers than industry overall, and this gap widens in 2023. LLM research is also being shaped by social dynamics: there are gender and academic/industry differences in the topics authors prioritize, and a stark U.S./China schism in the collaboration network. Overall, our analysis documents how LLM research both shapes and is shaped by society, attesting to the necessity of sociotechnical lenses; we discuss implications for researchers and policymakers.
