The study of short texts in digital politics: Document aggregation for topic modeling
Nakka, Nitheesha; Yalcin, Omer F.; Desmarais, Bruce A.; Rajtmajer, Sarah; Monroe, Burt
arXiv.org Artificial Intelligence
Statistical topic modeling is widely used in political science to study text. Researchers examine documents of varying lengths, from tweets to speeches. There is ongoing debate about how document length affects the interpretability of topic models. We investigate the effects of aggregating short documents into larger ones based on natural units that partition the corpus. In our study, we analyze one million tweets by U.S. state legislators from April 2016 to September 2020. We find that when tweets are aggregated at the account level, topics are more strongly associated with individual states than when individual tweets are used as documents. We replicate this finding with Wikipedia pages aggregated by birth city, showing how the definition of a document can affect topic modeling results.
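The aggregation idea in the abstract (merging short texts that share a natural unit into one longer document) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each short text arrives as a hypothetical `(group_key, text)` pair, where the key is the partitioning unit (an account handle for tweets, a birth city for Wikipedia pages), and that joining texts with a space is an acceptable way to form the larger document. The function name `aggregate_documents` and the example handles are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_documents(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Merge short texts that share a grouping key into one longer document.

    Each record is a (group_key, text) pair, e.g. (account handle, tweet text)
    or (birth city, Wikipedia page text). Because every record carries exactly
    one key, the groups partition the corpus: each short text ends up in
    exactly one aggregated document.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, text in records:
        groups[key].append(text)
    # Join with a space so the last token of one text and the first token of
    # the next do not fuse into a single token.
    return {key: " ".join(texts) for key, texts in groups.items()}


if __name__ == "__main__":
    # Hypothetical example data; the handles and texts are invented.
    tweets = [
        ("@legislator_a", "Proud to support the state budget today."),
        ("@legislator_b", "Town hall tonight in the capital."),
        ("@legislator_a", "Thanks to everyone who came out to vote!"),
    ]
    docs = aggregate_documents(tweets)
    print(len(docs))  # 2 documents instead of 3 tweets
    print(docs["@legislator_a"])
    # Proud to support the state budget today. Thanks to everyone who came out to vote!
```

The aggregated strings would then be tokenized and passed to whichever topic model is in use; switching between tweet-level and account-level analysis only changes which key, if any, the records are grouped by.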
Mar-6-2025
- Country:
  - South America > Venezuela (0.04)
  - Africa > Ghana (0.04)
  - Oceania
  - North America
    - Puerto Rico (0.04)
    - Mexico (0.04)
    - Jamaica (0.04)
    - Canada > Ontario (0.04)
    - United States
      - Iowa (0.05)
      - Virginia (0.05)
      - Missouri (0.05)
      - Mississippi (0.04)
      - Arkansas (0.04)
      - Nebraska (0.04)
      - Kentucky (0.04)
      - Montana (0.04)
      - Utah (0.04)
      - Connecticut (0.04)
      - Wyoming (0.04)
      - Vermont (0.04)
      - Nevada (0.04)
      - Kansas (0.04)
      - District of Columbia > Washington (0.04)
      - South Dakota > Day County (0.04)
      - Idaho > Ada County
        - Boise (0.04)
      - Alaska
        - Anchorage Municipality > Anchorage (0.14)
        - Kusilvak Census Area > Marshall (0.04)
      - Indiana > Marion County
        - Indianapolis (0.04)
      - Florida
        - Leon County > Tallahassee (0.04)
        - Duval County > Jacksonville (0.04)
        - Miami-Dade County > Miami (0.04)
        - Hillsborough County > Tampa (0.04)
        - Escambia County > Pensacola (0.04)
        - Alachua County > Gainesville (0.04)
      - Alabama
        - Lee County > Auburn (0.04)
        - Jefferson County > Birmingham (0.04)
      - Oklahoma > Tulsa County
        - Tulsa (0.04)
      - Minnesota
        - St. Louis County > Duluth (0.04)
        - Saint Louis County > Duluth (0.04)
        - Dakota County > Lakeville (0.04)
        - Hennepin County
          - Minneapolis (0.14)
          - Bloomington (0.04)
      - Maine > York County
        - Saco (0.04)
      - Texas
        - Travis County > Austin (0.04)
        - Taylor County > Abilene (0.04)
        - Dallas County > Richardson (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Arizona
        - Pima County > Tucson (0.04)
        - Maricopa County > Phoenix (0.04)
      - Maryland
        - Baltimore (0.28)
        - Anne Arundel County > Annapolis (0.04)
      - North Carolina
        - Wake County > Raleigh (0.04)
        - Forsyth County > Winston-Salem (0.04)
      - Oregon > Multnomah County
        - Portland (0.04)
      - Illinois > Cook County
      - Michigan
        - Wayne County > Detroit (0.14)
        - Genesee County > Flint (0.14)
      - New Jersey > Mercer County
        - Trenton (0.04)
      - Tennessee
        - Shelby County > Memphis (0.04)
        - Knox County > Knoxville (0.04)
      - New Mexico > Bernalillo County
        - Albuquerque (0.04)
      - Ohio
        - Summit County > Akron (0.14)
        - Portage County > Kent (0.04)
        - Fairfield County > Pickerington (0.04)
      - Louisiana
        - Caddo Parish > Shreveport (0.04)
        - Orleans Parish > New Orleans (0.04)
        - East Baton Rouge Parish > Baton Rouge (0.04)
      - Colorado
        - Denver County > Denver (0.04)
        - Boulder County > Boulder (0.04)
      - Pennsylvania
        - Philadelphia County > Philadelphia (0.14)
        - Centre County > University Park (0.04)
        - Lackawanna County > Scranton (0.04)
        - Delaware County > Chester (0.04)
        - Dauphin County > Harrisburg (0.04)
        - Allegheny County > Pittsburgh (0.04)
      - Georgia > Chatham County
        - Savannah (0.14)
      - Massachusetts
        - Suffolk County > Boston (0.14)
        - Hampshire County > Amherst (0.14)
      - New York > Richmond County
        - New York City (0.04)
      - California
        - Sacramento County > Sacramento (0.04)
        - San Diego County > San Diego (0.04)
        - Mendocino County (0.04)
        - Kern County > Bakersfield (0.04)
        - Fresno County > Fresno (0.04)
        - Los Angeles County
          - Los Angeles (0.14)
          - Pasadena (0.04)
      - Wisconsin > Milwaukee County
        - Milwaukee (0.04)
    - Cuba > La Habana Province
      - Havana (0.04)
  - Europe
    - Germany (0.04)
    - Ireland (0.04)
    - Poland (0.04)
    - Netherlands (0.04)
    - United Kingdom
      - Wales (0.04)
      - England > Greater London
    - Russia > Central Federal District
      - Moscow Oblast > Moscow (0.04)
    - Hungary > Budapest
      - Budapest (0.04)
    - France > Grand Est
      - Meurthe-et-Moselle > Nancy (0.04)
  - Asia
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Law (1.00)
  - Education (1.00)
  - Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.92)
  - Banking & Finance (0.92)
  - Energy (0.92)
  - Leisure & Entertainment
    - Games (0.93)
    - Sports
      - Soccer (1.00)
      - Olympic Games (1.00)
      - Hockey (1.00)
      - Football (1.00)
      - Basketball (1.00)
  - Health & Medicine
  - Government
  - Media
- Technology:
  - Information Technology
    - Communications > Social Media (1.00)
    - Data Science > Data Mining (0.93)
    - Artificial Intelligence
      - Natural Language (1.00)
      - Machine Learning > Statistical Learning (1.00)