Text-Based Twitter User Geolocation Prediction
Han, B., Cook, P., Baldwin, T.
Journal of Artificial Intelligence Research
Geographical location is vital to geospatial applications like local search and event detection. In this paper, we investigate and improve on the task of text-based geolocation prediction of Twitter users. Previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its author's location. However, these references are often buried in informal, ungrammatical, and multilingual data, and are therefore non-trivial to identify and exploit. We present an integrated geolocation prediction framework and investigate what factors impact on prediction accuracy. First, we evaluate a range of feature selection methods to obtain location indicative words. We then evaluate the impact of non-geotagged tweets, language, and user-declared metadata on geolocation prediction. In addition, we evaluate the impact of temporal variance on model generalisation, and discuss how users differ in terms of their geolocatability. We achieve state-of-the-art results for the text-based Twitter user geolocation task, and also provide the most extensive exploration of the task to date. Our findings provide valuable insights into the design of robust, practical text-based geolocation prediction systems.
Mar-20-2014
- Country:
- South America > Brazil
- Rio de Janeiro > Rio de Janeiro (0.04)
- Bahia > Salvador (0.04)
- Oceania > Australia
- North America
- Mexico (0.04)
- United States
- New York (0.04)
- New Hampshire (0.04)
- Pennsylvania (0.04)
- District of Columbia > Washington (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Texas > Tarrant County
- Fort Worth (0.04)
- California
- Los Angeles County > Los Angeles (0.04)
- San Diego County > San Diego (0.04)
- San Francisco County > San Francisco (0.04)
- Canada
- Europe
- Austria > Vienna (0.14)
- Germany (0.04)
- Russia (0.04)
- Czechia > Prague (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Spain
- Community of Madrid > Madrid (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Bulgaria > Sofia City Province
- Sofia (0.04)
- United Kingdom > England
- South Yorkshire > Sheffield (0.04)
- France
- Île-de-France > Paris
- Paris (0.04)
- Auvergne-Rhône-Alpes > Lyon
- Lyon (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia
- South Korea (0.14)
- Singapore (0.04)
- Russia (0.04)
- Indonesia > Java
- India
- Thailand > Bangkok
- Bangkok (0.04)
- China > Beijing
- Beijing (0.04)
- Middle East
- Jordan (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Malaysia > Kuala Lumpur
- Kuala Lumpur (0.04)
- Japan > Honshū
- Chūbu > Aichi Prefecture > Nagoya (0.04)
- Africa > Middle East
- Egypt > Cairo Governorate > Cairo (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Information Technology > Services (1.00)
- Technology: