foursquare
World-POI: Global Point-of-Interest Data Enriched from Foursquare and OpenStreetMap as Tabular and Graph Data
Amiri, Hossein, Hashemi, Mohammad, Züfle, Andreas
Recently, Foursquare released a global dataset with more than 100 million points of interest (POIs), each representing a real-world business on its platform. However, many entries lack complete metadata such as addresses or categories, and some correspond to non-existent or fictional locations. In contrast, OpenStreetMap (OSM) offers a rich, user-contributed POI dataset with detailed and frequently updated metadata, though it does not formally verify whether a POI represents an actual business. In this data paper, we present a methodology that integrates the strengths of both datasets: Foursquare as a comprehensive baseline of commercial POIs and OSM as a source of enriched metadata. The combined dataset totals approximately 1 TB. While this full version is not publicly released, we provide filtered releases with adjustable thresholds that reduce storage needs and make the data practical to download and use across domains. We also provide step-by-step instructions to reproduce the full 631 GB build. Record linkage is achieved by computing name similarity scores and spatial distances between Foursquare and OSM POIs. These measures identify and retain high-confidence matches that correspond to real businesses in Foursquare, have representations in OSM, and show strong name similarity. Finally, we use this filtered dataset to construct a graph-based representation of POIs enriched with attributes from both sources, enabling advanced spatial analyses and a range of downstream applications.
Can We Predict Your Next Move Without Breaking Your Privacy?
Soni, Arpita, Tripathi, Sahil, Kashyap, Gautam Siddharth, Kulahara, Manaswi, Azeez, Mohammad Anas, Siddiqui, Zohaib Hasan, Joshi, Nipun, Gao, Jiechao
We propose FLLL3M--Federated Learning with Large Language Models for Mobility Modeling--a privacy-preserving framework for Next-Location Prediction (NxLP). By retaining user data locally and leveraging LLMs through an efficient outer product mechanism, FLLL3M ensures high accuracy with low resource demands. It achieves SOT results on Gowalla (Acc@1: 12.55, MRR: 0.1422), WeePlace (10.71, 0.1285), Brightkite (10.42, 0.1169), and FourSquare (8.71, 0.1023), while reducing parameters by up to 45.6% and memory usage by 52.7%.
Massive-STEPS: Massive Semantic Trajectories for Understanding POI Check-ins -- Dataset and Benchmarks
Wongso, Wilson, Xue, Hao, Salim, Flora D.
Understanding human mobility through Point-of-Interest (POI) recommendation is increasingly important for applications such as urban planning, personalized services, and generative agent simulation. However, progress in this field is hindered by two key challenges: the over-reliance on older datasets from 2012-2013 and the lack of reproducible, city-level check-in datasets that reflect diverse global regions. To address these gaps, we present Massive-STEPS (Massive Semantic Trajectories for Understanding POI Check-ins), a large-scale, publicly available benchmark dataset built upon the Semantic Trails dataset and enriched with semantic POI metadata. Massive-STEPS spans 12 geographically and culturally diverse cities and features more recent (2017-2018) and longer-duration (24 months) check-in data than prior datasets. We benchmarked a wide range of POI recommendation models on Massive-STEPS using both supervised and zero-shot approaches, and evaluated their performance across multiple urban contexts. By releasing Massive-STEPS, we aim to facilitate reproducible and equitable research in human mobility and POI recommendation. The dataset and benchmarking code are available at: https://github.com/cruiseresearchgroup/Massive-STEPS
REPLAY: Modeling Time-Varying Temporal Regularities of Human Mobility for Location Prediction over Sparse Trajectories
Deng, Bangchao, Qu, Bingqing, Wang, Pengyang, Yang, Dingqi, Fankhauser, Benjamin, Cudre-Mauroux, Philippe
Location prediction forecasts a user's location based on historical user mobility traces. To tackle the intrinsic sparsity issue of real-world user mobility traces, spatiotemporal contexts have been shown as significantly useful. Existing solutions mostly incorporate spatiotemporal distances between locations in mobility traces, either by feeding them as additional inputs to Recurrent Neural Networks (RNNs) or by using them to search for informative past hidden states for prediction. However, such distance-based methods fail to capture the time-varying temporal regularities of human mobility, where human mobility is often more regular in the morning than in other periods, for example; this suggests the usefulness of the actual timestamps besides the temporal distances. Against this background, we propose REPLAY, a general RNN architecture learning to capture the time-varying temporal regularities for location prediction. Specifically, REPLAY not only resorts to the spatiotemporal distances in sparse trajectories to search for the informative past hidden states, but also accommodates the time-varying temporal regularities by incorporating smoothed timestamp embeddings using Gaussian weighted averaging with timestamp-specific learnable bandwidths, which can flexibly adapt to the temporal regularities of different strengths across different timestamps. Our extensive evaluation compares REPLAY against a sizable collection of state-of-the-art techniques on two real-world datasets. Results show that REPLAY consistently and significantly outperforms state-of-the-art methods by 7.7\%-10.9\% in the location prediction task, and the bandwidths reveal interesting patterns of the time-varying temporal regularities.
A Machine-Learned Ranking Algorithm for Dynamic and Personalised Car Pooling Services
Campana, Mattia Giovanni, Delmastro, Franca, Bruno, Raffaele
Car pooling is expected to significantly help in reducing traffic congestion and pollution in cities by enabling drivers to share their cars with travellers with similar itineraries and time schedules. A number of car pooling matching services have been designed in order to efficiently find successful ride matches in a given pool of drivers and potential passengers. However, it is now recognised that many non-monetary aspects and social considerations, besides simple mobility needs, may influence the individual willingness of sharing a ride, which are difficult to predict. To address this problem, in this study we propose GoTogether, a recommender system for car pooling services that leverages on learning-to-rank techniques to automatically derive the personalised ranking model of each user from the history of her choices (i.e., the type of accepted or rejected shared rides). Then, GoTogether builds the list of recommended rides in order to maximise the success rate of the offered matches. To test the performance of our scheme we use real data from Twitter and Foursquare sources in order to generate a dataset of plausible mobility patterns and ride requests in a metropolitan area. The results show that the proposed solution quickly obtain an accurate prediction of the personalised user's choice model both in static and dynamic conditions.
Senior Data Software Engineer at Foursquare - Serbia
Foursquare is the leading independent location technology and data cloud platform, dedicated to building meaningful bridges between digital spaces and physical places. Our proprietary technology unlocks the most accurate, trustworthy location data in the world, empowering businesses to answer key questions, uncover hidden insights, improve customer experiences, and achieve better business outcomes. A pioneer of the geo-location space, Foursquare's location tech stack is being utilized by our mobile apps CityGuide and Swarm, as well as the world's largest enterprises and most recognizable brands, like Amazon, Microsoft, Samsung, Spotify, Uber, Airbnb and others. Foursquare's Marketers Engineering team writes and operate the software which produce core data sets for our Marketers suite of products. These petabyte-scale pipelines process geospatial data for the purposes of marketing use cases, such as ad targeting and attribution.
Data science is becoming software engineering
Editor's note: The Towards Data Science podcast's "Climbing the Data Science Ladder" series is hosted by Jeremie Harris, Edouard Harris and Russell Pollari. Together, they run a data science mentorship startup called SharpestMinds. When I think of the trends I've seen in data science over the last few years, perhaps the most significant and hardest to ignore has been the increased focus on deployment and productionization of models. Not all companies need models deployed to production, of course but at those that do, there's increasing pressure on data science teams to deliver software engineering along with machine learning solutions. That's why I wanted to sit down with Adam Waksman, Head of Core Technology at Foursquare.
Senior Machine Learning Engineer
Foursquare is the leading independent location technology company, powered by our deep understanding of how people move throughout the world. Our solutions help businesses make smarter decisions, developers create more engaging experiences, and brands build more effective marketing strategies. Foursquare's platform includes Attribution, Audience, Pinpoint, Proximity, Places, Pilgrim SDK and Visits. As the industry's first and only accredited company for location data from the Media Rating Council (MRC), this foundation powers all our solutions -- those that exist today and those we have yet to build. Over 14 billion consumer-verified place visit confirmations help us keep our map and models fresh and up-to-date, building a phone's-eye-view of the world with 105 million unique places of interest worldwide.
Modelling Urban Dynamics with Multi-Modal Graph Convolutional Networks
D'Silva, Krittika, Cambe, Jordan, Noulas, Anastasios, Mascolo, Cecilia, Waksman, Adam
Modelling the dynamics of urban venues is a challenging task as it is multifaceted in nature. Demand is a function of many complex and nonlinear features such as neighborhood composition, real-time events, and seasonality. Recent advances in Graph Convolutional Networks (GCNs) have had promising results as they build a graphical representation of a system and harness the potential of deep learning architectures. However, there has been limited work using GCNs in a temporal setting to model dynamic dependencies of the network. Further, within the context of urban environments, there has been no prior work using dynamic GCNs to support venue demand analysis and prediction. In this paper, we propose a novel deep learning framework which aims to better model the popularity and growth of urban venues. Using a longitudinal dataset from location technology platform Foursquare, we model individual venues and venue types across London and Paris. First, representing cities as connected networks of venues, we quantify their structure and note a strong community structure in these retail networks, an observation that highlights the interplay of cooperative and competitive forces that emerge in local ecosystems of retail businesses. Next, we present our deep learning architecture which integrates both spatial and topological features into a temporal model which predicts the demand of a venue at the subsequent time-step. Our experiments demonstrate that our model can learn spatio-temporal trends of venue demand and consistently outperform baseline models. Relative to state-of-the-art deep learning models, our model reduces the RSME by ~ 28% in London and ~ 13% in Paris. Our approach highlights the power of complex network measures and GCNs in building prediction models for urban environments. The model could have numerous applications within the retail sector to better model venue demand and growth.
A Tale of Two Cities -- Clustering Neighborhoods of London and Paris
A Tale of Two cities, a novel written by Charles Dickens was set in London and Paris which takes place during the French Revolution. These cities were both happening then and now. A lot has changed over the years and we now take a look at how the cities have grown. London and Paris are quite the popular tourist and vacation destinations for people all around the world. They are diverse and multicultural and offer a wide variety of experiences that is widely sought after.