Generating Landmark Navigation Instructions from Maps as a Graph-to-Text Problem
Schumann, Raphael, Riezler, Stefan
arXiv.org Artificial Intelligence
Car-focused navigation services are based on turns and distances of named streets, whereas navigation instructions naturally used by humans are centered around physical objects called landmarks. We present a neural model that takes OpenStreetMap representations as input and learns, from human natural language instructions, to generate navigation instructions that contain visible and salient landmarks. Routes on the map are encoded in a location- and rotation-invariant graph representation that is decoded into natural language instructions. Our work is based on a novel dataset of 7,672 crowd-sourced instances that have been verified by human navigation in Street View. Our evaluation shows that the navigation instructions generated by our system have properties similar to human-generated instructions and lead to successful human navigation in Street View.

Current navigation services provided by the automotive industry or by Google Maps generate route instructions based on turns and distances of named streets. In contrast, humans naturally use an efficient mode of navigation based on visible and salient physical objects called landmarks. Route instructions based on landmarks are useful if GPS tracking is poor or unavailable, and if information is inexact regarding distances (e.g., in human estimates) or street names (e.g., for users riding a bicycle or a bus). In our framework, routes on the map are represented by discretizing the street layout, connecting street segments with adjacent points of interest (thus encoding visibility of landmarks), and encoding the route and surrounding landmarks in a location- and rotation-invariant graph representation. Based on crowd-sourced natural language instructions for such map representations, a graph-to-text mapping is learned that decodes graph representations into natural language route instructions containing salient landmarks.
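The encoding described above can be sketched in a few lines. The following is an illustrative Python sketch, not the authors' implementation: all function names, the distance-based visibility proxy, and the 30-meter radius are assumptions. It shows how relative turn angles between consecutive street segments yield rotation invariance, and how attaching nearby points of interest to each segment encodes landmark visibility.

```python
import math

def bearing(p, q):
    """Absolute bearing (radians) from point p to point q."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def encode_route(points, landmarks, visibility_radius=30.0):
    """Sketch of a location- and rotation-invariant route encoding.

    points: list of (x, y) nodes from the discretized street layout.
    landmarks: list of ((x, y), name) points of interest.
    Returns one feature dict per segment: its length, the turn angle
    relative to the previous segment, and the landmarks visible from it.
    """
    segments = []
    prev_bearing = None
    for p, q in zip(points, points[1:]):
        b = bearing(p, q)
        # Angle relative to the previous segment, wrapped to (-pi, pi]:
        # only relative angles are kept, so rotating the whole map
        # leaves the representation unchanged.
        if prev_bearing is None:
            turn = 0.0
        else:
            turn = (b - prev_bearing + math.pi) % (2 * math.pi) - math.pi
        prev_bearing = b
        # Segment length; absolute coordinates are never stored,
        # so the encoding is also location-invariant.
        length = math.dist(p, q)
        mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
        # Crude visibility proxy: a landmark is "visible" from a
        # segment if it lies within a fixed radius of the midpoint.
        visible = [name for lpos, name in landmarks
                   if math.dist(mid, lpos) <= visibility_radius]
        segments.append({"length": length, "turn": turn,
                         "landmarks": visible})
    return segments
```

For example, an L-shaped route past a cafe produces a straight first segment (turn 0) followed by a right turn (turn −π/2 in this convention), with the cafe attached to the segments it is near; the paper's model consumes such per-segment features through a graph encoder rather than this flat list.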
Our work is accompanied by a dataset of 7,672 instances of routes rendered on OpenStreetMap paired with crowd-sourced natural language instructions. The navigation instructions were written by workers on the basis of maps that include all points of interest but no street names. The time-normalized success rate of human workers finding the correct goal location in Street View is 66%. Since these routes can partially overlap with routes in the training set, we further performed an evaluation on completely unseen routes. There, the rate of produced landmarks drops slightly compared to human references, and the time-normalized success rate also drops slightly, to 63%. While there is still room for improvement, our results showcase a promising direction of research, with wide potential for application in existing map applications and navigation systems. Mirowski et al. (2018) published a subset of Street View covering parts of New York City and Pittsburgh.
Dec-30-2020