Cheema, Muhammad Aamir
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Dihan, Mahir Labib, Hassan, Md Tanvir, Parvez, Md Tanvir, Hasan, Md Hasebul, Alam, Md Almash, Cheema, Muhammad Aamir, Ali, Mohammed Eunus, Parvez, Md Rizwan
Recent advancements in foundation models have enhanced AI systems' capabilities in autonomous tool usage and reasoning. However, their ability in location- or map-based reasoning, which improves daily life by optimizing navigation, facilitating resource discovery, and streamlining logistics, has not been systematically studied. To bridge this gap, we introduce MapEval, a benchmark designed to assess diverse and complex map-based user queries with geo-spatial reasoning. MapEval features three task types (textual, API-based, and visual) that require collecting world information via map tools, processing heterogeneous geo-spatial contexts (e.g., named entities, travel distances, user reviews or ratings, images), and compositional reasoning, all of which state-of-the-art foundation models find challenging. Comprising 700 unique multiple-choice questions about locations across 180 cities and 54 countries, MapEval evaluates foundation models' ability to handle spatial relationships, map infographics, travel planning, and navigation challenges. Using MapEval, we conducted a comprehensive evaluation of 28 prominent foundation models. While no single model excelled across all tasks, Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro achieved competitive performance overall. However, substantial performance gaps emerged, particularly in MapEval, where agents with Claude-3.5-Sonnet outperformed GPT-4o and Gemini-1.5-Pro by 16% and 21%, respectively, and the gaps widened further when compared to open-source LLMs. Our detailed analyses provide insights into the strengths and weaknesses of current models, though all models still fall short of human performance by more than 20% on average, struggling with complex map images and rigorous geo-spatial reasoning. This gap highlights MapEval's critical role in advancing general-purpose foundation models with stronger geo-spatial understanding.
Tracking Progress in Multi-Agent Path Finding
Shen, Bojie, Chen, Zhe, Cheema, Muhammad Aamir, Harabor, Daniel D., Stuckey, Peter J.
Multi-Agent Path Finding (MAPF) is an important core problem for many new and emerging industrial applications. Many works appear on this topic each year, and a large number of substantial advancements and performance improvements have been reported. Yet measuring overall progress in MAPF is difficult: there are many potential competitors, and the computational burden for comprehensive experimentation is prohibitively large. Moreover, detailed data from past experimentation is usually unavailable. In this work, we introduce a set of methodological and visualisation tools which can help the community establish clear indicators for state-of-the-art MAPF performance and which can facilitate large-scale comparisons between MAPF solvers. Our objectives are to lower the barrier to entry for new researchers and to further promote the study of MAPF by making progress in the area and its main challenges much clearer.
Comparing Alternative Route Planning Techniques: A Web-based Demonstration and User Study
Li, Lingxiao, Cheema, Muhammad Aamir, Lu, Hua, Ali, Mohammed Eunus, Toosi, Adel N.
Due to the popularity of smartphones, cheap wireless networks and the availability of road network data, navigation applications have become a part of our everyday life. Many modern navigation systems and map-based services provide not only the fastest route from a source location s to a target location t but also a few alternative routes, giving users more options to choose from. Consequently, computing alternative paths from a source s to a target t has received significant research attention in the past few years. However, it is not clear which of the existing approaches generates alternative paths of better quality because the quality of these alternatives is mostly subjective. Motivated by this, in this paper, we present the first user study that compares the quality of the alternative routes generated by four of the most popular existing approaches, including the routes provided by Google Maps. We also present the details of a web-based demo system that can be accessed using any internet-enabled device and allows users to see the alternative routes generated by the four approaches for any pair of source and target selected by the users. Our user study shows that although the mean rating received by Google Maps is slightly lower than the mean ratings received by the other three approaches, the differences are not statistically significant. We also discuss the limitations of this user study and recommend that readers interpret these results with caution because certain factors beyond our control may have affected the participants' ratings.
An Efficient Approximation Algorithm for Multi-criteria Indoor Route Planning Queries
Salgado, Chaluka, Cheema, Muhammad Aamir, Taniar, David
A route planning query has many real-world applications and has been studied extensively in outdoor spaces such as road networks or Euclidean space. Despite its many applications in indoor venues (e.g., shopping centres, libraries, airports), almost all existing studies are specifically designed for outdoor spaces and do not take into account the unique properties of indoor spaces such as hallways, stairs, escalators, and rooms. We identify this research gap and formally define the category-aware multi-criteria route planning query, denoted by CAM, which returns the optimal route from an indoor source point to an indoor target point that passes through at least one indoor point from each given category while minimizing the total cost of the route in terms of travel distance and other relevant attributes. We show that the CAM query is NP-hard. Based on a novel dominance-based pruning technique, we propose an efficient algorithm which generates high-quality results. We provide an extensive experimental study conducted on the largest shopping centre in Australia and compare our algorithm with alternative approaches. The experiments demonstrate that our algorithm is highly efficient and produces quality results.
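
To make the CAM query definition above concrete, the following is a minimal brute-force sketch, not the algorithm proposed in the paper. It assumes the indoor venue is modelled as a weighted undirected graph and that the multi-criteria cost is collapsed into a single number (travel distance only). The function names, the toy venue, and its edge weights are all hypothetical. The sketch enumerates every choice of one point per category and every visiting order, which is exponential in the number of categories and so only illustrates why the NP-hard problem calls for pruning or approximation.

```python
import heapq
from itertools import permutations, product


def shortest_dists(graph, start):
    """Dijkstra: shortest distance from start to every reachable node."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def brute_force_cam(graph, source, target, categories):
    """Return (cost, stops) for the cheapest route from source to target
    that visits at least one point of every category.

    Assumption: cost is travel distance only; the paper's query also
    considers other attributes, which this sketch ignores.
    """
    nodes = {source, target}.union(*categories.values())
    dist = {n: shortest_dists(graph, n) for n in nodes}
    best_cost, best_stops = float("inf"), None
    # Try every combination of one point per category, in every order.
    for picks in product(*categories.values()):
        for order in permutations(picks):
            stops = [source, *order, target]
            cost = sum(dist[a].get(b, float("inf"))
                       for a, b in zip(stops, stops[1:]))
            if cost < best_cost:
                best_cost, best_stops = cost, stops
    return best_cost, best_stops


def undirected(edges):
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


# Hypothetical toy venue: node names and distances are made up.
VENUE = undirected([
    ("entrance", "hall", 2), ("hall", "cafe_a", 1), ("hall", "atm", 4),
    ("cafe_a", "atm", 2), ("atm", "exit", 3), ("hall", "cafe_b", 5),
    ("cafe_b", "exit", 1),
])
CATEGORIES = {"cafe": ["cafe_a", "cafe_b"], "atm": ["atm"]}

if __name__ == "__main__":
    print(brute_force_cam(VENUE, "entrance", "exit", CATEGORIES))
```

The returned stops list only the source, the chosen category points, and the target; the intermediate hallway nodes between consecutive stops are implied by the shortest paths. A practical method would avoid the full enumeration, for example by discarding partial routes that are dominated by others, which is the role the abstract assigns to its dominance-based pruning.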