Qu, Ao
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning
Tang, Yihong, Qu, Ao, Wang, Zhaokai, Zhuang, Dingyi, Wu, Zhaofeng, Ma, Wei, Wang, Shenhao, Zheng, Yunhan, Zhao, Zhan, Zhao, Jinhua
Vision language models (VLMs) have demonstrated impressive performance across a wide range of downstream tasks. However, their proficiency in spatial reasoning remains limited, despite its crucial role in tasks involving navigation and interaction with physical environments. Many such tasks rely on core spatial reasoning capabilities in two-dimensional (2D) space, and our evaluation reveals that state-of-the-art VLMs frequently generate implausible and incorrect responses to composite spatial reasoning problems, including simple pathfinding tasks that humans can solve effortlessly at a glance. To address this, we explore an effective approach to enhance 2D spatial reasoning within VLMs by training the model solely on basic spatial capabilities. We begin by disentangling the key components of 2D spatial reasoning: direction comprehension, distance estimation, and localization. Our central hypothesis is that mastering these basic spatial capabilities can significantly enhance a model's performance on composite spatial tasks requiring advanced spatial understanding and combinatorial problem-solving, with generalized improvements across visual-spatial tasks. To investigate this hypothesis, we introduce Sparkle, a framework that fine-tunes VLMs on these three basic spatial capabilities, using synthetic data generation and targeted supervision to build an instruction dataset for each capability. Our experiments demonstrate that VLMs fine-tuned with Sparkle achieve significant performance gains, not only on the basic tasks themselves but also in generalizing to composite and out-of-distribution spatial reasoning tasks. These findings underscore the effectiveness of mastering basic spatial capabilities in enhancing composite spatial problem-solving, offering insights into systematic strategies for improving VLMs' spatial reasoning capabilities.
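The decomposition into direction, distance, and localization lends itself to programmatic data generation. Below is a minimal, hypothetical sketch of how synthetic question-answer pairs for the three basic capabilities could be produced on a 2D canvas; the question templates, coordinate ranges, and answer formats are illustrative assumptions, not the actual Sparkle instruction data.

import random

def make_example(capability, rng=random):
    """Generate one toy QA pair for a basic 2D spatial capability.

    Templates and ranges are assumptions for illustration only.
    """
    # Two labeled points on a 100x100 canvas.
    (ax, ay), (bx, by) = [(rng.randint(0, 100), rng.randint(0, 100)) for _ in range(2)]
    if capability == "direction":
        horiz = "right of" if bx > ax else "left of"
        vert = "above" if by > ay else "below"
        q = f"A is at ({ax},{ay}) and B is at ({bx},{by}). Where is B relative to A?"
        a = f"B is {horiz} and {vert} A."
    elif capability == "distance":
        d = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
        q = f"How far apart are A ({ax},{ay}) and B ({bx},{by})?"
        a = f"Roughly {d:.1f} units."
    else:  # localization
        q = f"An object sits at ({ax},{ay}) on a 100x100 canvas. Which quadrant is it in?"
        a = ("left" if ax < 50 else "right") + "-" + ("bottom" if ay < 50 else "top")
    return {"capability": capability, "question": q, "answer": a}

dataset = [make_example(c) for c in ("direction", "distance", "localization") for _ in range(3)]
print(dataset[0])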
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning
Jayawardana, Vindula, Freydt, Baptiste, Qu, Ao, Hickert, Cameron, Yan, Zhongxia, Wu, Cathy
Despite the popularity of multi-agent reinforcement learning (RL) in simulated and two-player applications, its success in messy real-world applications has been limited. A key challenge lies in its generalizability across problem variations, a common requirement in many real-world problems. Contextual reinforcement learning (CRL) formalizes learning policies that generalize across problem variations. However, the lack of standardized benchmarks for multi-agent CRL has hindered progress in the field. Ideally, such benchmarks should be grounded in real-world applications so that they naturally capture the many open challenges of real-world problems that affect generalization. To bridge this gap, we propose IntersectionZoo, a comprehensive benchmark suite for multi-agent CRL built on the real-world application of cooperative eco-driving in urban road networks. The task of cooperative eco-driving is to control a fleet of vehicles to reduce fleet-level vehicular emissions. By grounding IntersectionZoo in a real-world application, we naturally capture real-world problem characteristics, such as partial observability and multiple competing objectives. IntersectionZoo is built on data-informed simulations of 16,334 signalized intersections derived from 10 major US cities, modeled in an open-source, industry-grade microscopic traffic simulator. By modeling factors affecting vehicular exhaust emissions (e.g., temperature, road conditions, travel demand), IntersectionZoo provides one million data-driven traffic scenarios. Using these traffic scenarios, we benchmark popular multi-agent RL and human-like driving algorithms and demonstrate that popular multi-agent RL algorithms struggle to generalize in CRL settings.
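To make the contextual-RL framing concrete, the sketch below samples problem variations (contexts) and evaluates a fixed policy across them; a policy that generalizes keeps average emissions low on contexts it never trained on. The context fields, the toy environment, and the evaluation loop are simplified assumptions, not IntersectionZoo's actual API.

import random
from dataclasses import dataclass

@dataclass
class Context:
    intersection_id: int
    temperature_c: float        # affects emission modeling
    demand_veh_per_hr: int      # travel demand

def sample_context(rng=random):
    return Context(rng.randrange(16_334), rng.uniform(-10, 35), rng.randint(200, 1200))

class ToyEcoDrivingEnv:
    """Stand-in environment: each step returns emissions to be minimized."""
    def __init__(self, ctx):
        self.ctx, self.t = ctx, 0
    def reset(self):
        self.t = 0
        return [self.ctx.demand_veh_per_hr / 1200.0]     # crude observation
    def step(self, accel):
        self.t += 1
        emissions = abs(accel) * (1.0 + self.ctx.demand_veh_per_hr / 1000.0)
        return [self.ctx.demand_veh_per_hr / 1200.0], emissions, self.t >= 50

def evaluate(policy, n_contexts=5):
    """Average emissions over sampled contexts (problem variations)."""
    totals = []
    for _ in range(n_contexts):
        env = ToyEcoDrivingEnv(sample_context())
        obs, total, done = env.reset(), 0.0, False
        while not done:
            obs, e, done = env.step(policy(obs))
            total += e
        totals.append(total)
    return sum(totals) / n_contexts

print(evaluate(lambda obs: 0.1 * obs[0]))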
Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary Planning
Tang, Yihong, Wang, Zhaokai, Qu, Ao, Yan, Yihao, Hou, Kebing, Zhuang, Dingyi, Guo, Xiaotong, Zhao, Jinhua, Zhao, Zhan, Ma, Wei
In this paper, we for the first time propose the task of Open-domain Urban Itinerary Planning (OUIP) for citywalk, which directly generates itineraries based on users' requests described in natural language. OUIP is different from conventional itinerary planning, which limits users from expressing more detailed needs and hinders true personalization. Recently, large language models (LLMs) have shown potential in handling diverse tasks. However, due to non-real-time information, incomplete knowledge, and insufficient spatial awareness, they are unable to independently deliver a satisfactory user experience in OUIP. Given this, we present ItiNera, an OUIP system that synergizes spatial optimization with Large Language Models (LLMs) to provide services that customize urban itineraries based on users' needs. Specifically, we develop an LLM-based pipeline for extracting and updating POI features to create a user-owned personalized POI database. For each user request, we leverage LLM in cooperation with an embedding-based module for retrieving candidate POIs from the user's POI database. Then, a spatial optimization module is used to order these POIs, followed by LLM crafting a personalized, spatially coherent itinerary. To the best of our knowledge, this study marks the first integration of LLMs to innovate itinerary planning solutions. Extensive experiments on offline datasets and online subjective evaluation have demonstrated the capacities of our system to deliver more responsive and spatially coherent itineraries than current LLM-based solutions. Our system has been deployed in production at the TuTu online travel service and has attracted thousands of users for their urban travel planning.
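The retrieve-then-order idea can be illustrated with a small sketch: embedding-based retrieval of candidate POIs followed by a simple spatial ordering so the itinerary stays coherent on the map. The POI records, toy embeddings, and greedy nearest-neighbor ordering below are assumptions for illustration; the deployed system uses an LLM-based pipeline and a dedicated spatial optimization module instead.

import math

# Hypothetical POI records: (name, lat, lon, embedding).
POIS = [
    ("Old Town Cafe", 31.2304, 121.4737, [0.9, 0.1]),
    ("Riverside Park", 31.2400, 121.4900, [0.2, 0.8]),
    ("History Museum", 31.2250, 121.4800, [0.7, 0.3]),
    ("Night Market",   31.2350, 121.4600, [0.4, 0.6]),
]

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))

def retrieve(query_emb, k=3):
    """Embedding-based retrieval of candidate POIs for a user request."""
    return sorted(POIS, key=lambda p: -cosine(query_emb, p[3]))[:k]

def order_spatially(pois):
    """Greedy nearest-neighbor ordering, a stand-in for the spatial optimization module."""
    route, rest = [pois[0]], list(pois[1:])
    while rest:
        last = route[-1]
        rest.sort(key=lambda p: (p[1] - last[1]) ** 2 + (p[2] - last[2]) ** 2)
        route.append(rest.pop(0))
    return route

candidates = retrieve([0.6, 0.4])   # e.g., a "relaxed cultural afternoon" request embedding
print([name for name, *_ in order_spatially(candidates)])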
SEIP: Simulation-based Design and Evaluation of Infrastructure-based Collective Perception
Qu, Ao, Huang, Xuhuan, Suo, Dajiang
Recent advances in sensing and communication have paved the way for collective perception in traffic management, with real-time data sharing among multiple entities. While vehicle-based collective perception has gained traction, infrastructure-based approaches, which entail the real-time sharing and merging of sensing data from different roadside sensors for object detection, grapple with challenges in placement strategy and high ex-post evaluation costs. Despite anecdotal evidence of their effectiveness, many current deployments rely on engineering heuristics and face budget constraints that limit post-deployment adjustments. This paper introduces polynomial-time heuristic algorithms and a simulation tool for the ex-ante evaluation of infrastructure sensor deployment. By modeling sensor deployment as an integer programming problem, we guide decisions on sensor locations, heights, and configurations to balance cost, installation constraints, and coverage. Our simulation engine, integrated with open-source urban driving simulators, enables us to evaluate the effectiveness of each sensor deployment solution through the lens of object detection. A case study with infrastructure LiDARs revealed that the incremental benefit derived from integrating additional low-resolution LiDARs could surpass that of incorporating more high-resolution ones. The results reinforce the necessity of investigating the cost-performance tradeoff prior to deployment. The code for our simulation experiments can be found at https://github.com/dajiangsuo/SEIP.
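The following sketch shows one polynomial-time heuristic in the spirit of the integer-programming view above: greedily pick the (site, height) option with the best marginal coverage per unit cost under a budget. The candidate sites, costs, and covered road cells are toy assumptions, and the exact formulation in the paper may differ.

# Toy greedy heuristic for budgeted sensor placement.
CANDIDATES = {
    # (site, height_m): (cost, set of covered road cells)
    ("NE_corner", 4): (3.0, {1, 2, 3, 4}),
    ("NE_corner", 6): (4.5, {1, 2, 3, 4, 5, 6}),
    ("SW_corner", 4): (3.0, {5, 6, 7}),
    ("SW_corner", 6): (4.5, {4, 5, 6, 7, 8}),
    ("median",    4): (2.0, {3, 4, 5}),
}

def greedy_placement(candidates, budget):
    """Repeatedly pick the option with the best marginal coverage per cost.

    Runs in polynomial time; for submodular coverage objectives this kind of
    greedy rule carries a constant-factor guarantee.
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(candidates)
    while remaining:
        def gain(item):
            key, (cost, cells) = item
            return len(cells - covered) / cost if cost > 0 else 0.0
        key, (cost, cells) = max(remaining.items(), key=gain)
        if spent + cost > budget or not (cells - covered):
            break
        chosen.append(key)
        covered |= cells
        spent += cost
        del remaining[key]
        # At most one sensor per physical site in this toy model.
        remaining = {k: v for k, v in remaining.items() if k[0] != key[0]}
    return chosen, covered, spent

print(greedy_placement(CANDIDATES, budget=8.0))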
Domain Adversarial Spatial-Temporal Network: A Transferable Framework for Short-term Traffic Forecasting across Cities
Tang, Yihong, Qu, Ao, Chow, Andy H. F., Lam, William H. K., Wong, S. C., Ma, Wei
Accurate real-time traffic forecasting is critical for intelligent transportation systems (ITS) and serves as the cornerstone of various smart mobility applications. Though this research area is dominated by deep learning, recent studies indicate that accuracy gains from developing new model structures are becoming marginal. Instead, we envision that the improvement can be achieved by transferring "forecasting-related knowledge" across cities with different data distributions and network topologies. To this end, this paper proposes a novel transferable traffic forecasting framework: Domain Adversarial Spatial-Temporal Network (DASTNet). DASTNet is pre-trained on multiple source networks and fine-tuned with the target network's traffic data. Specifically, we leverage graph representation learning and adversarial domain adaptation techniques to learn domain-invariant node embeddings, which are further incorporated to model the temporal traffic data. To the best of our knowledge, we are the first to employ adversarial multi-domain adaptation for network-wide traffic forecasting problems. DASTNet consistently outperforms all state-of-the-art baseline methods on three benchmark datasets. The trained DASTNet is applied to Hong Kong's new traffic detectors, and accurate traffic predictions can be delivered within one day of a detector becoming available. Overall, this study suggests an alternative way to enhance traffic forecasting methods and provides practical implications for cities lacking historical traffic data.
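The adversarial ingredient can be sketched with a gradient reversal layer: the forecasting loss trains the encoder normally, while reversed gradients from a domain classifier push node embeddings toward domain invariance. This is a minimal PyTorch sketch; the encoder, dimensions, and training loop are simplified assumptions, whereas DASTNet itself uses graph representation learning and a temporal model.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity forward; flips (and scales) gradients backward, so the
    encoder learns to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Simplified stand-ins; dimensions are arbitrary toy values.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
domain_clf = nn.Linear(32, 2)            # source vs. target domain
forecaster = nn.Linear(32, 1)            # traffic prediction head

x = torch.randn(8, 16)                   # toy node features
domain_labels = torch.randint(0, 2, (8,))
traffic_targets = torch.randn(8, 1)

z = encoder(x)
loss = nn.functional.mse_loss(forecaster(z), traffic_targets) \
     + nn.functional.cross_entropy(domain_clf(grad_reverse(z)), domain_labels)
loss.backward()                          # encoder receives reversed domain gradients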
Attacking Deep Reinforcement Learning-Based Traffic Signal Control Systems with Colluding Vehicles
Qu, Ao, Tang, Yihong, Ma, Wei
The rapid advancements of the Internet of Things (IoT) and artificial intelligence (AI) have catalyzed the development of adaptive traffic signal control systems (ATCS) for smart cities. In particular, deep reinforcement learning (DRL) methods produce state-of-the-art performance and hold great potential for practical applications. In existing DRL-based ATCS, the controlled signals collect traffic state information from nearby vehicles, and optimal actions (e.g., switching phases) are then determined based on the collected information. The DRL models fully "trust" that vehicles send truthful information to the signals, making the ATCS vulnerable to adversarial attacks with falsified information. In view of this, this paper formulates, for the first time, a novel task in which a group of vehicles cooperatively sends falsified information to "cheat" DRL-based ATCS in order to save their total travel time. To solve the proposed task, we develop CollusionVeh, a generic and effective vehicle-colluding framework composed of a road situation encoder, a vehicle interpreter, and a communication mechanism. We employ our method to attack established DRL-based ATCS and demonstrate that the total travel time for the colluding vehicles can be significantly reduced with a reasonable number of learning episodes, and that the colluding effect decreases as the number of colluding vehicles increases. Additionally, insights and suggestions for the real-world deployment of DRL-based ATCS are provided. The research outcomes could help improve the reliability and robustness of ATCS and better protect smart mobility systems.
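The attack surface can be illustrated without the learned framework itself: when the controller's state is built from vehicle-reported positions and speeds, a colluding vehicle can bias that state by reporting falsified values. The observation format and perturbation below are toy assumptions, not the CollusionVeh method.

def build_signal_state(reports):
    """Controller-side state: vehicle count and mean speed per approach."""
    state = {}
    for approach, pos, speed in reports:
        cnt, spd_sum = state.get(approach, (0, 0.0))
        state[approach] = (cnt + 1, spd_sum + speed)
    return {a: (cnt, spd_sum / cnt) for a, (cnt, spd_sum) in state.items()}

def falsify(report, colluding_ids, exaggeration=3):
    """A colluding vehicle duplicates itself and under-reports speed, making
    its approach look congested so the controller favors it."""
    vid, approach, pos, speed = report
    if vid not in colluding_ids:
        return [(approach, pos, speed)]
    return [(approach, pos + i * 2.0, 0.5) for i in range(exaggeration)]

honest = [(0, "north", 20.0, 10.0), (1, "north", 35.0, 9.0), (2, "east", 15.0, 11.0)]
colluders = {2}
reports = [r for rep in honest for r in falsify(rep, colluders)]
print(build_signal_state(reports))   # "east" now looks like a slow three-vehicle queue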
Graph Convolutional Networks for traffic anomaly
Event detection is an important task in transportation; its goal is to detect points in time when large events disrupt a substantial portion of the urban traffic network. Origin-Destination (OD) matrix data provided by map service vendors has great potential to reveal historical patterns and distinguish anomalies. However, fully capturing spatial and temporal traffic patterns remains a challenge, yet it plays a crucial role in effective anomaly detection. Meanwhile, existing anomaly detection methods have not adequately addressed the extreme data sparsity and high dimensionality common in OD matrix datasets. To tackle these challenges, we formulate the problem in a novel way: detecting anomalies in a set of directed, weighted graphs representing the traffic conditions at each time interval. We further propose the Context augmented Graph Autoencoder (Con-GAE), which leverages graph embedding and context embedding techniques to capture spatial traffic network patterns while working around the data sparsity and high-dimensionality issues. Con-GAE adopts an autoencoder framework and detects anomalies via semi-supervised learning. Extensive experiments show that our method achieves a 0.1-0.4 improvement in area under the curve (AUC) score over state-of-the-art anomaly detection baselines when applied to several real-world, large-scale OD matrix datasets.
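The scoring idea behind autoencoder-based detection can be sketched briefly: train a reconstruction model on (mostly) normal OD snapshots and flag hours with high reconstruction error as anomalous. The toy model below operates on flattened OD matrices only; Con-GAE additionally uses graph and context (time) embeddings, so the architecture and data here are simplified assumptions.

import torch
import torch.nn as nn

N_ZONES = 10
model = nn.Sequential(
    nn.Linear(N_ZONES * N_ZONES, 32), nn.ReLU(),
    nn.Linear(32, N_ZONES * N_ZONES),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Sparse synthetic OD snapshots standing in for normal hours.
normal_od = torch.rand(200, N_ZONES * N_ZONES) * (torch.rand(200, N_ZONES * N_ZONES) < 0.2)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(normal_od), normal_od)
    loss.backward()
    opt.step()

def anomaly_score(od_snapshot):
    """Higher reconstruction error -> more anomalous traffic pattern."""
    with torch.no_grad():
        return nn.functional.mse_loss(model(od_snapshot), od_snapshot).item()

disrupted = normal_od[:1] * 5.0            # simulate a large disruption
print(anomaly_score(normal_od[:1]), anomaly_score(disrupted))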