Wu, Dongming
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
Yan, Tianyi, Wu, Dongming, Han, Wencheng, Jiang, Junpeng, Zhou, Xia, Zhan, Kun, Xu, Cheng-zhong, Shen, Jianbing
Autonomous driving evaluation requires simulation environments that closely replicate actual road conditions, including real-world sensory data and responsive feedback loops. However, many existing simulations need to predict waypoints along fixed routes on public datasets or synthetic photorealistic data, \ie, open-loop simulation usually lacks the ability to assess dynamic decision-making. While the recent efforts of closed-loop simulation offer feedback-driven environments, they cannot process visual sensor inputs or produce outputs that differ from real-world data. To address these challenges, we propose DrivingSphere, a realistic and closed-loop simulation framework. Its core idea is to build 4D world representation and generate real-life and controllable driving scenarios. In specific, our framework includes a Dynamic Environment Composition module that constructs a detailed 4D driving world with a format of occupancy equipping with static backgrounds and dynamic objects, and a Visual Scene Synthesis module that transforms this data into high-fidelity, multi-view video outputs, ensuring spatial and temporal consistency. By providing a dynamic and realistic simulation environment, DrivingSphere enables comprehensive testing and validation of autonomous driving algorithms, ultimately advancing the development of more reliable autonomous cars. The benchmark will be publicly released.
Bootstrapping Referring Multi-Object Tracking
Zhang, Yani, Wu, Dongming, Han, Wencheng, Dong, Xingping
Referring multi-object tracking (RMOT) aims at detecting and tracking multiple objects following human instruction represented by a natural language expression. Existing RMOT benchmarks are usually formulated through manual annotations, integrated with static regulations. This approach results in a dearth of notable diversity and a constrained scope of implementation. In this work, our key idea is to bootstrap the task of referring multi-object tracking by introducing discriminative language words as much as possible. In specific, we first develop Refer-KITTI into a large-scale dataset, named Refer-KITTI-V2. It starts with 2,719 manual annotations, addressing the issue of class imbalance and introducing more keywords to make it closer to real-world scenarios compared to Refer-KITTI. They are further expanded to a total of 9,758 annotations by prompting large language models, which create 617 different words, surpassing previous RMOT benchmarks. In addition, the end-to-end framework in RMOT is also bootstrapped by a simple yet elegant temporal advancement strategy, which achieves better performance than previous approaches. The source code and dataset is available at https: //github.com/zyn213/TempRMOT.
A Novel Counterfactual Data Augmentation Method for Aspect-Based Sentiment Analysis
Wu, Dongming, Wen, Lulu, Chen, Chao, Shi, Zhaoshu
Generally, the emotional polarity of an aspect exists in the corresponding opinion expression, whose diversity has great impact on model's performance. To mitigate this problem, we propose a novel and simple counterfactual data augmentation method to generate opinion expressions with reversed sentiment polarity. In particular, the integrated gradients are calculated to locate and mask the opinion expression. Then, a prompt combined with the reverse expression polarity is added to the original text, and a Pre-trained language model (PLM), T5, is finally was employed to predict the masks. The experimental results shows the proposed counterfactual data augmentation method performs better than current augmentation methods on three ABSA datasets, i.e.