Goto

Collaborating Authors

 Li, Yunlong


TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness

arXiv.org Artificial Intelligence

3D semantic occupancy has rapidly become a research focus in the fields of robotics and autonomous driving environment perception due to its ability to provide more realistic geometric perception and its closer integration with downstream tasks. By performing occupancy prediction of the 3D space in the environment, the ability and robustness of scene understanding can be effectively improved. However, existing occupancy prediction tasks are primarily modeled using voxel or point cloud-based approaches: voxel-based network structures often suffer from the loss of spatial information due to the voxelization process, while point cloud-based methods, although better at retaining spatial location information, face limitations in representing volumetric structural details. To address this issue, we propose a dual-modal prediction method based on 3D Gaussian sets and sparse points, which balances both spatial location and volumetric structural information, achieving higher accuracy in semantic occupancy prediction. Specifically, our method adopts a Transformer-based architecture, taking 3D Gaussian sets, sparse points, and queries as inputs. Through the multi-layer structure of the Transformer, the enhanced queries and 3D Gaussian sets jointly contribute to the semantic occupancy prediction, and an adaptive fusion mechanism integrates the semantic outputs of both modalities to generate the final prediction results. Additionally, to further improve accuracy, we dynamically refine the point cloud at each layer, allowing for more precise location information during occupancy prediction. We conducted experiments on the Occ3DnuScenes dataset, and the experimental results demonstrate superior performance of the proposed method on IoU based metrics.


DMPA: Model Poisoning Attacks on Decentralized Federated Learning for Model Differences

arXiv.org Artificial Intelligence

Federated learning (FL) has garnered significant attention as a prominent privacy-preserving Machine Learning (ML) paradigm. Decentralized FL (DFL) eschews traditional FL's centralized server architecture, enhancing the system's robustness and scalability. However, these advantages of DFL also create new vulnerabilities for malicious participants to execute adversarial attacks, especially model poisoning attacks. In model poisoning attacks, malicious participants aim to diminish the performance of benign models by creating and disseminating the compromised model. Existing research on model poisoning attacks has predominantly concentrated on undermining global models within the Centralized FL (CFL) paradigm, while there needs to be more research in DFL. To fill the research gap, this paper proposes an innovative model poisoning attack called DMPA. This attack calculates the differential characteristics of multiple malicious client models and obtains the most effective poisoning strategy, thereby orchestrating a collusive attack by multiple participants. The effectiveness of this attack is validated across multiple datasets, with results indicating that the DMPA approach consistently surpasses existing state-of-the-art FL model poisoning attack strategies.


SafeAug: Safety-Critical Driving Data Augmentation from Naturalistic Datasets

arXiv.org Artificial Intelligence

Safety-critical driving data is crucial for developing safe and trustworthy self-driving algorithms. Due to the scarcity of safety-critical data in naturalistic datasets, current approaches primarily utilize simulated or artificially generated images. However, there remains a gap in authenticity between these generated images and naturalistic ones. We propose a novel framework to augment the safety-critical driving data from the naturalistic dataset to address this issue. In this framework, we first detect vehicles using YOLOv5, followed by depth estimation and 3D transformation to simulate vehicle proximity and critical driving scenarios better. This allows for targeted modification of vehicle dynamics data to reflect potentially hazardous situations. Compared to the simulated or artificially generated data, our augmentation methods can generate safety-critical driving data with minimal compromise on image authenticity. Experiments using KITTI datasets demonstrate that a downstream self-driving algorithm trained on this augmented dataset performs superiorly compared to the baselines, which include SMOGN and importance sampling.


Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

arXiv.org Artificial Intelligence

Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representation by extracting event arguments (such as location, time, agent, and patient), lacking the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies. This shortcoming leads to their underwhelming performance in determining coreference for the events where their argument information relies on long-distance dependencies. In light of these limitations, we propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents. Subsequently, cross-document heterogeneous graphs are constructed and GAT is utilized to learn the representations of events. Finally, a pair scorer calculates the similarity between each pair of events and co-referred events can be recognized using standard clustering algorithm. Additionally, as the existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, which comprises 53,066 event mentions and 4,476 clusters. After applying our model on the English and Chinese datasets respectively, it outperforms all baselines by large margins.