
Collaborating Authors

Wang, Shiyi


M2UD: A Multi-model, Multi-scenario, Uneven-terrain Dataset for Ground Robot with Localization and Mapping Evaluation

arXiv.org Artificial Intelligence

Ground robots play a crucial role in inspection, exploration, rescue, and other applications. In recent years, advancements in LiDAR technology have made sensors more accurate, lightweight, and cost-effective. Consequently, researchers increasingly integrate these sensors into SLAM studies, providing robust technical support for ground robots and expanding their application domains. Public datasets are essential for advancing SLAM technology. However, existing datasets for ground robots are typically restricted to flat-terrain motion with 3 DOF and cover only a limited range of scenarios. Although handheld devices and UAVs exhibit richer and more aggressive movements, their datasets are predominantly confined to small-scale environments due to endurance limitations. To fill this gap, we introduce M2UD, a multi-modal, multi-scenario, uneven-terrain SLAM dataset for ground robots. This dataset contains a diverse range of highly challenging environments, including cities, open fields, long corridors, and mixed scenarios, and it also covers extreme weather conditions. The aggressive motion and degradation characteristics of this dataset not only pose challenges for testing and evaluating existing SLAM methods but also advance the development of more advanced SLAM algorithms. To benchmark SLAM algorithms, M2UD provides smoothed ground-truth localization data obtained via RTK and introduces a novel localization evaluation metric that considers both accuracy and efficiency. Additionally, we utilize a high-precision laser scanner to acquire ground-truth maps of two representative scenes, facilitating the development and evaluation of mapping algorithms. We select 12 localization sequences and 2 mapping sequences to evaluate several classical SLAM algorithms, verifying the usability of the dataset. To enhance usability further, the dataset is accompanied by a suite of development kits.
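The abstract does not spell out the combined accuracy-and-efficiency metric, so the following is only a minimal illustrative sketch: a conventional Absolute Trajectory Error (ATE) computed against RTK ground truth, plus a hypothetical score that folds in runtime. The function names, the (N, 3) position layout, and the alpha weighting are assumptions, not M2UD's published metric.

```python
# Illustrative sketch only: the abstract does not specify M2UD's combined
# accuracy/efficiency metric, so this shows a conventional baseline (ATE)
# plus a hypothetical combined score that also penalizes runtime.
import numpy as np

def ate_rmse(est_xyz: np.ndarray, gt_xyz: np.ndarray) -> float:
    """Root-mean-square Absolute Trajectory Error over aligned poses.

    est_xyz, gt_xyz: (N, 3) arrays of estimated and ground-truth positions,
    assumed already time-synchronized and expressed in the same frame.
    """
    errors = np.linalg.norm(est_xyz - gt_xyz, axis=1)  # per-pose error (m)
    return float(np.sqrt(np.mean(errors ** 2)))

def accuracy_efficiency_score(ate: float, runtime_s: float,
                              alpha: float = 0.1) -> float:
    """Hypothetical combined score (lower is better): alpha trades seconds
    of runtime against metres of ATE. The actual M2UD metric may differ."""
    return ate + alpha * runtime_s
```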


A Survey on Large Language Model-empowered Autonomous Driving

arXiv.org Artificial Intelligence

Artificial intelligence (AI) plays a crucial role in autonomous driving (AD) research, propelling its development towards intelligence and efficiency. Currently, AD technology follows two main technical paths: modularization and end-to-end. Modularization decomposes the driving task into modules such as perception, prediction, planning, and control, and trains them separately. Because the training objectives of these modules are inconsistent, the integrated system suffers from cumulative bias. End-to-end approaches attempt to address this issue with a single model that maps directly from sensor data to control signals, but this path has limited capacity to learn a comprehensive set of features and struggles to handle unpredictable long-tail events and complex urban traffic scenarios. In light of the challenges faced by both paths, many researchers believe that large language models (LLMs), with their powerful reasoning abilities and extensive knowledge, could offer a solution, providing AD systems with deeper levels of understanding and decision-making capability. To understand whether LLMs could enhance AD, this paper conducts a thorough analysis of the potential applications of LLMs in AD systems, including exploring their optimization strategies in both modular and end-to-end approaches, with a particular focus on how LLMs can tackle the problems and challenges present in current solutions. Furthermore, we discuss an important question: can LLM-based artificial general intelligence (AGI) be a key to achieving high-level AD? We further analyze the potential limitations and challenges that LLMs may encounter in promoting the development of AD technology.


Probing Perfection: The Relentless Art of Meddling for Pulmonary Airway Segmentation from HRCT via a Human-AI Collaboration Based Active Learning Method

arXiv.org Artificial Intelligence

In the realm of pulmonary tracheal segmentation, the scarcity of annotated data stands as a prevalent pain point, as in most medical segmentation endeavors. Concurrently, most Deep Learning (DL) methodologies employed in this domain grapple with a further pair of challenges: the inherent opacity of 'black box' models and the ongoing pursuit of performance enhancement. In response to these intertwined challenges, the core concept of our Human-Computer Interaction (HCI) based learning models (RS_UNet, LC_UNet, UUNet, and WD_UNet) hinges on the versatile combination of diverse query strategies and an array of deep learning models. We train the four HCI models on the initial training dataset and sequentially repeat the following steps 1-4: (1) Query Strategy: our proposed HCI models select the samples that contribute the most additional representative information when labeled in each iteration of the query strategy (showing the names and sequence numbers of the samples to be annotated). Additionally, in this phase, the models select the unlabeled samples with the greatest predictive disparity by calculating the Wasserstein Distance, Least Confidence, Entropy Sampling, and Random Sampling.
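As a rough illustration of the query strategies named above, the sketch below scores unlabeled samples from per-sample softmax outputs. The function names, the (n_samples, n_classes) probability layout, and the use of SciPy's wasserstein_distance over class indices are assumptions; the paper's exact formulations may differ.

```python
# Minimal sketch of the named query strategies, under assumed inputs:
# probs is an (n_samples, n_classes) array of softmax probabilities.
import numpy as np
from scipy.stats import wasserstein_distance

def least_confidence(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k samples whose top-class probability is lowest."""
    return np.argsort(probs.max(axis=1))[:k]

def entropy_sampling(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k samples with the highest predictive entropy."""
    ent = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(ent)[-k:]

def wasserstein_disparity(probs_a: np.ndarray, probs_b: np.ndarray,
                          k: int) -> np.ndarray:
    """Pick the k samples where two models' predictive distributions over
    class indices disagree most, by 1-D Wasserstein distance per sample."""
    classes = np.arange(probs_a.shape[1])
    d = np.array([wasserstein_distance(classes, classes, pa, pb)
                  for pa, pb in zip(probs_a, probs_b)])
    return np.argsort(d)[-k:]

def random_sampling(n_unlabeled: int, k: int, seed: int = 0) -> np.ndarray:
    """Uniformly sample k indices as a baseline strategy."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_unlabeled, size=k, replace=False)
```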


Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

arXiv.org Artificial Intelligence

Airway-related quantitative imaging biomarkers (QIBs) are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current publicly available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, and then to explore the QIB most correlated with mortality prediction. A training set of 120 high-resolution computed tomography (HRCT) scans was publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease, and the offline test set included 140 cases from fibrosis and COVID-19 patients. The results showed that the capacity to extract airway trees from patients with fibrotic lung disease can be enhanced by introducing a voxel-wise weighted general union loss and a continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (hazard ratio > 1.5, p < 0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment, and AI-based biomarkers.
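The abstract names a voxel-wise weighted general union loss without giving its formula, so the following PyTorch sketch shows one plausible reading: a Dice-style overlap loss with per-voxel weights that can upweight thin peripheral branches. Tensor shapes and the weighting scheme are assumptions, not the challenge entries' published losses.

```python
# Hedged sketch of a voxel-wise weighted, Dice-style overlap ("union") loss.
# The exact AIIB23 loss formulations are not given in the abstract.
import torch

def weighted_union_loss(pred: torch.Tensor, target: torch.Tensor,
                        weights: torch.Tensor,
                        eps: float = 1e-6) -> torch.Tensor:
    """pred, target, weights: (B, 1, D, H, W) tensors; pred in [0, 1].

    weights can upweight thin peripheral airway branches so that small
    airways contribute as much to the overlap term as the trachea does.
    """
    inter = (weights * pred * target).sum(dim=(1, 2, 3, 4))
    union = (weights * (pred + target)).sum(dim=(1, 2, 3, 4))
    dice = (2 * inter + eps) / (union + eps)   # weighted soft-Dice score
    return 1 - dice.mean()                     # loss: lower is better
```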


Post-COVID Highlights: Challenges and Solutions of AI Techniques for Swift Identification of COVID-19

arXiv.org Artificial Intelligence

Since the onset of the COVID-19 pandemic in 2019, there has been a concerted effort to develop cost-effective, non-invasive, and rapid AI-based tools. These tools were intended to alleviate the burden on healthcare systems, control the rapid spread of the virus, and enhance intervention outcomes, all in response to this unprecedented global crisis. As we transition into a post-COVID era, we retrospectively evaluate these proposed studies and offer a review of the techniques employed in AI diagnostic models, with a focus on the solutions proposed for different challenges. This review endeavors to provide insights into the diverse solutions designed to address the multifaceted challenges that arose during the pandemic. By doing so, we aim to prepare the AI community for the development of AI tools tailored to address public health emergencies effectively.


HAKE: Human Activity Knowledge Engine

arXiv.org Artificial Intelligence

Human activity understanding is crucial for building automatic intelligent systems. With the help of deep learning, activity understanding has made huge progress recently, but challenges such as imbalanced data distribution, action ambiguity, and complex visual patterns still remain. To address these challenges and promote activity understanding, we build a large-scale Human Activity Knowledge Engine (HAKE) based on human body part states. Upon existing activity datasets, we annotate the part states of all the active persons in all images, thus establishing the relationship between instance activity and body part states. Furthermore, we propose a HAKE-based part state recognition model with a knowledge extractor named Activity2Vec and a corresponding part-state-based reasoning network. With HAKE, our method can alleviate the learning difficulty brought by the long-tail data distribution and bring in interpretability. HAKE now has more than 7M part state annotations and is still under construction. We first validate our approach on a part of HAKE in this preliminary paper, where we show a 7.2 mAP improvement on Human-Object Interaction recognition and a 12.38 mAP improvement on the one-shot subsets.
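To make the part-state reasoning idea concrete, here is a minimal hypothetical PyTorch sketch: predicted per-part state indices are embedded, pooled, and mapped to activity logits. Layer sizes, the numbers of parts and states, and the mean pooling are illustrative assumptions, not HAKE's published Activity2Vec architecture.

```python
# Hypothetical sketch of part-state-based reasoning in the spirit of HAKE:
# body-part state predictions -> shared embeddings -> activity logits.
import torch
import torch.nn as nn

class PartStateReasoner(nn.Module):
    def __init__(self, n_parts: int = 10, n_states: int = 76,
                 embed_dim: int = 128, n_activities: int = 600):
        super().__init__()
        # One shared embedding table indexed by flattened (part, state) pairs.
        self.state_embed = nn.Embedding(n_parts * n_states, embed_dim)
        self.reasoner = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, n_activities),
        )
        self.n_states = n_states

    def forward(self, part_states: torch.Tensor) -> torch.Tensor:
        """part_states: (B, n_parts) long tensor of predicted state indices."""
        parts = torch.arange(part_states.shape[1], device=part_states.device)
        idx = parts * self.n_states + part_states   # flatten (part, state)
        pooled = self.state_embed(idx).mean(dim=1)  # (B, embed_dim)
        return self.reasoner(pooled)                # activity logits
```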