AITopics | Cui, Can

Collaborating Authors

Cui, Can

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models

Park, Sung-Yeon, Cui, Can, Ma, Yunsheng, Moradipari, Ahmadreza, Gupta, Rohit, Han, Kyungtae, Wang, Ziran

arXiv.org Artificial IntelligenceMar-16-2025

Recent advances in multi-modal large language models (MLLMs) have demonstrated strong performance across various domains; however, their ability to comprehend driving scenes remains less proven. The complexity of driving scenarios, which includes multi-view information, poses significant challenges for existing MLLMs. In this paper, we introduce NuPlanQA-Eval, a multi-view, multi-modal evaluation benchmark for driving scene understanding. To further support generalization to multi-view driving scenarios, we also propose NuPlanQA-1M, a large-scale dataset comprising 1M real-world visual question-answering (VQA) pairs. For context-aware analysis of traffic scenes, we categorize our dataset into nine subtasks across three core skills: Road Environment Perception, Spatial Relations Recognition, and Ego-Centric Reasoning. Furthermore, we present BEV-LLM, integrating Bird's-Eye-View (BEV) features from multi-view images into MLLMs. Our evaluation results reveal key challenges that existing MLLMs face in driving scene-specific perception and spatial reasoning from ego-centric perspectives. In contrast, BEV-LLM demonstrates remarkable adaptability to this domain, outperforming other models in six of the nine subtasks. These findings highlight how BEV integration enhances multi-view MLLMs while also identifying key areas that require further refinement for effective adaptation to driving scenes. To facilitate further research, we publicly release NuPlanQA at https://github.com/sungyeonparkk/NuPlanQA.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.12772

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report (0.50)

Industry: Transportation > Ground > Road (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning

Tong, Xinyang, Ding, Pengxiang, Wang, Donglin, Zhang, Wenjie, Cui, Can, Sun, Mingyang, Fan, Yiguo, Zhao, Han, Zhang, Hongyin, Dang, Yonghao, Huang, Siteng, Lyu, Shangke

arXiv.org Artificial IntelligenceDec-23-2024

This paper addresses the inherent inference latency challenges associated with deploying multimodal large language models (MLLM) in quadruped vision-language-action (QUAR-VLA) tasks. Our investigation reveals that conventional parameter reduction techniques ultimately impair the performance of the language foundation model during the action instruction tuning phase, making them unsuitable for this purpose. We introduce a novel latency-free quadruped MLLM model, dubbed QUART-Online, designed to enhance inference efficiency without degrading the performance of the language foundation model. By incorporating Action Chunk Discretization (ACD), we compress the original action representation space, mapping continuous action values onto a smaller set of discrete representative vectors while preserving critical information. Subsequently, we fine-tune the MLLM to integrate vision, language, and compressed actions into a unified semantic space. Experimental results demonstrate that QUART-Online operates in tandem with the existing MLLM system, achieving real-time inference in sync with the underlying controller frequency, significantly boosting the success rate across various tasks by 65%. Our project page is https://quart-online.github.io.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.15576

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.43)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)

Add feedback

Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation

Jia, Bofang, Ding, Pengxiang, Cui, Can, Sun, Mingyang, Qian, Pengfang, Huang, Siteng, Fan, Zhaoxin, Wang, Donglin

arXiv.org Machine LearningDec-19-2024

Visual-motor policy learning has advanced with architectures like diffusion-based policies, known for modeling complex robotic trajectories. However, their prolonged inference times hinder high-frequency control tasks requiring real-time feedback. While consistency distillation (CD) accelerates inference, it introduces errors that compromise action quality. To address these limitations, we propose the Score and Distribution Matching Policy (SDM Policy), which transforms diffusion-based policies into single-step generators through a two-stage optimization process: score matching ensures alignment with true action distributions, and distribution matching minimizes KL divergence for consistency. A dual-teacher mechanism integrates a frozen teacher for stability and an unfrozen teacher for adversarial training, enhancing robustness and alignment with target distributions. Evaluated on a 57-task simulation benchmark, SDM Policy achieves a 6x inference speedup while having state-of-the-art action quality, providing an efficient and reliable framework for high-frequency robotic tasks.

artificial intelligence, machine learning, sdm policy, (15 more...)

arXiv.org Machine Learning

2412.09265

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

On-Board Vision-Language Models for Personalized Autonomous Vehicle Motion Control: System Design and Real-World Validation

Cui, Can, Yang, Zichong, Zhou, Yupeng, Peng, Juntong, Park, Sung-Yeon, Zhang, Cong, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Feng, Yiheng, Panchal, Jitesh, Li, Lingxi, Chen, Yaobin, Wang, Ziran

arXiv.org Artificial IntelligenceNov-17-2024

Personalized driving refers to an autonomous vehicle's ability to adapt its driving behavior or control strategies to match individual users' preferences and driving styles while maintaining safety and comfort standards. However, existing works either fail to capture every individual preference precisely or become computationally inefficient as the user base expands. Vision-Language Models (VLMs) offer promising solutions to this front through their natural language understanding and scene reasoning capabilities. In this work, we propose a lightweight yet effective on-board VLM framework that provides low-latency personalized driving performance while maintaining strong reasoning capabilities. Our solution incorporates a Retrieval-Augmented Generation (RAG)-based memory module that enables continuous learning of individual driving preferences through human feedback. Through comprehensive real-world vehicle deployment and experiments, our system has demonstrated the ability to provide safe, comfortable, and personalized driving experiences across various scenarios and significantly reduce takeover rates by up to 76.9%. To the best of our knowledge, this work represents the first end-to-end VLM-based motion control system in real-world autonomous vehicles.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.11913

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

How Good Are We? Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment

Guo, Junlin, Lu, Siqi, Cui, Can, Deng, Ruining, Yao, Tianyuan, Tao, Zhewen, Lin, Yizhe, Lionts, Marilyn, Liu, Quan, Xiong, Juming, Wang, Yu, Zhao, Shilin, Chang, Catie, Wilkes, Mitchell, Yin, Mengmeng, Yang, Haichun, Huo, Yuankai

arXiv.org Artificial IntelligenceOct-31-2024

Training AI foundation models has emerged as a promising large-scale learning approach for addressing real-world healthcare challenges, including digital pathology. While many of these models have been developed for tasks like disease diagnosis and tissue quantification using extensive and diverse training datasets, their readiness for deployment on some arguably simplest tasks, such as nuclei segmentation within a single organ (e.g., the kidney), remains uncertain. This paper seeks to answer this key question, "How good are we?", by thoroughly evaluating the performance of recent cell foundation models on a curated multi-center, multi-disease, and multi-species external testing dataset. Additionally, we tackle a more challenging question, "How can we improve?", by developing and assessing human-in-the-loop data enrichment strategies aimed at enhancing model performance while minimizing the reliance on pixel-level human annotation. To address the first question, we curated a multicenter, multidisease, and multispecies dataset consisting of 2,542 kidney whole slide images (WSIs). Three state-of-the-art (SOTA) cell foundation models-Cellpose, StarDist, and CellViT-were selected for evaluation. To tackle the second question, we explored data enrichment algorithms by distilling predictions from the different foundation models with a human-in-the-loop framework, aiming to further enhance foundation model performance with minimal human efforts. Our experimental results showed that all three foundation models improved over their baselines with model fine-tuning with enriched data. Interestingly, the baseline model with the highest F1 score does not yield the best segmentation outcomes after fine-tuning. This study establishes a benchmark for the development and deployment of cell vision foundation models tailored for real-world data applications.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.00078

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Health & Medicine > Diagnostic Medicine > Imaging (0.47)
Health & Medicine > Therapeutic Area > Oncology (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription

Cui, Can, Sheikh, Imran Ahamad, Sadeghi, Mostafa, Vincent, Emmanuel

arXiv.org Artificial IntelligenceOct-29-2024

Distant-microphone meeting transcription is a challenging task. State-of-the-art end-to-end speaker-attributed automatic speech recognition (SA-ASR) architectures lack a multichannel noise and reverberation reduction front-end, which limits their performance. In this paper, we introduce a joint beamforming and SA-ASR approach for real meeting transcription. We first describe a data alignment and augmentation method to pretrain a neural beamformer on real meeting data. We then compare fixed, hybrid, and fully neural beamformers as front-ends to the SA-ASR model. Finally, we jointly optimize the fully neural beamformer and the SA-ASR model. Experiments on the real AMI corpus show that,while state-of-the-art multi-frame cross-channel attention based channel fusion fails to improve ASR performance, fine-tuning SA-ASR on the fixed beamformer's output and jointly fine-tuning SA-ASR with the neural beamformer reduce the word error rate by 8% and 9% relative, respectively.

artificial intelligence, machine learning, speech recognition, (14 more...)

arXiv.org Artificial Intelligence

2410.21849

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

Cui, Can, Ma, Yunsheng, Yang, Zichong, Zhou, Yupeng, Liu, Peiran, Lu, Juanwu, Li, Lingxi, Chen, Yaobin, Panchal, Jitesh H., Abdelraouf, Amr, Gupta, Rohit, Han, Kyungtae, Wang, Ziran

arXiv.org Artificial IntelligenceOct-20-2024

With the broader usage and highly successful development of Large Language Models (LLMs), there has been a growth of interest and demand for applying LLMs to autonomous driving technology. Driven by their natural language understanding and reasoning ability, LLMs have the potential to enhance various aspects of autonomous driving systems, from perception and scene understanding to language interaction and decision-making. In this paper, we first introduce novel concepts and approaches to designing LLMs for autonomous driving (LLM4AD). Then, we propose a comprehensive benchmark for evaluating the instruction-following abilities of LLMs within the autonomous driving domain. Furthermore, we conduct a series of experiments on both simulation and real-world vehicle platforms, thoroughly evaluating the performance and potential of our LLM4AD systems. Our research highlights the significant potential of LLMs to enhance various aspects of autonomous vehicle technology, from perception and scene understanding to language interaction and decision-making.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.15281

Country:

North America > United States > California (0.46)
North America > United States > Indiana (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Data-driven Nucleus Subclassification on Colon H&E using Style-transferred Digital Pathology

Remedios, Lucas W., Bao, Shunxing, Remedios, Samuel W., Lee, Ho Hin, Cai, Leon Y., Li, Thomas, Deng, Ruining, Newlin, Nancy R., Saunders, Adam M., Cui, Can, Li, Jia, Liu, Qi, Lau, Ken S., Roland, Joseph T., Washington, Mary K, Coburn, Lori A., Wilson, Keith T., Huo, Yuankai, Landman, Bennett A.

arXiv.org Artificial IntelligenceMay-15-2024

Understanding the way cells communicate, co-locate, and interrelate is essential to furthering our understanding of how the body functions. H&E is widely available, however, cell subtyping often requires expert knowledge and the use of specialized stains. To reduce the annotation burden, AI has been proposed for the classification of cells on H&E. For example, the recent Colon Nucleus Identification and Classification (CoNIC) Challenge focused on labeling 6 cell types on H&E of the colon. However, the CoNIC Challenge was unable to classify epithelial subtypes (progenitor, enteroendocrine, goblet), lymphocyte subtypes (B, helper T, cytotoxic T), and connective subtypes (fibroblasts). We use inter-modality learning to label previously un-labelable cell types on H&E. We take advantage of multiplexed immunofluorescence (MxIF) histology to label 14 cell subclasses. We performed style transfer on the same MxIF tissues to synthesize realistic virtual H&E which we paired with the MxIF-derived cell subclassification labels. We evaluated the efficacy of using a supervised learning scheme where the input was realistic-quality virtual H&E and the labels were MxIF-derived cell subclasses. We assessed our model on private virtual H&E and public real H&E. On virtual H&E, we were able to classify helper T cells and epithelial progenitors with positive predictive values of $0.34 \pm 0.15$ (prevalence $0.03 \pm 0.01$) and $0.47 \pm 0.1$ (prevalence $0.07 \pm 0.02$) respectively, when using ground truth centroid information. On real H&E we could classify helper T cells and epithelial progenitors with upper bound positive predictive values of $0.43 \pm 0.03$ (parent class prevalence 0.21) and $0.94 \pm 0.02$ (parent class prevalence 0.49) when using ground truth centroid information. This is the first work to provide cell type classification for helper T and epithelial progenitor nuclei on H&E.

artificial intelligence, machine learning, nucleus, (19 more...)

arXiv.org Artificial Intelligence

2407.06116

Country: North America > United States > Tennessee (0.14)

Genre: Research Report > Experimental Study (0.87)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot

Song, Wenxuan, Zhao, Han, Ding, Pengxiang, Cui, Can, Lyu, Shangke, Fan, Yaning, Wang, Donglin

arXiv.org Artificial IntelligenceApr-9-2024

Multi-task robot learning holds significant importance in tackling diverse and complex scenarios. However, current approaches are hindered by performance issues and difficulties in collecting training datasets. In this paper, we propose GeRM (Generalist Robotic Model). We utilize offline reinforcement learning to optimize data utilization strategies to learn from both demonstrations and sub-optimal data, thus surpassing the limitations of human demonstrations. Thereafter, we employ a transformer-based VLA network to process multi-modal inputs and output actions. By introducing the Mixture-of-Experts structure, GeRM allows faster inference speed with higher whole model capacity, and thus resolves the issue of limited RL parameters, enhancing model performance in multi-task learning while controlling computational costs. Through a series of experiments, we demonstrate that GeRM outperforms other methods across all tasks, while also validating its efficiency in both training and inference processes. Additionally, we uncover its potential to acquire emergent skills. Additionally, we contribute the QUARD-Auto dataset, collected automatically to support our training approach and foster advancements in multi-task quadruped robot learning. This work presents a new paradigm for reducing the cost of collecting robot data and driving progress in the multi-task learning community. You can reach our project and video through the link: https://songwxuan.github.io/GeRM/ .

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2403.13358

Country: Asia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.72)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture

Lu, Juanwu, Cui, Can, Ma, Yunsheng, Bera, Aniket, Wang, Ziran

arXiv.org Artificial IntelligenceApr-4-2024

Safety and robustness are crucial factors in developing trustworthy autonomous vehicles. One essential aspect of addressing these factors is to equip vehicles with the capability to predict future trajectories for all moving objects in the surroundings and quantify prediction uncertainties. In this paper, we propose the Sequential Neural Variational Agent (SeNeVA), a generative model that describes the distribution of future trajectories for a single moving object. Our approach can distinguish Out-of-Distribution data while quantifying uncertainty and achieving competitive performance compared to state-of-the-art methods on the Argoverse 2 and INTERACTION datasets. Specifically, a 0.446 meters minimum Final Displacement Error, a 0.203 meters minimum Average Displacement Error, and a 5.35% Miss Rate are achieved on the INTERACTION test set. Extensive qualitative and quantitative analysis is also provided to evaluate the proposed model. Our open-source code is available at https://github.com/PurdueDigitalTwin/seneva.

artificial intelligence, machine learning, prediction, (16 more...)

arXiv.org Artificial Intelligence

2404.03789

Country: North America > United States (0.47)

Genre: Research Report (1.00)

Industry: Transportation (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.67)

Add feedback