Cui, Can
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications
Cui, Can, Sheikh, Imran Ahamad, Sadeghi, Mostafa, Vincent, Emmanuel
Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data. We present a novel study aiming to optimize the use of a Speaker-Attributed ASR (SA-ASR) system in real-life scenarios, such as the AMI meeting corpus, for improved speaker assignment of speech segments. First, we propose a pipeline tailored to real-life applications involving Voice Activity Detection (VAD), Speaker Diarization (SD), and SA-ASR. Second, we advocate using VAD output segments to fine-tune the SA-ASR model, since the model is also applied to VAD segments at test time, and show that this yields a relative Speaker Error Rate (SER) reduction of up to 28%. Finally, we explore strategies to enhance the extraction of the speaker embedding templates used as inputs by the SA-ASR system. We show that extracting them from SD output rather than annotated speaker segments results in a relative SER reduction of up to 20%.
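As a rough illustration of the pipeline described above (VAD, then SD, then SA-ASR with speaker embedding templates taken from the SD output), the following minimal Python sketch shows the data flow; the helper callables run_vad, run_diarization, extract_embedding, and sa_asr_decode are hypothetical placeholders, not the authors' released code.

```python
# Minimal sketch of a VAD -> SD -> SA-ASR pipeline; all helper callables are
# hypothetical placeholders passed in by the caller, not the paper's code.
import numpy as np

def transcribe_meeting(audio: np.ndarray, sample_rate: int,
                       run_vad, run_diarization, extract_embedding, sa_asr_decode):
    """Transcribe a meeting recording and attribute each segment to a speaker."""
    # 1. Voice Activity Detection: keep only speech regions.
    speech_segments = run_vad(audio, sample_rate)            # list of (start, end) in seconds

    # 2. Speaker Diarization on the detected speech regions.
    diarization = run_diarization(audio, sample_rate, speech_segments)  # (start, end, spk_id)

    # 3. Build one speaker embedding template per diarized speaker,
    #    extracted from SD output rather than from oracle annotations.
    templates = {}
    for start, end, spk in diarization:
        seg = audio[int(start * sample_rate):int(end * sample_rate)]
        templates.setdefault(spk, []).append(extract_embedding(seg))
    templates = {spk: np.mean(embs, axis=0) for spk, embs in templates.items()}

    # 4. Run SA-ASR on each VAD segment, using the templates for speaker assignment.
    results = []
    for start, end in speech_segments:
        seg = audio[int(start * sample_rate):int(end * sample_rate)]
        words, speakers = sa_asr_decode(seg, templates)
        results.append((start, end, words, speakers))
    return results
```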
Large Language Models for Autonomous Driving: Real-World Experiments
Cui, Can, Yang, Zichong, Zhou, Yupeng, Ma, Yunsheng, Lu, Juanwu, Li, Lingxi, Chen, Yaobin, Panchal, Jitesh, Wang, Ziran
Autonomous driving systems are increasingly popular in today's technological landscape, where vehicles with partial automation are already widely available on the market and the full automation era with "driverless" capabilities is on the horizon. However, accurately understanding human commands, particularly for autonomous vehicles that carry only passengers instead of drivers, and achieving a high level of personalization remain challenging tasks in the development of autonomous driving systems. In this paper, we introduce a Large Language Model (LLM)-based framework, Talk-to-Drive (Talk2Drive), that processes verbal commands from humans and makes autonomous driving decisions with contextual information, satisfying their personalized preferences for safety, efficiency, and comfort. First, a speech recognition module is developed for Talk2Drive to interpret verbal inputs from humans into textual instructions, which are then sent to LLMs for reasoning. Appropriate commands for the Electrical Control Unit (ECU) are then generated, achieving a 100% success rate in code execution. Real-world experiments show that our framework can substantially reduce the takeover rate for a diverse range of drivers, by up to 90.1%. To the best of our knowledge, Talk2Drive marks the first instance of employing an LLM-based system in a real-world autonomous driving environment.
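To make the command-to-actuation loop concrete, here is a hedged sketch of a Talk2Drive-style flow (verbal command, to text, to LLM reasoning with context, to an ECU command). The helpers speech_to_text, query_llm, and send_to_ecu, the prompt wording, and the safety envelope are assumptions for illustration, not the released system.

```python
# Illustrative Talk2Drive-style loop; all helpers and the prompt format are
# assumptions, not the authors' implementation.
def handle_voice_command(audio_frames, context, speech_to_text, query_llm, send_to_ecu):
    # Speech recognition module: verbal input to a textual instruction.
    instruction = speech_to_text(audio_frames)

    # LLM reasoning: combine the instruction with contextual information
    # (e.g. weather, map data, the driver's preference history).
    prompt = (
        "You control a vehicle's Electrical Control Unit.\n"
        f"Context: {context}\n"
        f"Passenger instruction: {instruction}\n"
        "Return a single Python dict with keys 'target_speed_mps' and 'following_gap_s'."
    )
    command = query_llm(prompt)  # assumed to return e.g. {'target_speed_mps': 25, 'following_gap_s': 2.0}

    # Guardrail before actuation: reject values outside an illustrative safe envelope.
    if not (0 <= command["target_speed_mps"] <= 35 and command["following_gap_s"] >= 1.0):
        return None  # fall back to the default controller
    return send_to_ecu(command)
```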
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Ma, Yunsheng, Cui, Can, Cao, Xu, Ye, Wenqian, Liu, Peiran, Lu, Juanwu, Abdelraouf, Amr, Gupta, Rohit, Han, Kyungtae, Bera, Aniket, Rehg, James M., Wang, Ziran
We present LaMPilot, a novel framework for planning in the field of autonomous driving, rethinking the task as a code-generation process that leverages established behavioral primitives. This approach aims to address the challenge of interpreting and executing spontaneous user instructions such as "overtake the car ahead," which have typically posed difficulties for existing frameworks. We introduce the LaMPilot benchmark specifically designed to quantitatively evaluate the efficacy of Large Language Models (LLMs) in translating human directives into actionable driving policies. We then evaluate a wide range of state-of-the-art code generation language models on tasks from the LaMPilot Benchmark. The results of the experiments showed that GPT-4, with human feedback, achieved an impressive task completion rate of 92.7% and a minimal collision rate of 0.9%. To encourage further investigation in this area, our code and dataset will be made available.
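The evaluation loop implied by the abstract (an LLM turns an instruction into a program over behavioral primitives, which is then rolled out and scored) could look roughly like the sketch below. The primitive names (change_lane, set_speed, get_lead_vehicle) and the simulator interface are illustrative guesses, not the benchmark's actual API.

```python
# Hedged sketch of instruction-to-policy code generation and evaluation in a
# LaMPilot-like setting; primitive names and simulator API are assumptions.
def evaluate_generated_policy(llm_generate, simulator, primitives: dict, instruction: str) -> dict:
    primitives_doc = (
        "Available primitives:\n"
        "  change_lane(direction: str) -> None    # 'left' or 'right'\n"
        "  set_speed(mps: float) -> None\n"
        "  get_lead_vehicle() -> dict              # {'distance_m': float, 'speed_mps': float}\n"
    )
    prompt = (f"{primitives_doc}\n"
              f"Write a Python function `policy(env)` that fulfils: {instruction}")
    code = llm_generate(prompt)

    # Execute the generated program with only the primitives in scope.
    # (Real sandboxing of generated code is considerably more involved.)
    exec_globals = {"__builtins__": {}, **primitives}
    namespace = {}
    exec(code, exec_globals, namespace)
    policy = namespace["policy"]

    # Roll out in the simulator and report benchmark-style metrics.
    outcome = simulator.run(policy)  # assumed to return completion/collision flags
    return {"completed": outcome["completed"], "collided": outcome["collided"]}
```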
End-to-end Joint Rich and Normalized ASR with a limited amount of rich training data
Cui, Can, Sheikh, Imran Ahamad, Sadeghi, Mostafa, Vincent, Emmanuel
Joint rich and normalized automatic speech recognition (ASR), which produces transcriptions both with and without punctuation and capitalization, remains a challenge. End-to-end (E2E) ASR models offer both convenience and the ability to perform such joint transcription of speech. Training such models requires paired speech and rich text data, which is not widely available. In this paper, we compare two different approaches to train a stateless Transducer-based E2E joint rich and normalized ASR system, ready for streaming applications, with a limited amount of rich labeled data. The first approach uses a language model to generate pseudo-rich transcriptions of normalized training data. The second approach uses a single decoder conditioned on the type of output. The first approach leads to E2E rich ASR systems that perform better on out-of-domain data, with up to 9% relative error reduction. The second approach demonstrates the feasibility of an E2E joint rich and normalized ASR system using as little as 5% rich training data, with a moderate (2.42% absolute) increase in errors.
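One common way to realize the second approach, a single decoder conditioned on the output type, is to prepend a style tag to the label sequence; the sketch below follows that pattern. The tag strings, the tokenizer interface, and the transducer.greedy_search call are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of conditioning one Transducer decoder on the output style by
# prepending a tag token; tag values and interfaces are illustrative assumptions.
NORM_TAG = "<norm>"
RICH_TAG = "<rich>"

def make_training_targets(tokenizer, text: str, rich: bool):
    """Prepend the conditioning tag so the decoder learns both output styles."""
    tag = RICH_TAG if rich else NORM_TAG
    return [tokenizer.token_to_id(tag)] + tokenizer.encode(text)

def decode(transducer, tokenizer, audio_features, want_rich: bool):
    """At inference, the tag chosen as the first label selects the output style."""
    tag = RICH_TAG if want_rich else NORM_TAG
    start_id = tokenizer.token_to_id(tag)
    return transducer.greedy_search(audio_features, initial_tokens=[start_id])
```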
A Survey on Multimodal Large Language Models for Autonomous Driving
Cui, Can, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Zhou, Yang, Liang, Kaizhao, Chen, Jintai, Lu, Juanwu, Yang, Zichong, Liao, Kuei-Da, Gao, Tianren, Li, Erlong, Tang, Kun, Cao, Zhipeng, Zhou, Tong, Liu, Ao, Yan, Xinrui, Mei, Shuqi, Cao, Jianguo, Wang, Ziran, Zheng, Chao
With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems built on large models have the potential to perceive the real world, make decisions, and control tools much as humans do. In recent months, LLMs have attracted widespread attention in autonomous driving and map systems. Despite their immense potential, there is still a lack of comprehensive understanding of the key challenges, opportunities, and future directions for applying LLMs in driving systems. In this paper, we present a systematic investigation of this field. We first introduce the background of Multimodal Large Language Models (MLLMs), the development of multimodal models based on LLMs, and the history of autonomous driving. We then overview existing MLLM tools for driving, transportation, and map systems, together with existing datasets and benchmarks. Moreover, we summarize the works presented in the 1st WACV Workshop on Large Language and Vision Models for Autonomous Driving (LLVM-AD), the first workshop of its kind on LLMs in autonomous driving. To further promote the development of this field, we also discuss several important open problems regarding the use of MLLMs in autonomous driving systems that need to be addressed by both academia and industry.
MACP: Efficient Model Adaptation for Cooperative Perception
Ma, Yunsheng, Lu, Juanwu, Cui, Can, Zhao, Sicheng, Cao, Xu, Ye, Wenqian, Wang, Ziran
Vehicle-to-vehicle (V2V) communications have greatly enhanced the perception capabilities of connected and automated vehicles (CAVs) by enabling information sharing to "see through the occlusions", resulting in significant performance improvements. However, developing and training complex multi-agent perception models from scratch can be expensive and unnecessary when existing single-agent models show remarkable generalization capabilities. In this paper, we propose a new framework termed MACP, which equips a single-agent pre-trained model with cooperation capabilities. We approach this objective by identifying the key challenges of shifting from single-agent to cooperative settings, adapting the model by freezing most of its parameters and adding a few lightweight modules. We demonstrate in our experiments that the proposed framework can effectively utilize cooperative observations and outperform other state-of-the-art approaches in both simulated and real-world cooperative perception benchmarks, while requiring substantially fewer tunable parameters and lower communication costs. Our source code is available at https://github.com/PurdueDigitalTwin/MACP.
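The adaptation recipe described above (freeze the pre-trained single-agent backbone, train only a few lightweight injected modules) is sketched below in PyTorch. The bottleneck Adapter design, its size, and where it is inserted are assumptions for illustration, not the exact MACP architecture (see the linked repository for that).

```python
# Illustrative parameter-efficient adaptation: freeze the backbone, train only
# small residual adapters; design details are assumptions, not MACP's code.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight bottleneck module trained on top of frozen features."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual update

def add_cooperation_adapters(backbone: nn.Module, feature_dim: int) -> nn.ModuleList:
    # Freeze every pre-trained parameter of the single-agent model.
    for p in backbone.parameters():
        p.requires_grad = False
    # Only these adapters (a small fraction of the parameters) remain tunable.
    return nn.ModuleList([Adapter(feature_dim) for _ in range(4)])
```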
End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis
Cui, Can, Sheikh, Imran Ahamad, Sadeghi, Mostafa, Vincent, Emmanuel
To the best of our knowledge, this is the first model that efficiently integrates ASR and speaker identification modules in a multichannel setting. On simulated mixtures of LibriSpeech data, our system reduces the word error rate (WER) by up to 12% and 16% relative compared to previously proposed single-channel and multichannel approaches, respectively. Furthermore, we investigate the impact of different input features, including multichannel magnitude and phase information, on the ASR performance.
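For intuition, the kind of multichannel magnitude and phase input features mentioned above could be computed roughly as in the sketch below, which stacks per-channel STFT magnitude and phase along the feature axis. The STFT parameters and the stacking scheme are arbitrary illustrative choices, not the paper's configuration.

```python
# Small sketch of multichannel magnitude/phase input features; parameters are
# illustrative assumptions, not the paper's configuration.
import torch

def multichannel_mag_phase(waveforms: torch.Tensor,
                           n_fft: int = 512, hop: int = 160) -> torch.Tensor:
    """waveforms: (channels, samples) -> features: (2*channels, frames, freq_bins)."""
    window = torch.hann_window(n_fft)
    specs = torch.stft(waveforms, n_fft=n_fft, hop_length=hop,
                       window=window, return_complex=True)   # (C, F, T)
    magnitude = specs.abs()
    phase = specs.angle()
    feats = torch.cat([magnitude, phase], dim=0)              # (2C, F, T)
    return feats.transpose(1, 2)                              # (2C, T, F)
```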
Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles
Cui, Can, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Wang, Ziran
The fusion of human-centric design and artificial intelligence (AI) capabilities has opened up new possibilities for next-generation autonomous vehicles that go beyond transportation. These vehicles can dynamically interact with passengers and adapt to their preferences. This paper proposes a novel framework that leverages Large Language Models (LLMs) to enhance the decision-making process in autonomous vehicles. By utilizing LLMs' linguistic and contextual understanding abilities with specialized tools, we aim to integrate the language and reasoning capabilities of LLMs into autonomous vehicles. Our research includes experiments in HighwayEnv, a collection of environments for autonomous driving and tactical decision-making tasks, to explore LLMs' interpretation, interaction, and reasoning in various scenarios. We also examine real-time personalization, demonstrating how LLMs can influence driving behaviors based on verbal commands. Our empirical results highlight the substantial advantages of utilizing chain-of-thought prompting, which leads to improved driving decisions, and show the potential for LLMs to enhance personalized driving experiences through ongoing verbal feedback. The proposed framework aims to transform autonomous vehicle operations, offering personalized support, transparent decision-making, and continuous learning to enhance safety and effectiveness. In this way, the integration of LLMs into autonomous vehicles supports user-centric, transparent, and adaptive autonomous driving ecosystems.
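A hedged example of the chain-of-thought prompting style referred to above, for a HighwayEnv-like observation, is sketched below. The prompt wording and the query_llm helper are illustrative assumptions, not the paper's exact prompts; the listed discrete actions follow HighwayEnv's standard meta-actions.

```python
# Illustrative chain-of-thought prompt for a HighwayEnv-style decision step;
# the prompt text and query_llm are assumptions, not the paper's prompts.
def build_cot_prompt(observation: dict, verbal_command: str) -> str:
    return (
        "You are the decision module of an autonomous vehicle.\n"
        f"Observation: {observation}\n"
        f"Passenger said: \"{verbal_command}\"\n"
        "Think step by step: 1) describe the surrounding vehicles, "
        "2) check which maneuvers are safe, 3) account for the passenger's preference, "
        "then output exactly one action from "
        "[LANE_LEFT, IDLE, LANE_RIGHT, FASTER, SLOWER] on the final line."
    )

def choose_action(query_llm, observation: dict, verbal_command: str) -> str:
    reply = query_llm(build_cot_prompt(observation, verbal_command))
    return reply.strip().splitlines()[-1]  # the chosen action is the last line of the reasoning
```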
Cross-scale Multi-instance Learning for Pathological Image Diagnosis
Deng, Ruining, Cui, Can, Remedios, Lucas W., Bao, Shunxing, Womick, R. Michael, Chiron, Sophie, Li, Jia, Roland, Joseph T., Lau, Ken S., Liu, Qi, Wilson, Keith T., Wang, Yaohong, Coburn, Lori A., Landman, Bennett A., Huo, Yuankai
Analyzing high resolution whole slide images (WSIs) with regard to information across multiple scales poses a significant challenge in digital pathology. Multi-instance learning (MIL) is a common solution for working with high resolution images by classifying bags of objects (i.e. sets of smaller image patches). However, such processing is typically performed at a single scale (e.g., 20x magnification) of WSIs, disregarding the vital inter-scale information that is key to diagnoses by human pathologists. In this study, we propose a novel cross-scale MIL algorithm to explicitly aggregate inter-scale relationships into a single MIL network for pathological image diagnosis. The contribution of this paper is three-fold: (1) A novel cross-scale MIL (CS-MIL) algorithm that integrates the multi-scale information and the inter-scale relationships is proposed; (2) A toy dataset with scale-specific morphological features is created and released to examine and visualize differential cross-scale attention; (3) Superior performance on both in-house and public datasets is demonstrated by our simple cross-scale MIL strategy. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.
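To illustrate the cross-scale aggregation idea in the abstract, the PyTorch sketch below fuses patch features from several magnifications with learned attention before bag-level classification. The dimensions and the simple linear attention form are assumptions for illustration, not the released CS-MIL implementation (see the linked repository for that).

```python
# Minimal cross-scale MIL sketch: attention over scales per patch, then
# attention pooling over patches; details are assumptions, not CS-MIL's code.
import torch
import torch.nn as nn

class CrossScaleMIL(nn.Module):
    def __init__(self, feat_dim: int = 512, n_classes: int = 2):
        super().__init__()
        self.scale_attn = nn.Linear(feat_dim, 1)     # attention over scales per patch location
        self.instance_attn = nn.Linear(feat_dim, 1)  # attention over patches in the bag
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        """feats: (n_patches, n_scales, feat_dim) -> bag logits of shape (n_classes,)."""
        # Fuse the scales of each patch location with softmax attention.
        s_w = torch.softmax(self.scale_attn(feats), dim=1)     # (P, S, 1)
        fused = (s_w * feats).sum(dim=1)                       # (P, D)
        # Pool patches into a single bag representation.
        i_w = torch.softmax(self.instance_attn(fused), dim=0)  # (P, 1)
        bag = (i_w * fused).sum(dim=0)                         # (D,)
        return self.classifier(bag)
```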
Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles
Cui, Can, Ma, Yunsheng, Cao, Xu, Ye, Wenqian, Wang, Ziran
The future of autonomous vehicles lies in the convergence of human-centric design and advanced AI capabilities. Autonomous vehicles of the future will not only transport passengers but also interact with them and adapt to their desires, making the journey comfortable, efficient, and pleasant. In this paper, we present a novel framework that leverages Large Language Models (LLMs) to enhance autonomous vehicles' decision-making processes. By combining LLMs' natural language capabilities and contextual understanding with specialized tool use, and by synergizing reasoning and acting with the vehicle's various modules, this framework aims to seamlessly integrate the advanced language and reasoning capabilities of LLMs into autonomous vehicles. The proposed framework holds the potential to revolutionize the way autonomous vehicles operate, offering personalized assistance, continuous learning, and transparent decision-making, ultimately contributing to safer and more efficient autonomous driving technologies.