AITopics | Zhong, Zihan

Collaborating Authors

Zhong, Zihan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition

Singh, Satwinder, Wang, Qianli, Zhong, Zihan, Mendes, Clarion, Hasegawa-Johnson, Mark, Abdulla, Waleed, Shahamiri, Seyed Reza

arXiv.org Artificial IntelligenceJan-24-2025

In this paper, we present a speaker-independent dysarthric speech recognition system, with a focus on evaluating the recently released Speech Accessibility Project (SAP-1005) dataset, which includes speech data from individuals with Parkinson's disease (PD). Despite the growing body of research in dysarthric speech recognition, many existing systems are speaker-dependent and adaptive, limiting their generalizability across different speakers and etiologies. Our primary objective is to develop a robust speaker-independent model capable of accurately recognizing dysarthric speech, irrespective of the speaker. Additionally, as a secondary objective, we aim to test the cross-etiology performance of our model by evaluating it on the TORGO dataset, which contains speech samples from individuals with cerebral palsy (CP) and amyotrophic lateral sclerosis (ALS). By leveraging the Whisper model, our speaker-independent system achieved a CER of 6.99% and a WER of 10.71% on the SAP-1005 dataset. Further, in cross-etiology settings, we achieved a CER of 25.08% and a WER of 39.56% on the TORGO dataset. These results highlight the potential of our approach to generalize across unseen speakers and different etiologies of dysarthria.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2501.14994

Country: North America > United States > Illinois > Champaign County (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data

Tang, Zhiqiang, Zhong, Zihan, He, Tong, Friedland, Gerald

arXiv.org Artificial IntelligenceDec-19-2024

This paper studies the best practices for automatic machine learning (AutoML). While previous AutoML efforts have predominantly focused on unimodal data, the multimodal aspect remains under-explored. Our study delves into classification and regression problems involving flexible combinations of image, text, and tabular data. We curate a benchmark comprising 22 multimodal datasets from diverse real-world applications, encompassing all 4 combinations of the 3 modalities. Across this benchmark, we scrutinize design choices related to multimodal fusion strategies, multimodal data augmentation, converting tabular data into text, cross-modal alignment, and handling missing modalities. Through extensive experimentation and analysis, we distill a collection of effective strategies and consolidate them into a unified pipeline, achieving robust performance on diverse datasets.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.16243

Country: Europe > Switzerland (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Information Technology (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Add feedback

AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

Tang, Zhiqiang, Fang, Haoyang, Zhou, Su, Yang, Taojiannan, Zhong, Zihan, Hu, Tony, Kirchhoff, Katrin, Karypis, George

arXiv.org Artificial IntelligenceApr-30-2024

AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation. Experiments across diverse datasets and tasks showcases AutoMM's superior performance in basic classification and regression tasks compared to existing AutoML tools, while also demonstrating competitive results in advanced tasks, aligning with specialized toolboxes designed for such purposes.

machine learning, natural language, text classification, (20 more...)

arXiv.org Artificial Intelligence

2404.16233

Country:

North America > United States (0.46)
North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance (0.67)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)
Information Technology > Services (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
(6 more...)

Add feedback

Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

Zhong, Zihan, Tang, Zhiqiang, He, Tong, Fang, Haoyang, Yuan, Chun

arXiv.org Artificial IntelligenceJan-31-2024

The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By integrating ultra-lightweight convolutional parameters into Low-Rank Adaptation (LoRA), Conv-LoRA can inject image-related inductive biases into the plain ViT encoder, further reinforcing SAM's local prior assumption. Notably, Conv-LoRA not only preserves SAM's extensive segmentation knowledge but also revives its capacity of learning high-level image semantics, which is constrained by SAM's foreground-background segmentation pretraining. Comprehensive experimentation across diverse benchmarks spanning multiple domains underscores Conv-LoRA's superiority in adapting SAM to real-world semantic segmentation tasks.

large language model, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2401.17868

Country:

Europe (1.00)
Asia > China (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.63)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.94)
Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation

Ren, Yuxin, Zhong, Zihan, Shi, Xingjian, Zhu, Yi, Yuan, Chun, Li, Mu

arXiv.org Artificial IntelligenceMay-16-2023

It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer. In order to enhance the guidance of the teacher training process, we introduce the concept of distillation influence to determine the impact of distillation from each training sample on the student's generalization ability. In this paper, we propose Learning Good Teacher Matters (LGTM), an efficient training technique for incorporating distillation influence into the teacher's learning process. By prioritizing samples that are likely to enhance the student's generalization ability, our LGTM outperforms 10 common knowledge distillation baselines on 6 text classification tasks in the GLUE benchmark.

distillation, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.09651

Country: North America > United States (0.47)

Genre: Research Report (0.82)

Industry: Education > Teacher Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback