Collaborating Authors

 Yuan, Hongyu


Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition

arXiv.org Artificial Intelligence

Unlike traditional Automatic Speech Recognition (ASR), Audio-Visual Speech Recognition (AVSR) takes audio and visual signals simultaneously to infer the transcription. Recent studies have shown that Large Language Models (LLMs) can be effectively used for Generative Error Correction (GER) in ASR by predicting the best transcription from ASR-generated N-best hypotheses. However, these LLMs lack the ability to understand audio and visual signals simultaneously, making the GER approach challenging to apply to AVSR. In this work, we propose a novel GER paradigm for AVSR, termed AVGER, that follows the concept of "listening and seeing again". Specifically, we first use a powerful AVSR system to read the audio and visual signals and obtain the N-best hypotheses, and then use a Q-Former-based Multimodal Synchronous Encoder to read the audio and visual information again and convert them into audio and video compression representations, respectively, that the LLM can understand. The audio-visual compression representations and the N-best hypotheses together constitute a Cross-modal Prompt that guides the LLM in producing the best transcription. In addition, we propose a Multi-Level Consistency Constraint training criterion, covering the logits, utterance, and representation levels, to improve correction accuracy while enhancing the interpretability of the audio and visual compression representations. Experimental results on the LRS3 dataset show that our method outperforms current mainstream AVSR systems, reducing the Word Error Rate (WER) by 24% compared to them. Code and models can be found at: https://github.com/CircleRedRain/AVGER.
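The core mechanics described in the abstract (compressing long audio and video feature sequences into a handful of tokens and pairing them with the N-best hypotheses as a prompt for the LLM) can be sketched briefly. The following is a minimal, hypothetical illustration in which a single cross-attention layer stands in for the Q-Former-based encoder; all module names, dimensions, and the prompt wording are assumptions, not the authors' released code.

```python
# Minimal sketch of the "listening and seeing again" idea, assuming a
# Q-Former-style compressor built from torch.nn.MultiheadAttention; the names
# and shapes below are illustrative, not AVGER's actual implementation.
import torch
import torch.nn as nn

class ModalityCompressor(nn.Module):
    """Compress a long feature sequence into a few query tokens via cross-attention."""
    def __init__(self, dim=256, n_queries=8, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, feats):                        # feats: (B, T, dim)
        q = self.queries.expand(feats.size(0), -1, -1)
        compressed, _ = self.attn(q, feats, feats)   # (B, n_queries, dim)
        return compressed

audio_feats = torch.randn(1, 200, 256)               # e.g. frame-level audio features
video_feats = torch.randn(1, 50, 256)                # e.g. lip-region video features
audio_tokens = ModalityCompressor()(audio_feats)
video_tokens = ModalityCompressor()(video_feats)

# The compressed tokens would be projected into the LLM embedding space and
# prepended to a textual prompt listing the AVSR N-best hypotheses.
n_best = ["i am going home now", "i am going home know", "i'm going home now"]
prompt = "Given the audio/video context and these hypotheses, output the best transcription:\n"
prompt += "\n".join(f"{i + 1}. {h}" for i, h in enumerate(n_best))
print(audio_tokens.shape, video_tokens.shape)
print(prompt)
```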


Leveraging Retrieval Augment Approach for Multimodal Emotion Recognition Under Missing Modalities

arXiv.org Artificial Intelligence

Multimodal emotion recognition (MER) relies on complete multimodal information and robust multimodal joint representations to achieve high performance. However, the ideal condition of full modality integrity often does not hold in practice, and some modalities are frequently missing. For example, video, audio, or text data may be absent due to sensor failures or network bandwidth problems, which presents a great challenge to MER research. Traditional methods extract useful information from the complete modalities and reconstruct the missing modalities to learn robust multimodal joint representations. These methods have laid a solid foundation for research in this field and, to a certain extent, alleviated the difficulty of multimodal emotion recognition under missing modalities. However, relying solely on internal reconstruction and multimodal joint learning has its limitations, especially when the missing information is critical for emotion recognition. To address this challenge, we propose a novel framework, Retrieval Augment for Missing Modality Multimodal Emotion Recognition (RAMER), which introduces similar multimodal emotion data to enhance emotion recognition performance under missing modalities. By leveraging databases that contain related multimodal emotion data, we can retrieve similar multimodal emotion information to fill the gaps left by missing modalities. Experimental results demonstrate that our framework is superior to existing state-of-the-art approaches on missing-modality MER tasks. Our full project is publicly available at https://github.com/WooyoohL/Retrieval_Augment_MER.
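The retrieval-augmentation idea can be illustrated with a small sketch: query a database of complete multimodal samples using the modalities that are available, then borrow the retrieved sample's features for the missing modality. The database contents, the cosine-similarity fusion, and all names below are illustrative assumptions rather than the RAMER implementation.

```python
# Minimal sketch of retrieval augmentation under a missing modality; everything
# here (feature dimensions, similarity fusion, database layout) is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
DB_SIZE, DIM = 100, 64
database = {
    "audio": rng.normal(size=(DB_SIZE, DIM)),
    "video": rng.normal(size=(DB_SIZE, DIM)),
    "text":  rng.normal(size=(DB_SIZE, DIM)),
}

def retrieve_missing(sample, missing, db):
    """Fill the `missing` modality by nearest-neighbour search over the available ones."""
    available = [m for m in db if m != missing and sample.get(m) is not None]
    scores = np.zeros(DB_SIZE)
    for m in available:
        q, keys = sample[m], db[m]
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
        scores += sims                                 # fuse cosine similarities across modalities
    idx = int(np.argmax(scores))
    filled = dict(sample)
    filled[missing] = db[missing][idx]                 # borrow the retrieved sample's features
    return filled, idx

query = {"audio": rng.normal(size=DIM), "video": rng.normal(size=DIM), "text": None}
completed, hit = retrieve_missing(query, "text", database)
print("retrieved index:", hit, "filled text feature shape:", completed["text"].shape)
```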


Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

arXiv.org Artificial Intelligence

Airway-related quantitative imaging biomarkers (QIBs) are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, currently available public datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissue of patients with fibrotic lung disease exacerbate the challenge, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization ability, and then to explore the QIB most correlated with mortality. A training set of 120 high-resolution computed tomography (HRCT) scans was publicly released with expert annotations and mortality status. The online validation set comprised 52 HRCT scans from patients with fibrotic lung disease, and the offline test set included 140 cases from patients with fibrosis and COVID-19. The results showed that the ability to extract airway trees from patients with fibrotic lung disease could be enhanced by introducing a voxel-wise weighted general union loss and a continuity loss. In addition to the competitive imaging biomarkers for prognosis, a strong airway-derived biomarker (hazard ratio > 1.5, p < 0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment, and AI-based biomarkers.
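As an illustration of the kind of voxel-wise weighted overlap objective mentioned above, the sketch below implements a weighted soft union-style loss in PyTorch. The specific weighting scheme and tensor shapes are assumptions for demonstration and do not reproduce the participants' actual general union or continuity losses.

```python
# Illustrative voxel-wise weighted soft union-style loss for airway segmentation;
# the weighting (up-weighting airway voxels) is an assumption, not the challenge code.
import torch

def weighted_union_loss(pred, target, weights, eps=1e-6):
    """pred, target, weights: (B, D, H, W); pred holds voxel-wise probabilities."""
    inter = (weights * pred * target).sum(dim=(1, 2, 3))
    union = (weights * (pred + target - pred * target)).sum(dim=(1, 2, 3))
    return 1.0 - (inter + eps) / (union + eps)        # one loss value per batch item

pred    = torch.rand(2, 16, 32, 32)                   # predicted airway probabilities
target  = (torch.rand(2, 16, 32, 32) > 0.9).float()   # binary airway mask
weights = 1.0 + 4.0 * target                          # e.g. emphasise (thin) airway voxels
print(weighted_union_loss(pred, target, weights))
```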