AITopics | Zhang, Yuanyuan

Zhang, Yuanyuan

Biomedical Foundation Model: A Survey

Liu, Xiangrui, Zhang, Yuanyuan, Lu, Yingzhou, Yin, Changchang, Hu, Xiaoling, Liu, Xiaoou, Chen, Lulu, Wang, Sheng, Rodriguez, Alexander, Yao, Huaxiu, Yang, Yezhou, Zhang, Ping, Chen, Jintai, Fu, Tianfan, Wang, Xiao

arXiv.org Artificial Intelligence, Mar-3-2025

Foundation models, first introduced in 2021, are large-scale pre-trained models (e.g., large language models (LLMs) and vision-language models (VLMs)) that learn from extensive unlabeled datasets through unsupervised methods, enabling them to excel in diverse downstream tasks. These models, like GPT, can be adapted to various applications such as question answering and visual understanding, outperforming task-specific AI models and earning their name due to broad applicability across fields. The development of biomedical foundation models marks a significant milestone in leveraging artificial intelligence (AI) to understand complex biological phenomena and advance medical research and practice. This survey explores the potential of foundation models across diverse domains within biomedical fields, including computational biology, drug discovery and development, clinical informatics, medical imaging, and public health. The purpose of this survey is to inspire ongoing research in the application of foundation models to health science.

large language model, machine learning, natural language, (18 more...)

arXiv:2503.02104

Country:

Asia (0.67)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Indiana > Tippecanoe County (0.14)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.86)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Public Health (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)

radarODE-MTL: A Multi-Task Learning Framework with Eccentric Gradient Alignment for Robust Radar-Based ECG Reconstruction

Zhang, Yuanyuan, Yang, Rui, Yue, Yutao, Lim, Eng Gee

arXiv.org Artificial Intelligence, Oct-11-2024

Millimeter-wave radar promises robust and accurate vital sign monitoring in an unobtrusive manner. However, the radar signal may be distorted during propagation by ambient noise or random body movement, obscuring the subtle cardiac activity and degrading vital sign recovery. In particular, recovery of the electrocardiogram (ECG) signal relies heavily on deep-learning models and is sensitive to noise. Therefore, this work decomposes radar-based ECG recovery into three individual tasks and proposes a multi-task learning (MTL) framework, radarODE-MTL, to increase robustness against consistent and abrupt noise. In addition, to alleviate potential conflicts in optimizing the individual tasks, a novel multi-task optimization strategy, eccentric gradient alignment (EGA), is proposed to dynamically trim the task-specific gradients based on task difficulties in orthogonal space. The proposed radarODE-MTL with EGA is evaluated on a public dataset with prominent improvements in accuracy, and the performance remains consistent under noise. The experimental results indicate that radarODE-MTL can robustly reconstruct accurate ECG signals from radar signals and suggest its potential for real-life applications. The code is available at: http://github.com/ZYY0844/radarODE-MTL.

artificial intelligence, machine learning, recovery, (19 more...)

arXiv:2410.08656

Country: Asia > China (0.46)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Contextualization of ASR with LLM using phonetic retrieval-based augmentation

Lei, Zhihong, Na, Xingyu, Xu, Mingbin, Pusateri, Ernest, Van Gysel, Christophe, Zhang, Yuanyuan, Han, Shiyi, Huang, Zhen

arXiv.org Artificial Intelligence, Sep-11-2024

Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input. However, it remains a challenge for the model to recognize personal named entities, such as contacts in a phone book, when the input modality is speech. In this work, we start with a speech recognition task and propose a retrieval-based solution to contextualize the LLM: we first let the LLM detect named entities in speech without any context, then use this named entity as a query to retrieve phonetically similar named entities from a personal database and feed them to the LLM, and finally run context-aware LLM decoding. In a voice assistant task, our solution achieved up to 30.2% relative word error rate reduction and 73.6% relative named entity error rate reduction compared to a baseline system without contextualization. Notably, our solution by design avoids prompting the LLM with the full named entity database, making it highly efficient and applicable to large named entity databases.
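The retrieval step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phonetic key (a simplified Soundex-style code), the similarity measure (`difflib.SequenceMatcher`), and the names `phonetic_key` and `retrieve_similar` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical, simplified Soundex-style consonant classes (an assumption,
# not the phonetic representation used in the paper).
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def phonetic_key(name: str) -> str:
    """Map a name to a crude phonetic key: first letter plus consonant-class digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Skip vowels/unmapped letters and collapse adjacent repeats of a class.
        if digit and digit != prev:
            key += digit
        prev = digit
    return key


def retrieve_similar(query: str, database: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k database entries whose phonetic keys best match the query's."""
    q = phonetic_key(query)
    scored = [(SequenceMatcher(None, q, phonetic_key(n)).ratio(), n) for n in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps ties in input order
    return [name for _, name in scored[:top_k]]
```

In a pipeline like the one the abstract outlines, the entity first detected by the LLM would serve as `query`, and only the few retrieved candidates would be passed back to the LLM for context-aware decoding. The LLM steps themselves are outside the scope of this sketch.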

artificial intelligence, large language model, natural language, (17 more...)

arXiv:2409.15353

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

AI-Based Beam-Level and Cell-Level Mobility Management for High Speed Railway Communications

Li, Wen, Chen, Wei, Wang, Shiyue, Zhang, Yuanyuan, Matthaiou, Michail, Ai, Bo

arXiv.org Artificial Intelligence, Jul-5-2024

High-speed railway (HSR) communications are pivotal for ensuring rail safety, operations, maintenance, and delivering passenger information services. The high speed of trains creates rapidly time-varying wireless channels, increases the signaling overhead, and reduces the system throughput, making it difficult to meet the growing and stringent needs of HSR applications. In this article, we explore artificial intelligence (AI)-based beam-level and cell-level mobility management suitable for HSR communications, including the use cases, inputs, outputs, and key performance indicators (KPIs) of AI models. In particular, compared with traditional down-sampled spatial beam measurements, we show that spatial multi-beam measurements compressed via compressive sensing lead to improved spatial-temporal beam prediction. Moreover, we demonstrate the performance gains of AI-assisted cell handover over traditional mobile handover mechanisms. In addition, we observe that the proposed approaches to reducing the measurement overhead achieve radio link failure performance comparable to that of the traditional approach, which requires all the beam measurements of all cells, while saving 50% of the beam measurement overhead.

artificial intelligence, communication, machine learning, (17 more...)

arXiv:2407.04336

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Rail (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications > Mobile (0.76)

Improving child speech recognition with augmented child-like speech

Zhang, Yuanyuan, Yue, Zhengjun, Patel, Tanvina, Scharenborg, Odette

arXiv.org Artificial Intelligence, Jun-12-2024

State-of-the-art ASRs show suboptimal performance for child speech. The scarcity of child speech limits the development of child speech recognition (CSR). Therefore, we studied child-to-child voice conversion (VC) from existing child speakers in the dataset and additional (new) child speakers via monolingual and cross-lingual (Dutch-to-German) VC, respectively. The results showed that cross-lingual child-to-child VC significantly improved child ASR performance. Experiments on the impact of the quantity of child-to-child cross-lingual VC-generated data on fine-tuning (FT) ASR models gave the best results with two-fold augmentation for our FT-Conformer model and FT-Whisper model, which reduced WERs by ~3% absolute compared to the baseline, and with six-fold augmentation for the model trained from scratch, which improved by an absolute 3.6% WER. Moreover, using a small amount of "high-quality" VC-generated data achieved results similar to those of our best FT models.

artificial intelligence, speech, speech recognition, (17 more...)

arXiv:2406.10284

Country:

Europe > Netherlands (0.14)
Europe > Italy (0.14)
Europe > Greece (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Acoustic Model Fusion for End-to-end Speech Recognition

Lei, Zhihong, Xu, Mingbin, Han, Shiyi, Liu, Leo, Huang, Zhen, Ng, Tim, Zhang, Yuanyuan, Pusateri, Ernest, Hannemann, Mirko, Deng, Yaqiao, Siu, Man-Hung

arXiv.org Artificial Intelligence, Oct-10-2023

Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.

end-to-end speech recognition, natural language, text processing, (2 more...)

arXiv:2310.07062

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)

Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers

Silovsky, Jan, Deng, Liuhui, Argueta, Arturo, Arvizo, Tresi, Hsiao, Roger, Kuznietsov, Sasha, Lin, Yiu-Chang, Xiao, Xiaoqiang, Zhang, Yuanyuan

arXiv.org Artificial Intelligence, May-22-2023

Voice technology has become ubiquitous recently. However, the accuracy, and hence experience, in different languages varies significantly, which makes the technology not equally inclusive. The availability of data for different languages is one of the key factors affecting accuracy, especially in training of all-neural end-to-end automatic speech recognition systems. Cross-lingual knowledge transfer and iterative pseudo-labeling are two techniques that have been shown to be successful for improving the accuracy of ASR systems, in particular for low-resource languages, like Ukrainian. Our goal is to train an all-neural Transducer-based ASR system to replace a DNN-HMM hybrid system with no manually annotated training data. We show that the Transducer system trained using transcripts produced by the hybrid system achieves 18% reduction in terms of word error rate. However, using a combination of cross-lingual knowledge transfer from related languages and iterative pseudo-labeling, we are able to achieve 35% reduction of the error rate.
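For readers unfamiliar with the metric, the following minimal sketch shows the standard definition of word error rate (word-level edit distance divided by the number of reference words) and how a relative reduction between two systems is computed. The function names and the example numbers are illustrative assumptions, not taken from the paper, and the abstract does not state whether its 18% and 35% figures are relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (made-up example: 0.20 -> 0.13 is 0.35)."""
    return (baseline_wer - new_wer) / baseline_wer
```

A relative reduction is measured against the baseline's own error rate, so the same relative figure can correspond to very different absolute changes depending on how high the baseline WER is.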

artificial intelligence, machine learning, speech recognition, (15 more...)

arXiv:2305.13652

Country: Europe > Germany (0.14)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition

Zhang, Yuanyuan, Wang, Zi-Rui, Du, Jun

arXiv.org Machine Learning, Jan-15-2019

Automatic emotion recognition (AER) is a challenging task due to the abstract concept and multiple expressions of emotion. Although there is no consensus on a definition, human emotional states usually can be apperceived by auditory and visual systems. Inspired by this cognitive process in human beings, it's natural to simultaneously utilize audio and visual information in AER. However, most traditional fusion approaches only build a linear paradigm, such as feature concatenation and multi-system fusion, which hardly captures complex association between audio and video. In this paper, we introduce factorized bilinear pooling (FBP) to deeply integrate the features of audio and video. Specifically, the features are selected through the embedded attention mechanism from respective modalities to obtain the emotion-related regions. The whole pipeline can be completed in a neural network. Validated on the AFEW database of the audio-video sub-challenge in EmotiW2018, the proposed approach achieves an accuracy of 62.48%, outperforming the state-of-the-art result.

deep learning, emotion recognition, neural network, (17 more...)

arXiv:1901.04889

Country: Asia > China > Anhui Province (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.96)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.95)

XJTLUIndoorLoc: A New Fingerprinting Database for Indoor Localization and Trajectory Estimation Based on Wi-Fi RSS and Geomagnetic Field

Zhong, Zhenghang, Tang, Zhe, Li, Xiangxing, Yuan, Tiancheng, Yang, Yang, Wei, Meng, Zhang, Yuanyuan, Sheng, Renzhi, Grant, Naomi, Ling, Chongfeng, Huan, Xintao, Kim, Kyeong Soo, Lee, Sanghyuk

arXiv.org Machine Learning, Oct-16-2018

Abstract: In this paper, we present a new location fingerprinting database comprising Wi-Fi received signal strength (RSS) and geomagnetic field intensity measured with multiple devices at a multi-floor building in Xi'an Jiaotong-Liverpool University, Suzhou, China. We also provide preliminary results of localization and trajectory estimation based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network with this database. For localization, we map RSS data for a reference point to an image-like, two-dimensional array and then apply a CNN, which is popular in image and video analysis and recognition. For trajectory estimation, we use a modified random waypoint model to efficiently generate continuous step traces imitating human walking and train a stacked two-layer LSTM network with the generated data to remember the changing pattern of geomagnetic field intensity against (x, y) coordinates. Experimental results demonstrate the usefulness of our new database and the feasibility of the CNN- and LSTM-based localization and trajectory estimation with the database.

Index Terms: Indoor localization, trajectory estimation, received signal strength, Wi-Fi fingerprinting, deep learning, CNN, LSTM, geomagnetic field.

With the increasing demand for location-aware services and the proliferation of smartphones with embedded high-precision sensors, indoor localization has attracted a lot of attention from the research community. Global navigation satellite systems (GNSS) like the global positioning system (GPS), which provide accurate geo-spatial positioning, cannot be used indoors because the radio signals from satellites are easily blocked in an indoor environment.
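The idea of mapping RSS data to an image-like, two-dimensional array can be sketched as follows. The abstract does not describe the actual layout, so everything here is an assumption made for illustration: the function name `rss_to_image`, the square row-by-row arrangement, and the padding value of -110.0 standing in for "no signal".

```python
import math


def rss_to_image(rss: list[float], fill: float = -110.0) -> list[list[float]]:
    """Arrange a 1-D RSS vector row by row into the smallest square 2-D array that fits it.

    Unused cells are padded with `fill` (an assumed placeholder for undetected access points).
    """
    if not rss:
        raise ValueError("rss must not be empty")
    side = math.isqrt(len(rss) - 1) + 1  # equals ceil(sqrt(n)) for n >= 1
    padded = list(rss) + [fill] * (side * side - len(rss))
    return [padded[r * side:(r + 1) * side] for r in range(side)]
```

For example, a vector of five RSS readings becomes a 3 x 3 array whose last four cells hold the fill value. A CNN input layer would then treat this array like a single-channel image.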

database, deep learning, neural network, (20 more...)

arXiv:1810.07377

Country:

Asia > China > Shaanxi Province > Xi'an (0.25)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Media (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
