Huang, Yifan
Multimodal Magic: Elevating Depression Detection with a Fusion of Text and Audio Intelligence
Gan, Lindy, Huang, Yifan, Gao, Xiaoyang, Tan, Jiaming, Zhao, Fujun, Yang, Tao
This study proposes an innovative multimodal fusion model based on a teacher-student architecture to enhance the accuracy of depression classification. Our model addresses the limitations of traditional methods in feature fusion and modality weight allocation by introducing multi-head attention mechanisms and weighted multimodal transfer learning. Leveraging the DAIC-WOZ dataset, the student fusion model, guided by textual and auditory teacher models, achieves significant improvements in classification accuracy. Ablation experiments demonstrate that the proposed model attains an F1 score of 99.1% on the test set, significantly outperforming unimodal and conventional approaches. Our method effectively captures the complementarity between textual and audio features while dynamically adjusting the contributions of the teacher models to enhance generalization. The experimental results highlight the robustness and adaptability of the proposed framework in handling complex multimodal data. This research provides a novel technical framework for multimodal large-model learning in depression analysis, offering new insights into addressing the limitations of existing methods in modality fusion and feature extraction.
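To make the fusion-plus-distillation idea concrete, here is a minimal PyTorch sketch: a student fuses text and audio embeddings with cross-modal multi-head attention, while two unimodal teachers supply a weighted soft target. All layer sizes, weight values, and names (StudentFusion, distill_loss, w_text, w_audio) are illustrative assumptions, not the paper's released code.

```python
# Hedged sketch of teacher-student multimodal fusion with multi-head attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentFusion(nn.Module):
    def __init__(self, dim=256, heads=8, classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, classes)

    def forward(self, text_emb, audio_emb):
        # Text tokens attend over audio frames (cross-modal fusion).
        fused, _ = self.attn(text_emb, audio_emb, audio_emb)
        return self.head(fused.mean(dim=1))  # pool over the sequence

def distill_loss(student_logits, text_t_logits, audio_t_logits, labels,
                 w_text=0.5, w_audio=0.5, T=2.0, alpha=0.7):
    # Assumed scheme: a weighted blend of the two teachers' logits
    # provides the soft target; weights could be adjusted dynamically.
    teacher = w_text * text_t_logits + w_audio * audio_t_logits
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher / T, dim=-1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In this sketch, raising w_text or w_audio shifts the student toward the stronger teacher, which is one plausible reading of the weighted modality allocation described above.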
Comprehensive Online Training and Deployment for Spiking Neural Networks
Hao, Zecheng, Huang, Yifan, Xu, Zijie, Yu, Zhaofei, Huang, Tiejun
Spiking Neural Networks (SNNs) are considered to have enormous potential in the future development of Artificial Intelligence (AI) due to their brain-inspired and energy-efficient properties. In the current supervised learning domain of SNNs, compared to vanilla Spatial-Temporal Back-propagation (STBP) training, online training can effectively overcome the risk of GPU memory explosion and has received widespread academic attention. However, currently proposed online training methods cannot tackle the inseparability of temporally dependent gradients and merely aim to optimize training memory, yielding no performance advantage over STBP-trained models in the inference phase. To address these challenges, we propose the Efficient Multi-Precision Firing (EM-PF) model, a family of advanced spiking models based on floating-point spikes and binary synaptic weights. We show that the EM-PF model can effectively separate temporal gradients and achieve full-stage optimization of computation speed and memory footprint. Experimental results demonstrate that the EM-PF model can be flexibly combined with various techniques, including random back-propagation, parallel computation, and channel attention mechanisms, to achieve state-of-the-art performance with extremely low computational overhead in the field of online learning.
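The sketch below illustrates the generic online-training mechanism the abstract contrasts with STBP: the membrane state is detached at each timestep so temporal gradients stay separable and the computation graph never spans all timesteps. It is not the EM-PF model itself (it does not reproduce floating-point spikes or binary weights); the neuron dynamics and all names are assumptions.

```python
# Hedged sketch: per-timestep (online) SNN training with detached temporal state.
import torch
import torch.nn as nn

class OnlineLIF(nn.Module):
    def __init__(self, in_dim, out_dim, tau=2.0, v_th=1.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.tau, self.v_th = tau, v_th
        self.v = None  # membrane potential carried across timesteps

    def forward(self, x_t):
        if self.v is None:
            self.v = torch.zeros(x_t.size(0), self.fc.out_features,
                                 device=x_t.device)
        # Detach the carried state: gradients do not flow back in time.
        v_prev = self.v.detach()
        self.v = v_prev + (self.fc(x_t) - v_prev) / self.tau
        spike = (self.v >= self.v_th).float()
        # Straight-through surrogate gradient around the threshold.
        spike = spike + (self.v - self.v.detach())
        self.v = self.v - spike.detach() * self.v_th  # soft reset
        return spike

# Per-timestep update: backward() at each step frees that step's graph,
# keeping memory constant in the number of timesteps.
layer = OnlineLIF(100, 10)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
for t in range(8):                  # 8 simulated timesteps
    x_t = torch.rand(4, 100)        # dummy batch of inputs
    loss = layer(x_t).sum()         # placeholder objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```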
SCUNet++: Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection for Pulmonary Embolism CT Image Segmentation
Chen, Yifei, Zou, Binfeng, Guo, Zhaoxin, Huang, Yiyu, Huang, Yifan, Qin, Feiwei, Li, Qinhai, Wang, Changmiao
Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. CT pulmonary angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology: CTPA can produce noise resembling PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Moreover, traditional PE segmentation methods cannot fully capture the hierarchical structure of features or the local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++). This method incorporates multiple fusion dense skip connections between the encoder and decoder, utilizing the Swin Transformer as the encoder, and fuses features of different scales in the decoder subnetwork to compensate for the spatial information loss caused by the inevitable downsampling in Swin-UNet and other state-of-the-art methods, effectively addressing the above problem. We provide a detailed theoretical analysis of this method and validate it on the publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic PE segmentation and providing a powerful diagnostic tool for clinicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.
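The following sketch illustrates the dense-skip fusion pattern named in the abstract: a decoder node concatenates same-resolution skip features with an upsampled deeper feature, UNet++-style. The toy channel sizes and class names (ConvBlock, DenseSkipNode) are assumptions for illustration, not the released SCUNet++ code linked above.

```python
# Hedged sketch of a multi-fusion dense skip node (UNet++-style nesting).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class DenseSkipNode(nn.Module):
    """Fuses all same-level skip features with an upsampled deeper feature."""
    def __init__(self, skip_chs, deep_ch, out_ch):
        super().__init__()
        self.fuse = ConvBlock(sum(skip_chs) + deep_ch, out_ch)

    def forward(self, skips, deep):
        # Upsample the deeper feature to the skip resolution, then concat,
        # recovering spatial detail lost to downsampling.
        deep = F.interpolate(deep, size=skips[0].shape[-2:],
                             mode="bilinear", align_corners=False)
        return self.fuse(torch.cat(skips + [deep], dim=1))

# Toy usage: two same-level skip features plus one deeper decoder feature.
s0 = torch.rand(1, 32, 64, 64)
s1 = torch.rand(1, 32, 64, 64)
deep = torch.rand(1, 64, 32, 32)
node = DenseSkipNode([32, 32], 64, 32)
print(node([s0, s1], deep).shape)  # torch.Size([1, 32, 64, 64])
```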
Effective Audio Classification Network Based on Paired Inverse Pyramid Structure and Dense MLP Block
Chen, Yunhao, Zhu, Yunjie, Yan, Zihui, Huang, Yifan, Ren, Zhen, Shen, Jianlu, Chen, Lifang
Recently, massive architectures based on Convolutional Neural Networks (CNNs) and self-attention mechanisms have become standard for audio classification. While these techniques achieve state-of-the-art results, their effectiveness can only be guaranteed with huge computational costs and parameter counts, heavy data augmentation, transfer from large datasets, and other tricks. By exploiting the lightweight nature of audio, we propose an efficient network structure called the Paired Inverse Pyramid Structure (PIP) and a network called the Paired Inverse Pyramid Structure MLP Network (PIPMN) to overcome these problems. The PIPMN reaches 95.5% accuracy on Environmental Sound Classification (ESC) with the UrbanSound8K dataset and 93.2% on Music Genre Classification (MGC) with the GTZAN dataset, with only 1 million parameters. Both results are achieved without data augmentation or transfer learning. Under this setting, the PIPMN matches or even exceeds other state-of-the-art models with far fewer parameters. The code is available at https://github.com/JNAIC/PIPMN. Keywords: audio classification, multi-stage structure, skip connection, multi-layer perceptron (MLP).
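Since the abstract names but does not detail the structure, here is a heavily hedged sketch inferred from the name alone: MLP blocks with skip connections arranged in paired expanding/contracting ("inverse pyramid") stages. Stage widths, pairing, and names (DenseMLPBlock, PairedInversePyramid) are guesses, not the released PIPMN architecture at the repository above.

```python
# Hedged sketch of paired expanding/contracting MLP stages with skips.
import torch
import torch.nn as nn

class DenseMLPBlock(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, hidden),
            nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.net(x)  # skip connection around the MLP

class PairedInversePyramid(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Assumed pairing: hidden width grows then shrinks symmetrically.
        widths = [2 * dim, 4 * dim, 4 * dim, 2 * dim]
        self.stages = nn.ModuleList(DenseMLPBlock(dim, w) for w in widths)

    def forward(self, x):
        for stage in self.stages:
            x = stage(x)
        return x

feats = torch.rand(8, 128)                  # dummy pooled audio features
print(PairedInversePyramid()(feats).shape)  # torch.Size([8, 128])
```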
Triadic Temporal Exponential Random Graph Models (TTERGM)
Huang, Yifan, Barham, Clayton, Page, Eric, Douglas, Pamela K
Temporal exponential random graph models (TERGM) are powerful statistical models that can be used to infer the temporal pattern of edge formation and elimination in complex networks (e.g., social networks). TERGMs can also be used in a generative capacity to predict longitudinal time series data in these evolving graphs. However, parameter estimation within this framework fails to capture many real-world properties of social networks, including triadic relationships, small-world characteristics, and social learning theories that could be used to constrain the probabilistic estimation of dyadic covariates. Here, we propose triadic temporal exponential random graph models (TTERGM) to fill this void by incorporating these hierarchical network relationships within the graph model. We represent social network learning theory as an additional probability distribution that optimizes Markov chains in the graph vector space. The new parameters are then approximated via Monte Carlo maximum likelihood estimation. We show that our TTERGM model achieves improved fidelity and more accurate predictions compared to several benchmark methods on GitHub network data.
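As background for the estimation step, here is a toy sketch of the ERGM machinery the model builds on: sufficient statistics including a triadic (triangle) term, and a Metropolis edge-toggle sampler of the kind used inside Monte Carlo MLE. The statistics chosen and the parameter values are illustrative assumptions, not the paper's estimator.

```python
# Hedged sketch: ERGM sufficient statistics with a triadic term, plus a
# Metropolis edge-toggle sampler for Monte Carlo maximum likelihood.
import numpy as np

def stats(A):
    # Sufficient statistics of an undirected simple graph:
    # edge count and triangle count.
    edges = A.sum() / 2
    triangles = np.trace(A @ A @ A) / 6
    return np.array([edges, triangles])

def metropolis_step(A, theta, rng):
    n = A.shape[0]
    i, j = rng.choice(n, size=2, replace=False)
    B = A.copy()
    B[i, j] = B[j, i] = 1 - A[i, j]          # toggle one dyad
    # Accept with probability min(1, exp(theta . (s(B) - s(A)))).
    log_r = theta @ (stats(B) - stats(A))
    return B if np.log(rng.random()) < log_r else A

rng = np.random.default_rng(0)
A = (rng.random((10, 10)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetric, no self-loops
theta = np.array([-1.0, 0.5])                # edge and triangle parameters
for _ in range(1000):
    A = metropolis_step(A, theta, rng)
print(stats(A))                              # statistics of a sampled network
```

In MC-MLE, networks sampled this way approximate the intractable normalizing constant, and a positive triangle parameter favors the triadic closure that TTERGM adds to the temporal framework.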