CoughViT: A Self-Supervised Vision Transformer for Cough Audio Representation Learning
Luong, Justin, Xue, Hao, Salim, Flora D.
arXiv.org Artificial Intelligence
Physicians routinely assess respiratory sounds during the diagnostic process, as these sounds provide insight into the condition of a patient's airways. In recent years, AI-based diagnostic systems operating on respiratory sounds have demonstrated success in respiratory disease detection. These systems represent a crucial advancement in early and accessible diagnosis, which is essential for timely treatment. However, label and data scarcity remain key challenges, especially for conditions beyond COVID-19, limiting both diagnostic performance and reliable evaluation. In this paper, we propose CoughViT, a novel pre-training framework for learning general-purpose cough sound representations that enhance diagnostic performance in tasks with limited data. To address label scarcity, we employ masked data modelling to train a feature encoder in a self-supervised manner. We evaluate our approach against other pre-training strategies on three diagnostically important cough classification tasks. Experimental results show that our representations match or exceed current state-of-the-art supervised audio representations in enhancing performance on downstream tasks.
Aug-7-2025
- Country:
- North America (0.28)
- Oceania > Australia
- New South Wales (0.14)
- Genre:
- Research Report > New Finding (0.48)