Breathing and Semantic Pause Detection and Exertion-Level Classification in Post-Exercise Speech
Yuyu Wang, Wuyue Xia, Huaxiu Yao, Jingping Nie
arXiv.org Artificial Intelligence
Post-exercise speech contains rich physiological and linguistic cues, often marked by semantic pauses, breathing pauses, and combined breathing-semantic pauses. Detecting these events enables assessment of recovery rate, lung function, and exertion-related abnormalities. However, existing work on identifying and distinguishing different types of pauses in this context is limited. In this work, building on a recently released dataset with synchronized audio and respiration signals, we provide systematic annotations of pause types. Using these annotations, we conduct exploratory breathing and semantic pause detection and exertion-level classification across deep learning models (GRU, 1D CNN-LSTM, AlexNet, VGG16), acoustic features (MFCC, MFB), and layer-stratified Wav2Vec2 representations. We evaluate three setups (single feature, feature fusion, and a two-stage detection-classification cascade) under both classification and regression formulations. Results show per-type detection accuracy of up to 89% for semantic pauses, 55% for breathing pauses, 86% for combined pauses, and 73% overall, while exertion-level classification achieves 90.5% accuracy, outperforming prior work.
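The two-stage detection-classification cascade mentioned above can be illustrated with a minimal sketch. It assumes that stage 1 scores each audio segment as pause vs. speech and that stage 2 assigns a pause type only to segments that stage 1 flags. The names `cascade_predict`, `per_type_accuracy`, `toy_detector`, and `toy_classifier` are hypothetical, as is the 0.5 threshold. The toy energy and length heuristics merely stand in for the trained models (GRU, CNN-LSTM, etc.) and do not reflect the authors' method.

```python
from typing import Callable, Dict, List, Sequence

import numpy as np

# Hypothetical label set, ordered to match the classifier's output vector.
PAUSE_TYPES = ("semantic", "breathing", "combined")


def cascade_predict(
    segments: Sequence[np.ndarray],
    detector: Callable[[np.ndarray], float],
    classifier: Callable[[np.ndarray], np.ndarray],
    threshold: float = 0.5,  # assumed decision threshold for stage 1
) -> List[str]:
    """Stage 1 detects pause vs. speech; stage 2 types only detected pauses."""
    labels = []
    for seg in segments:
        if detector(seg) < threshold:
            labels.append("speech")  # stage 1 rejects: no pause-type label
            continue
        probs = classifier(seg)  # stage 2: scores over PAUSE_TYPES
        labels.append(PAUSE_TYPES[int(np.argmax(probs))])
    return labels


def per_type_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of segments of each true type that were predicted correctly."""
    acc = {}
    for t in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == t]
        acc[t] = sum(y_pred[i] == t for i in idx) / len(idx)
    return acc


def toy_detector(seg: np.ndarray) -> float:
    # Hypothetical stand-in: low RMS energy is treated as a pause.
    rms = float(np.sqrt(np.mean(seg ** 2)))
    return 1.0 if rms < 0.1 else 0.0


def toy_classifier(seg: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in: pause length (in samples) picks the type.
    n = len(seg)
    if n < 100:
        return np.array([0.8, 0.1, 0.1])  # short -> semantic
    if n < 200:
        return np.array([0.1, 0.2, 0.7])  # medium -> combined
    return np.array([0.1, 0.8, 0.1])  # long -> breathing
```

For example, a loud segment followed by silent segments of 50, 150, and 300 samples yields `["speech", "semantic", "combined", "breathing"]`. Gating stage 2 on stage 1 keeps the type classifier from ever seeing non-pause audio. Per-type accuracy, as reported in the abstract, is then computed separately for each pause class.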
Sep-22-2025
- Country:
- Asia > China
- Hong Kong (0.05)
- North America > United States
- New York > New York County
- New York City (0.04)
- North Carolina > Orange County
- Chapel Hill (0.04)
- Genre:
- Research Report > New Finding (0.34)
- Industry:
- Health & Medicine (0.50)
- Technology: