AaPE: Aliasing-aware Patch Embedding for Self-Supervised Audio Representation Learning

Kohei Yamamoto, Kosuke Okusa

arXiv.org Machine Learning 

Abstract--Transformer-based audio self-supervised learning (SSL) models often treat spectrograms as images, applying convolutional patchification with heavy temporal downsampling. This lowers the effective Nyquist frequency and introduces aliasing, while naive low-pass filtering removes task-relevant high-frequency cues. AaPE (Aliasing-aware Patch Embedding) augments standard patch tokens with features produced by a band-limited complex sinusoidal kernel with a two-sided exponential window that dynamically targets alias-prone bands. The frequency and decay parameters of the kernel are estimated from the input, enabling parallel, adaptive subband analysis whose outputs are fused with the standard patch tokens. AaPE integrates seamlessly into masked teacher-student self-supervised learning. In addition, we combine a multi-mask strategy with a contrastive objective to enforce consistency across diverse mask patterns, stabilizing training. Pre-training on AudioSet followed by fine-tuning is evaluated across diverse downstream benchmarks spanning environmental sounds and other common audio domains. Complementary linear-probing evaluation mirrors this pattern, yielding clear gains on several benchmarks and strong performance elsewhere. Together, these results indicate that AaPE mitigates aliasing without discarding informative high-frequency content.

Index Terms--Self-supervised learning, masked audio modeling, transformers, aliasing, structured state-space models.

Recent advances in natural language processing (NLP) and computer vision demonstrate the effectiveness of self-supervised learning (SSL), which trains neural networks on unlabeled data via auxiliary objectives.
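As a rough illustration of the kernel described in the abstract, the sketch below builds a complex sinusoid windowed by a two-sided exponential and applies one such kernel per band along the time axis. This is a minimal NumPy sketch under stated assumptions: the function names, the kernel length, and the magnitude readout of the filtered signal are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def sinc_exp_kernel(freq, decay, length=16):
    """Complex sinusoid at normalized frequency `freq` (cycles/sample),
    windowed by a two-sided exponential exp(-decay * |n|).
    Hypothetical parameterization for illustration only."""
    n = np.arange(length) - (length - 1) / 2.0
    window = np.exp(-decay * np.abs(n))        # two-sided exponential window
    carrier = np.exp(2j * np.pi * freq * n)    # band-centered complex sinusoid
    return window * carrier

def subband_features(x, freqs, decays, kernel_len=16):
    """Filter a 1-D signal with one kernel per (freq, decay) pair;
    the magnitude of each filtered signal serves as a subband feature.
    In the paper, freqs/decays would be predicted from the input."""
    feats = []
    for f, d in zip(freqs, decays):
        k = sinc_exp_kernel(f, d, length=kernel_len)
        # same-length convolution along the time axis
        feats.append(np.abs(np.convolve(x, k, mode="same")))
    return np.stack(feats)  # shape: (num_bands, time)
```

In this reading, each (freq, decay) pair defines an adaptive subband analyzer, and the resulting feature maps would be fused with the standard convolutional patch tokens rather than replacing them.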