JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention