Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation