Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance