Optimizing Speech Language Models for Acoustic Consistency