Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment