Systematic Characterization of the Effectiveness of Alignment in Large Language Models for Categorical Decisions