Scaling Item-to-Standard Alignment with Large Language Models: Accuracy, Limits, and Solutions