Effective Proxy for Human Labeling: Ensemble Disagreement Scores in Large Language Models for Industrial NLP