Scaling behavior of large language models in emotional safety classification across sizes and tasks