Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability