Investigating Training and Generalization in Faithful Self-Explanations of Large Language Models
