Evaluating the Goal-Directedness of Large Language Models