Large Language Models Often Know When They Are Being Evaluated