The Self-Execution Benchmark: Measuring LLMs' Attempts to Overcome Their Lack of Self-Execution

Open in new window