SOCK: A Benchmark for Measuring Self-Replication in Large Language Models
