SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges