Scaling Up Active Testing to Large Language Models