Active Testing of Large Language Model via Multi-Stage Sampling