A Multi-agent Large Language Model Framework to Automatically Assess Performance of a Clinical AI Triage Tool