From Promising Capability to Pervasive Bias: Assessing Large Language Models for Emergency Department Triage