EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees