Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox