Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking

Neural Information Processing Systems 

Large Language Models are evaluated on multiple metrics.