A Bayesian Bradley-Terry model to compare multiple ML algorithms on multiple data sets