Evaluating Artificial Systems for Pairwise Ranking Tasks Sensitive to Individual Differences