Learning with Average Top-k Loss

Fan, Yanbo, Lyu, Siwei, Ying, Yiming, Hu, Baogang

Neural Information Processing Systems 

In this work, we introduce the average top-$k$ (\atk) loss as a new ensemble loss for supervised learning. The \atk loss provides a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss. Furthermore, the \atk loss combines the advantages of them and can alleviate their corresponding drawbacks to better adapt to different data distributions. We show that the \atk loss affords an intuitive interpretation that reduces the penalty of continuous and convex individual losses on correctly classified data. The \atk loss can lead to convex optimization problems that can be solved effectively with conventional sub-gradient based method.