Supplementary Material: Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition

Neural Information Processing Systems 

We organize the supplementary material as follows. Appendix A provides the proofs for Theorem 1. Appendix B presents the pseudo-code of the proposed method. Appendix E reports more ablation studies on expert learning and the proposed inverse softmax loss.

We first recall several key notations and define some new ones. The optimization objective of our test-time self-supervised aggregation method is given in Eq. (4). Meanwhile, the mutual information between the predictions $\hat{Y}$ and the labels $Y$ can be decomposed as $I(\hat{Y}; Y) = H(\hat{Y}) - H(\hat{Y} \mid Y)$; a small numerical sketch of this decomposition is given below.

In this appendix, we provide more details on the experimental settings.
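To make the entropy decomposition concrete, the following is a minimal NumPy sketch (not part of the proposed method) that evaluates $I(\hat{Y}; Y) = H(\hat{Y}) - H(\hat{Y} \mid Y)$ for an assumed toy joint distribution $p(\hat{y}, y)$; the array `joint` and all variable names are illustrative only.

```python
import numpy as np

# Assumed toy joint distribution p(y_hat, y) over 3 predicted classes (rows)
# and 3 true classes (columns); the values are for demonstration only.
joint = np.array([
    [0.20, 0.05, 0.05],
    [0.05, 0.25, 0.05],
    [0.05, 0.05, 0.25],
])

def entropy(p):
    """Shannon entropy H(p) in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Marginals p(y_hat) and p(y).
p_pred = joint.sum(axis=1)   # p(y_hat)
p_label = joint.sum(axis=0)  # p(y)

# H(Y_hat): entropy of the prediction marginal.
h_pred = entropy(p_pred)

# H(Y_hat | Y) = sum_y p(y) * H(Y_hat | Y = y), using the conditional columns.
h_pred_given_label = sum(
    p_label[j] * entropy(joint[:, j] / p_label[j])
    for j in range(joint.shape[1])
    if p_label[j] > 0
)

# Mutual information via the decomposition I(Y_hat; Y) = H(Y_hat) - H(Y_hat | Y).
mi = h_pred - h_pred_given_label
print(f"H(Y_hat) = {h_pred:.4f}, H(Y_hat|Y) = {h_pred_given_label:.4f}, I = {mi:.4f}")
```

As a sanity check, the same value is recovered from the equivalent form $I(\hat{Y}; Y) = \sum_{\hat{y}, y} p(\hat{y}, y) \log \frac{p(\hat{y}, y)}{p(\hat{y}) p(y)}$.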