A Appendix
Since P@k treats all labels equally, it does not reveal the model's performance on tail labels.

CascadeXML trains on Amazon-3M using only 4 GPUs, compared to XR-Transformer, which leverages 8 GPUs.

Table 10: Time taken to train the ensembles of the respective models. X-Transformer and XR-Transformer results were obtained using 8 NVidia V100 GPUs.

Thus, following XR-Transformer's lead, we combine the features trained by CascadeXML with DiSMEC. Across datasets (Table 1), we find DiSMEC to be more resource-efficient than XR-Linear.
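To make the tail-label limitation of P@k concrete, the standard definitions from the extreme-classification literature (not part of this appendix's original text) contrast P@k with its propensity-scored variant PSP@k:
\[
P@k = \frac{1}{k} \sum_{\ell \in \mathrm{rank}_k(\hat{y})} y_\ell,
\qquad
PSP@k = \frac{1}{k} \sum_{\ell \in \mathrm{rank}_k(\hat{y})} \frac{y_\ell}{p_\ell},
\]
where $\mathrm{rank}_k(\hat{y})$ denotes the $k$ highest-scored labels under the prediction $\hat{y}$ and $p_\ell$ is the propensity of label $\ell$. Because tail labels have small $p_\ell$, PSP@k up-weights correct tail predictions that P@k averages away.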
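As an illustration of the feature-combination step above, the following is a minimal sketch under stated assumptions: the dense embeddings, TF-IDF matrix, and label matrix are hypothetical placeholders standing in for CascadeXML's learned representations and the datasets' sparse features, and a scikit-learn one-vs-rest linear classifier stands in for DiSMEC, which is a distributed sparse linear solver rather than a Python library.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.preprocessing import normalize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins: dense document embeddings from the trained
# CascadeXML encoder and sparse TF-IDF features of the same documents.
dense_emb = rng.standard_normal((1000, 128)).astype(np.float32)
tfidf = csr_matrix((rng.random((1000, 5000)) > 0.99).astype(np.float32))
labels = rng.integers(0, 2, size=(1000, 20))  # toy multi-label indicator matrix

# L2-normalise each block before concatenating, so neither feature
# family dominates the linear model's margins.
combined = hstack([normalize(csr_matrix(dense_emb)), normalize(tfidf)]).tocsr()

# One-vs-rest linear classifier as a stand-in for DiSMEC; the point
# illustrated is training a linear model on the combined feature space.
clf = OneVsRestClassifier(LinearSVC(), n_jobs=-1)
clf.fit(combined, labels)
print(clf.predict(combined[:5]).shape)  # per-label predictions for 5 documents
```

Normalising each feature block separately before concatenation mirrors the usual practice when mixing dense and sparse representations, preventing the higher-magnitude block from dominating the learned weights.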