0e0157ce5ea15831072be4744cbd5334-Supplemental-Conference.pdf
Neural Information Processing Systems
Ep denotes the total number of epochs needed to fine-tune the model over the dataset. Consequently, it can be minimized using a second-order Newton method. We can also detect some qualitative differences in the attention maps at different resolutions: the entropy, i.e. how much the attention is concentrated or spread across different tokens, changes significantly between levels. Hence, the similarity between consecutive representations is expected to be strong. On the other hand, when only looking at CascadeXML's points (in blue) in Figure 4, we observe that the tasks in the first meta-classifier and the extreme classifier are substantially different. As shown in Table 11, the shortlisting achieves very good recall rates.
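As a rough illustration of the entropy measure mentioned above, the sketch below computes the Shannon entropy of a single attention distribution over tokens. It is not taken from the paper: the function name `attention_entropy` and the example weights are hypothetical, and it assumes each attention row is a vector of non-negative weights that can be normalized to sum to one.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution over tokens.

    Hypothetical helper: `weights` is assumed to be a non-negative vector
    with a positive sum; it is normalized before the entropy is taken.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    # Zero-probability tokens contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Illustrative inputs only (not data from the paper).
uniform = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly over 4 tokens
peaked = [0.97, 0.01, 0.01, 0.01]   # attention concentrated on one token

h_uniform = attention_entropy(uniform)
h_peaked = attention_entropy(peaked)
```

Under these assumptions, attention spread evenly over n tokens gives the maximum entropy log(n), while attention concentrated on a single token gives an entropy close to zero, so comparing the values across levels is one way to quantify the qualitative difference described.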
Feb-7-2026, 11:22:33 GMT