Notice that, in the GO datasets, it is common to have n > 10, given that the hierarchies
–Neural Information Processing Systems
While the max function has already been used, nobody so far has shown how to deploy it effectively, which is what the combination of MCM and MCLoss does. In general, consider a class A with ancestors A1, ..., An in the hierarchy. The higher n is, the more likely it is that a neural network (NN) with MCM trained with L will remain stuck in bad local optima.

Figure 1: From left to right: (i) rectangles disposition, (ii) decision boundaries for A5 of h+MCM trained with L, and (iii) decision boundaries for A5 of C-HMCNN(h).

Thus, all points in rectangle 3 belong to all classes, and if a datapoint belongs to a rectangle, then it also belongs to class A5.
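To make the role of the max function concrete, here is a minimal sketch of a max-based constraint on a chain hierarchy. It assumes that MCM sets each class's output to the maximum of the underlying network's outputs over that class and its subclasses; the excerpt does not spell out this definition, so treat it as an assumption. The function name `mcm`, the `descendants` mapping, and the scores are hypothetical and chosen only for illustration.

```python
def mcm(h, descendants):
    """Assumed max-constraint: class i gets the max raw score over
    itself and all of its subclasses (descendants[i] includes i)."""
    return [max(h[j] for j in descendants[i]) for i in range(len(h))]


# Hypothetical chain: index 0 is class A, indices 1..3 are its ancestors A1..A3.
# Each ancestor has every class below it in the chain as a subclass.
descendants = {
    0: {0},
    1: {0, 1},
    2: {0, 1, 2},
    3: {0, 1, 2, 3},
}

# Made-up raw scores of the underlying network h for A, A1, A2, A3.
raw_scores = [0.1, 0.7, 0.3, 0.5]
constrained = mcm(raw_scores, descendants)

# Coherence check: no class scores higher than any of its ancestors.
coherent = all(
    constrained[i] >= constrained[j]
    for i in descendants
    for j in descendants[i]
)
print(constrained, coherent)
```

Under this assumption, the score of A propagates upward through all n ancestors, which gives some intuition for why long ancestor chains are the difficult case discussed above.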
Feb-8-2026, 19:59:26 GMT
- Technology: