Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

The authors motivate their approach by first showing that, under some assumptions, the discriminant function over a fully connected graph on the labels can be expressed as the uniform expectation of the discriminant functions over random spanning trees. Through a sampling result, they then show that, with high probability over samples of random spanning trees, there is a conical combination of these trees that achieves a substantial fraction of the margin of a predictor using the complete graph, and they prove a related risk bound for conical combinations over random trees. This motivates optimizing the margin of a conical combination of trees as the predictor, and the authors propose primal and dual formulations for this optimization problem (somewhat analogous to the structured SVM), for which a standard dual subgradient method is proposed, as in previous work. They then show that finding the maximizing joint label for the combination of trees (inference) can be done exactly (under an assumption that can be checked at run time) by searching through the K-best list of each spanning tree (the latter can be obtained by dynamic programming, as was already mentioned in Tsochantaridis et al. [JMLR 2005]). Experiments on standard multilabel datasets show a small improvement over alternative methods.
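The expectation identity behind the first step can be checked numerically on a small complete graph: by symmetry, each edge of K_n appears in a uniform random spanning tree with probability 2/n, so the uniform average of tree scores is proportional to the complete-graph score. A minimal brute-force sketch on K_4 (the edge weights here are arbitrary illustrative values, not from the paper):

```python
from itertools import combinations

def is_spanning_tree(n, edges):
    """Union-find check: n-1 acyclic edges on n nodes form a spanning tree."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # adding this edge would create a cycle
        parent[ru] = rv
    return True  # n-1 edges, no cycle => connected spanning tree

n = 4
all_edges = list(combinations(range(n), 2))                   # edges of K_4
weights = {e: float(i + 1) for i, e in enumerate(all_edges)}  # illustrative scores

trees = [t for t in combinations(all_edges, n - 1) if is_spanning_tree(n, t)]
avg_tree_score = sum(sum(weights[e] for e in t) for t in trees) / len(trees)
full_score = sum(weights.values())

# Each edge appears in a uniform spanning tree with probability 2/n,
# so E_T[score(T)] = (2/n) * score(K_n).
assert abs(avg_tree_score - (2 / n) * full_score) < 1e-9
print(len(trees), avg_tree_score, (2 / n) * full_score)  # → 16 10.5 10.5
```

The 16 spanning trees match Cayley's formula (n^(n-2) = 4^2), and the average tree score equals (2/n) times the complete-graph score, which is the scaling behind the uniform-expectation claim.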
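The K-best-per-tree step can be illustrated on a chain, the simplest spanning tree: ordinary Viterbi generalizes to K-best by keeping the K highest-scoring partial labelings per label instead of one. A hedged sketch (the node and edge scores are made up for illustration; this is not the authors' implementation):

```python
import heapq

def k_best_viterbi(node_scores, edge_scores, K):
    """Return the K highest-scoring label sequences on a chain.

    node_scores: list of length T, each a list of per-label scores.
    edge_scores: list of length T-1, each a dict (y_prev, y) -> score.
    """
    L = len(node_scores[0])
    # beams[y] holds up to K (score, path) entries for paths ending in label y
    beams = [[(s, (y,))] for y, s in enumerate(node_scores[0])]
    for t in range(1, len(node_scores)):
        new_beams = []
        for y in range(L):
            cands = []
            for y_prev in range(L):
                for score, path in beams[y_prev]:
                    cands.append((score + edge_scores[t - 1][(y_prev, y)]
                                  + node_scores[t][y], path + (y,)))
            new_beams.append(heapq.nlargest(K, cands))  # prune to K per label
        beams = new_beams
    # merge the per-label beams into a global K-best list
    return heapq.nlargest(K, (item for beam in beams for item in beam))

# Toy chain: 3 nodes, binary labels, small bonus for repeating a label.
node_scores = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
edge_scores = [{(a, b): (0.3 if a == b else 0.0) for a in (0, 1) for b in (0, 1)}
               for _ in range(2)]
top = k_best_viterbi(node_scores, edge_scores, K=3)
print(top[0])  # best (score, labeling)
```

This is O(T·L²·K) per tree; the review's point is that such K-best lists, computed independently per spanning tree, suffice for exact inference over the conical combination when the paper's run-time-checkable condition holds.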