Hierarchical Multiclass Decompositions with Application to Authorship Determination
El-Yaniv, Ran, Etzion-Rosenberg, Noam
arXiv.org Artificial Intelligence
This paper is mainly concerned with the question of how to decompose multiclass classification problems into binary subproblems. We extend known Jensen-Shannon bounds on the Bayes risk of binary problems to hierarchical multiclass problems and use these bounds to develop a heuristic procedure for constructing hierarchical multiclass decompositions for multinomials. We test our method and compare it to the well-known "all-pairs" decomposition. Our tests are performed on a new authorship determination benchmark of machine learning authors. The new method consistently outperforms the all-pairs decomposition when the number of classes is small and breaks even on larger multiclass problems. With both methods, the classification accuracy we achieve, using an SVM over a feature set consisting of both high-frequency single tokens and high-frequency token pairs, appears to be exceptionally high compared to known results in authorship determination.
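To make the central quantity in the abstract concrete, the following minimal sketch computes the weighted Jensen-Shannon divergence between two discrete (multinomial) distributions. It illustrates the divergence itself, not the paper's bounds or its decomposition heuristic. The function names `entropy` and `js_divergence`, the weights `w_p` and `w_q` (standing in for class priors), and the example token distributions are all assumptions introduced here.

```python
import math


def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between p and q.

    p and q are distributions over the same support; w_p and w_q are
    non-negative weights summing to 1 (assumed here to play the role
    of class priors).
    """
    mixture = [w_p * a + w_q * b for a, b in zip(p, q)]
    return entropy(mixture) - w_p * entropy(p) - w_q * entropy(q)


# Hypothetical token-frequency distributions for two authors.
author_a = [0.5, 0.3, 0.2]
author_b = [0.2, 0.3, 0.5]

print(js_divergence(author_a, author_b))  # larger when the authors differ more
print(js_divergence(author_a, author_a))  # identical distributions give ~0
```

Intuitively, a larger divergence between two class distributions means the classes are easier to separate, which is why a quantity of this kind is a natural basis for bounding the Bayes risk of a binary subproblem.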
Oct-11-2010
- Country:
- North America > United States
- California > San Francisco County > San Francisco (0.04)
- Asia > Middle East
- Israel (0.04)
- Genre:
- Research Report (0.64)
- Technology: