A Too-Clever Ranking Method
Oates, Tim (University of Maryland Baltimore County)

I developed what I thought was an extremely clever method for detecting "bad" training instances. Each instance was scored, and those with the lowest scores could be removed before running C4.5 to build a decision tree from the remainder. I ran an experiment in which I removed the bottom 10 percent of the instances in a University of California, Irvine (UCI) data set. The resulting tree was smaller and more accurate (as measured by 10-fold cross-validation) than the tree built on the full data set. Then I removed the bottom 20 percent of the instances and got a tree that was smaller than the last one and just as accurate. At that point I had the feeling that this was going to make a great paper for the International Conference on Machine Learning (ICML).
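The scoring method itself is not spelled out here, so the sketch below only illustrates the filter-then-train step: given some per-instance "badness" scores (a placeholder array in this example, not the actual ranking method), drop the lowest-scoring fraction before handing the rest to the tree learner.

```python
import numpy as np

def drop_lowest_scored(X, y, scores, frac):
    """Remove the lowest-scoring `frac` of instances before training.

    `scores` stands in for whatever per-instance ranking the method
    produces (hypothetical here); lower score = more suspect.
    """
    n_drop = int(len(scores) * frac)
    # argsort ascending: the first n_drop positions hold the lowest scores
    keep = np.argsort(scores)[n_drop:]
    return X[keep], y[keep]

# Toy example: 10 instances, drop the bottom 20 percent (2 instances).
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)
scores = np.array([5, 1, 9, 3, 7, 0, 8, 2, 6, 4])
X_filtered, y_filtered = drop_lowest_scored(X, y, scores, 0.2)
# The two lowest-scored instances (scores 0 and 1) are gone;
# the remaining 8 would then be passed to C4.5 and the resulting
# tree evaluated with 10-fold cross-validation.
```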