Comparing 179 Machine Learning Categorizers on 121 Data Sets
It is often argued that the choice of machine learning algorithm matters less than the amount of data used to train it (e.g., Domingos, 2012: "More data beats a cleverer algorithm"). In a monumental study, Fernández-Delgado and colleagues tested 179 machine learning categorizers on 121 data sets. They found that a large majority of the categorizers were essentially identical in accuracy. In fact, 121 of them (the match with the number of data sets is a coincidence) were within 5 percentage points of one another when accuracy was averaged across all of the data sets. The following two graphs show the same data organized either by family (color and order) or by accuracy (order) and family (color). Each family relies on the same core classifiers but may use different parameters or different transformations of the data.
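To make the "within 5 percentage points of one another" comparison concrete, here is a minimal sketch of how such a count could be computed. It is not the study's own code: the classifier names and accuracy values are invented for illustration, and the helper `largest_cluster_within` is hypothetical. It averages each classifier's accuracy across data sets, then finds the largest group of classifiers whose averages span no more than a given width.

```python
from statistics import mean


def largest_cluster_within(mean_acc, width=5.0):
    """Size of the largest group of classifiers whose mean accuracies
    (in percentage points) all lie within `width` of one another."""
    values = sorted(mean_acc.values())
    best = 0
    left = 0
    # Sliding window over the sorted means: shrink from the left until
    # the window spans at most `width` points.
    for right, value in enumerate(values):
        while value - values[left] > width:
            left += 1
        best = max(best, right - left + 1)
    return best


# Hypothetical accuracies (percent) for four classifiers on three data sets.
# These numbers are made up; they are not results from the study.
accuracies = {
    "random_forest": [82.0, 91.0, 76.0],
    "svm_rbf": [81.0, 90.0, 75.0],
    "knn": [79.0, 88.0, 73.0],
    "naive_bayes": [70.0, 80.0, 60.0],
}

# Average each classifier's accuracy across all data sets.
mean_acc = {name: mean(scores) for name, scores in accuracies.items()}

print(largest_cluster_within(mean_acc, width=5.0))
```

With real results, `accuracies` would hold one entry per classifier and one score per data set, and the same function would report how many classifiers fall inside the 5-point band.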
Nov-27-2017, 07:10:09 GMT