AMF: Aggregated Mondrian Forests for Online Learning
Jaouad Mourtada, Stéphane Gaïffas, Erwan Scornet
Introduced by Breiman (2001), Random Forests (RF) are among the algorithms of choice in many supervised learning applications. The appeal of these methods comes from their remarkable accuracy in a variety of tasks, the small number (or even the absence) of parameters to tune, their reasonable computational cost at training and prediction time, and their suitability in high-dimensional settings. Most commonly used RF algorithms, such as the original random forest procedure (Breiman, 2001), extra-trees (Geurts et al., 2006), or conditional inference forests (Hothorn et al., 2010), are batch algorithms that require the whole dataset to be available at once. Several online random forest variants have been proposed to overcome this issue and handle data that arrive sequentially. Utgoff (1989) was the first to extend Quinlan's ID3 batch decision tree algorithm (see Quinlan, 1986) to an online setting. Later on, Domingos and Hulten (2000) introduced Hoeffding Trees, which can be easily updated: since observations arrive sequentially, a cell is split when (i) enough observations have fallen into this cell, and (ii) the best split in the cell is statistically relevant (a generic Hoeffding inequality being used to assess the quality of the best split). Since random forests are known to exhibit better empirical performance than individual decision trees, online random forests have also been proposed (see, e.g., Saffari et al., 2009; Denil et al., 2013).
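To make the Hoeffding Tree split rule mentioned above concrete, here is a minimal Python sketch of the two conditions (enough observations in the cell, and a best split that is statistically relevant under a Hoeffding bound). It is an illustration of the generic idea described in the abstract, not the AMF algorithm or the authors' implementation; the parameter names `delta`, `n_min`, and `tau` are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the empirical mean of n
    i.i.d. observations of a quantity with range `value_range` is within
    this epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gains, n, n_classes, delta=1e-7, n_min=200, tau=0.05):
    """Decide whether to split a leaf, following the two conditions:
    (i) the leaf has received enough observations, and
    (ii) the gap between the best and second-best candidate split gains
    exceeds the Hoeffding bound (or the bound falls below the
    tie-breaking threshold `tau`)."""
    if n < n_min:                        # (i) not enough observations yet
        return False
    ranked = sorted(gains, reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    value_range = math.log2(n_classes)   # range of the information gain
    eps = hoeffding_bound(value_range, delta, n)
    return (best - second > eps) or (eps < tau)   # (ii) statistically relevant

# Example: a leaf with 500 observations and two candidate splits.
print(should_split(gains=[0.30, 0.05], n=500, n_classes=2))  # True: clear winner
print(should_split(gains=[0.30, 0.29], n=500, n_classes=2))  # False: too close to call
```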
Jun-25-2019