 gcforest


Adaptive Generation Model: A New Ensemble Method

Ruan, Jiacheng, Li, Jiahao

arXiv.org Machine Learning

As a common approach in machine learning, ensemble methods train multiple models on a data set and obtain better results by combining them through a combination strategy. Stacking, a representative ensemble learning method, is often used in machine learning competitions such as Kaggle. This paper proposes a variant of the stacking model based on the idea of gcForest, namely the Adaptive Generation Model (AGM). Adaptive generation is performed not only in the horizontal direction, to expand the width of each layer, but also in the vertical direction, to expand the depth of the model. The base models of AGM all come from a preset collection of basic machine learning models. In addition, a feature augmentation method is added between layers to further improve the overall accuracy of the model. Finally, comparative experiments on 7 data sets show that the accuracy of AGM is better than that of the models that preceded it.
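The abstract does not spell out the growth rule, so the following is only one plausible greedy reading of "width, then depth" generation, not the authors' algorithm. The names `grow_model` and `toy_score`, the pool entries, and the stopping criteria are all hypothetical; `score` stands in for something like validation accuracy, and the feature augmentation step between layers is left out.

```python
from typing import Callable, List

Layer = List[str]  # names of the base models chosen for one layer


def grow_model(pool: List[str],
               score: Callable[[List[Layer]], float],
               max_depth: int = 5) -> List[Layer]:
    """Greedy growth: widen a layer while the score improves, then try a deeper layer."""
    layers: List[Layer] = []
    best = float("-inf")
    for _ in range(max_depth):
        layer: Layer = []
        layer_best = float("-inf")
        for name in pool:                      # horizontal growth (width)
            trial = score(layers + [layer + [name]])
            if trial > layer_best:             # keep the model only if it helps
                layer.append(name)
                layer_best = trial
        if not layer or layer_best <= best:    # vertical growth (depth) stops here
            break
        layers.append(layer)
        best = layer_best
    return layers


def toy_score(layers: List[Layer]) -> float:
    """Hypothetical score: rewards two models in each of the first two layers, penalizes size."""
    gain = sum(min(len(layer), 2) for layer in layers[:2])
    cost = 0.1 * sum(len(layer) for layer in layers)
    return gain - cost


structure = grow_model(["knn", "tree", "svm"], toy_score)
print(structure)
```

With this toy score the search keeps two models per layer and stops after two layers, because a third layer only adds cost.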


An Adaptive Weighted Deep Forest Classifier

Utkin, Lev V., Konstantinov, Andrei V., Chukanov, Viacheslav S., Kots, Mikhail V., Meldo, Anna A.

arXiv.org Machine Learning

A modification of the confidence screening mechanism is proposed, based on adaptive weighting of every training instance at each cascade level of the Deep Forest. The idea underlying the modification is simple and stems from the confidence screening mechanism proposed by Pang et al., which simplifies the Deep Forest classifier by updating the training set at each level in accordance with the classification accuracy of every training instance. However, whereas the confidence screening mechanism just removes instances from the training and testing processes, the proposed modification is more flexible and assigns weights that take the classification accuracy into account. The modification is to some extent similar to AdaBoost. Numerical experiments illustrate good performance of the proposed modification in comparison with the original Deep Forest proposed by Zhou and Feng.
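To make the contrast between removing instances and weighting them concrete, here is a minimal sketch. The abstract does not give the weight formula, so the rule used below (weight proportional to one minus the probability assigned to the true class, with a small floor) is an assumption, as are the function names `screen` and `adaptive_weights` and the threshold value.

```python
from typing import List, Sequence


def screen(probs: Sequence[Sequence[float]], labels: Sequence[int],
           threshold: float = 0.9) -> List[int]:
    """Screening-style update: keep only instances not yet classified confidently."""
    return [i for i, (p, y) in enumerate(zip(probs, labels)) if p[y] < threshold]


def adaptive_weights(probs: Sequence[Sequence[float]], labels: Sequence[int],
                     floor: float = 0.05) -> List[float]:
    """Weighting-style update (assumed rule): harder instances get larger weights."""
    raw = [max(floor, 1.0 - p[y]) for p, y in zip(probs, labels)]
    total = sum(raw)
    return [r / total for r in raw]   # normalize so the weights sum to 1


# Class-probability vectors from one cascade level; the true class is 0 for all three.
level_probs = [[0.95, 0.05], [0.60, 0.40], [0.30, 0.70]]
true_labels = [0, 0, 0]

kept = screen(level_probs, true_labels)
weights = adaptive_weights(level_probs, true_labels)
print(kept, weights)
```

Screening drops the first instance entirely, while the weighting variant keeps all three and simply gives the well-classified instance the smallest weight.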



Off the Beaten path – Using Deep Forests to Outperform CNNs and RNNs

@machinelearnbot

Summary: How about a deep learning technique based on decision trees that outperforms CNNs and RNNs, runs on your ordinary desktop, and trains with relatively small datasets? This could be a major disruptor for AI. Suppose I told you that there is an algorithm that regularly beats the performance of CNNs and RNNs at image and text classification. Well, there is one, just announced by researchers Zhi-Hua Zhou and Ji Feng of the National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China. This is the first installment in a series of "Off the Beaten Path" articles that will focus on methods that advance data science but lie outside the mainstream of development.


Deep Forest: Towards An Alternative to Deep Neural Networks

Zhou, Zhi-Hua, Feng, Ji

arXiv.org Machine Learning

In this paper, we propose gcForest, a decision tree ensemble approach with performance highly competitive with deep neural networks. In contrast to deep neural networks, which require great effort in hyper-parameter tuning, gcForest is much easier to train. In fact, even when gcForest is applied to different data from different domains, excellent performance can be achieved with almost the same hyper-parameter settings. The training process of gcForest is efficient and scalable. In our experiments, its training time running on a PC is comparable to that of deep neural networks running with GPU facilities, and the efficiency advantage may be more apparent because gcForest is naturally suited to parallel implementation. Furthermore, in contrast to deep neural networks, which require large-scale training data, gcForest can work well even when there are only small-scale training data. Moreover, as a tree-based approach, gcForest should be easier to analyze theoretically than deep neural networks.
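The abstract does not describe the data flow, but in the cascade structure of gcForest each level receives the original feature vector concatenated with the class vectors produced by the previous level, and the final prediction aggregates the class vectors of the last level. The sketch below shows only that data flow. The "forests" are hypothetical stand-in functions (`toy_forest_a`, `toy_forest_b`), not trained random forests, and `cascade_predict` is an illustrative name rather than part of any gcForest library.

```python
from typing import Callable, List, Sequence

# A "forest" here is any callable mapping a feature vector to a class-probability vector.
Forest = Callable[[Sequence[float]], List[float]]


def cascade_predict(levels: List[List[Forest]], x: Sequence[float]) -> int:
    """Pass x through the cascade; each level sees x plus the previous level's class vectors."""
    original = list(x)
    augmented = list(x)                  # the first level sees the raw features only
    class_vectors: List[List[float]] = []
    for forests in levels:
        class_vectors = [forest(augmented) for forest in forests]
        # Concatenate the original features with every forest's class vector.
        augmented = original + [p for vec in class_vectors for p in vec]
    # Average the class vectors of the last level and pick the most probable class.
    n_classes = len(class_vectors[0])
    mean = [sum(vec[c] for vec in class_vectors) / len(class_vectors)
            for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


def toy_forest_a(v: Sequence[float]) -> List[float]:
    """Hypothetical stand-in: thresholds the first feature."""
    p = 0.8 if v[0] > 0.5 else 0.2
    return [1.0 - p, p]


def toy_forest_b(v: Sequence[float]) -> List[float]:
    """Hypothetical stand-in: uses the mean of all inputs, clipped to [0, 1]."""
    p = min(1.0, max(0.0, sum(v) / len(v)))
    return [1.0 - p, p]


levels = [[toy_forest_a, toy_forest_b], [toy_forest_a, toy_forest_b]]
print(cascade_predict(levels, [0.9, 0.7]), cascade_predict(levels, [0.1, 0.2]))
```

In a real cascade the forests at each level would be trained on the augmented features, and the number of levels would be chosen by monitoring validation performance.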


Discriminative Metric Learning with Deep Forest

Utkin, Lev V., Ryabinin, Mikhail A.

arXiv.org Machine Learning

A Discriminative Deep Forest (DisDF) is proposed in the paper as a metric learning algorithm. It is based on the Deep Forest, or gcForest, proposed by Zhou and Feng and can be viewed as a gcForest modification. The fully supervised case is studied, in which the class labels of individual training examples are known. The main idea underlying the algorithm is to assign weights to the decision trees in a random forest in order to reduce distances between objects from the same class and to increase them between objects from different classes. The weights are training parameters. A specific objective function is proposed which combines Euclidean and Manhattan distances and simplifies the optimization problem for training the DisDF. The numerical experiments illustrate the proposed distance metric algorithm.


A Siamese Deep Forest

Utkin, Lev V., Ryabinin, Mikhail A.

arXiv.org Machine Learning

A Siamese Deep Forest (SDF) is proposed in the paper. It is based on the Deep Forest, or gcForest, proposed by Zhou and Feng and can be viewed as a gcForest modification. It can also be regarded as an alternative to the well-known Siamese neural networks. The SDF uses a modified training set consisting of concatenated pairs of vectors. Moreover, it defines the class distributions in the deep forest as the weighted sum of the tree class probabilities, where the weights are determined so as to reduce distances between similar pairs and to increase them between dissimilar pairs. We show that the weights can be obtained by solving a quadratic optimization problem. The SDF aims to prevent the overfitting that takes place in neural networks when only limited training data are available. The numerical experiments illustrate the proposed distance metric method.
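The two ingredients named in the abstract, a training set of concatenated pairs and a class distribution formed as a weighted sum of tree class probabilities, can be sketched as follows. This is an illustration under assumptions: the label encoding (0 for a similar pair, 1 for a dissimilar pair), the function names `make_pairs` and `weighted_class_distribution`, and the fixed example weights are hypothetical. In the paper the weights come from a quadratic optimization problem, which is not implemented here.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def make_pairs(X: Sequence[Sequence[float]],
               y: Sequence[int]) -> Tuple[List[List[float]], List[int]]:
    """Build a pairwise training set: each example is two feature vectors concatenated."""
    pairs: List[List[float]] = []
    targets: List[int] = []
    for i, j in combinations(range(len(X)), 2):
        pairs.append(list(X[i]) + list(X[j]))
        targets.append(0 if y[i] == y[j] else 1)   # assumed encoding: 0 = similar pair
    return pairs, targets


def weighted_class_distribution(tree_probs: Sequence[Sequence[float]],
                                weights: Sequence[float]) -> List[float]:
    """Combine per-tree class probabilities as a weighted sum, one weight per tree."""
    n_classes = len(tree_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, tree_probs))
            for c in range(n_classes)]


X = [[0.0, 1.0], [0.1, 0.9], [5.0, 5.0]]
y = [0, 0, 1]
pairs, targets = make_pairs(X, y)

# Two trees voting on one pair; the weights are fixed by hand for illustration only.
dist = weighted_class_distribution([[0.9, 0.1], [0.5, 0.5]], [0.75, 0.25])
print(pairs, targets, dist)
```

With equal weights the weighted sum reduces to the plain averaging used in an ordinary random forest, so the learned weights are what distinguishes the SDF from the unmodified forest.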