Reviews: Alternating optimization of decision trees, with application to learning sparse oblique trees

Neural Information Processing Systems 

The authors' method requires an initial decision tree. The topology of this tree is held fixed, and only the decision rule at each node is adjusted. The adjustment rests on the observation that, when the parameters of every node except node i are fixed, the likelihood function for the whole tree reduces to that of a simple K-class classifier. This classifier can be trained efficiently with existing techniques, and doing so guarantees that the overall loss does not increase relative to the loss of the initial decision tree.
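To make the node-wise update concrete, here is a minimal Python sketch of one such step as I understand it. It is not the authors' implementation. It rests on several assumptions of mine: the tree is binary, so the reduced problem at an internal node is a two-way choice between children; the loss is the 0/1 misclassification count; the "existing technique" is plain logistic regression fitted by gradient descent; and an explicit accept/reject check enforces that the loss never increases. The names `Node`, `predict_from`, `tree_loss`, `reaching`, `fit_linear`, and `optimize_node` are hypothetical.

```python
import numpy as np

class Node:
    """Internal node: oblique split w.x + b > 0 goes right. Leaf: label is set."""
    def __init__(self, w=None, b=0.0, left=None, right=None, label=None):
        self.w, self.b = w, b
        self.left, self.right = left, right
        self.label = label

def predict_from(node, x):
    """Route x down the subtree rooted at `node` and return the leaf label."""
    while node.label is None:
        node = node.right if x @ node.w + node.b > 0 else node.left
    return node.label

def tree_loss(root, X, y):
    """0/1 loss of the whole tree (number of misclassified points)."""
    return int(sum(predict_from(root, x) != t for x, t in zip(X, y)))

def reaching(root, target, X):
    """Indices of the points whose root-to-leaf path passes through `target`."""
    idx = []
    for i, x in enumerate(X):
        node = root
        while node is not target and node.label is None:
            node = node.right if x @ node.w + node.b > 0 else node.left
        if node is target:
            idx.append(i)
    return idx

def fit_linear(X, z, epochs=200, lr=0.1):
    """Logistic regression by gradient descent (assumed stand-in solver)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - z
        w -= lr * X.T @ g / len(z)
        b -= lr * g.mean()
    return w, b

def optimize_node(root, node, X, y):
    """Refit one internal node with all other nodes fixed."""
    Xc, zc = [], []
    for i in reaching(root, node, X):
        ok_left = predict_from(node.left, X[i]) == y[i]
        ok_right = predict_from(node.right, X[i]) == y[i]
        if ok_left != ok_right:
            # Only points for which the child choice changes the outcome matter.
            Xc.append(X[i])
            zc.append(1.0 if ok_right else 0.0)
    if not Xc:
        return
    before = tree_loss(root, X, y)
    old = (node.w, node.b)
    node.w, node.b = fit_linear(np.array(Xc), np.array(zc))
    if tree_loss(root, X, y) > before:
        # Reject the update so the overall loss cannot increase.
        node.w, node.b = old
```

A full pass would call `optimize_node` on every internal node in turn and repeat until the loss stops changing. The rollback at the end is what makes the monotonicity claim hold in this sketch even though the surrogate solver is only approximate.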