Playing with Continuous uncertainty in Decision Trees • /r/MachineLearning
Classically, for decision trees we define a split, or several "buckets", to transform continuous data into discrete data. The data I am currently processing has uncertainty associated with it (each data point comes from an aggregate set). As such, I might define a boundary, let's say N, where a data point's uncertainty could place it in multiple buckets (say, the parameter value could fall on either side of N). Normally these boundaries are binary, but I was considering counting these 'overlapping instances' towards both buckets, weighted by their respective probabilities. This doesn't seem to violate the entropy term (the total probability will still sum to 1). However, I can't place half an instance within a branch, which would destroy the meaning behind the term.
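To make the idea concrete, here is a minimal sketch of a "soft" split in which each instance adds a fractional weight to both branches and the entropy is computed from those weights rather than from whole counts. Everything specific in it is an assumption for illustration, not something stated above: each point is taken to be a (mean, sigma, label) triple with Gaussian uncertainty, and the names `prob_left`, `entropy`, and `soft_split_gain` are hypothetical. Fractional instances of this kind are similar in spirit to how C4.5 distributes instances with missing attribute values across branches.

```python
import math
from collections import defaultdict


def prob_left(mean, sigma, threshold):
    """P(true value <= threshold), ASSUMING Gaussian uncertainty around `mean`."""
    if sigma == 0:
        # No uncertainty: fall back to the usual hard split.
        return 1.0 if mean <= threshold else 0.0
    z = (threshold - mean) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def entropy(weights):
    """Shannon entropy in bits of a dict mapping class label -> total weight."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    h = 0.0
    for w in weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def soft_split_gain(points, threshold):
    """Information gain of splitting at `threshold` with fractional instances.

    `points` is a list of (mean, sigma, label) triples (hypothetical layout).
    Each point contributes weight p to the left branch and 1 - p to the right,
    so its total weight is still 1.
    """
    if not points:
        return 0.0
    parent = defaultdict(float)
    left = defaultdict(float)
    right = defaultdict(float)
    for mean, sigma, label in points:
        p = prob_left(mean, sigma, threshold)
        parent[label] += 1.0
        left[label] += p
        right[label] += 1.0 - p
    n = sum(parent.values())
    n_left = sum(left.values())
    n_right = sum(right.values())
    children = (n_left / n) * entropy(left) + (n_right / n) * entropy(right)
    return entropy(parent) - children
```

With zero uncertainty this reduces to the ordinary information gain of a hard split. As the uncertainty near the boundary grows, the branches become more mixed and the gain shrinks, so the "half an instance" only ever appears as a weight inside the entropy sums, not as a physical instance in a branch.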
May-1-2016, 00:19:34 GMT