KSL-92-01
## Probability Estimation for Classification Trees

**Reference:**
Walker, M. G. Probability Estimation for Classification Trees. Knowledge Systems Laboratory, January, 1992.

**Abstract:** Consider a problem in which you wish to know the conditional probability that
an object belongs to a class. For example, you may wish to know the
conditional probability that you will survive open-heart surgery, given your
age and blood pressure, or you may wish to know the conditional probability
that a peptide binds to a receptor, given the amino-acid composition of the
peptide. You could begin by generating a classification tree or a neural
network to determine partitions in feature space and class assignments for
each partition. The conventional approach to estimating the conditional
probabilities of class membership in the partitions is to tabulate the data
points that are correctly and incorrectly classified in each partition (a
resubstitution estimate). Unfortunately, this resubstitution method often
gives conditional probability estimates that are insufficiently accurate, and
thus can lead to incorrect decisions.
I have implemented and compared alternative methods for estimating conditional
probability in classification trees. These alternative methods use
proportional error assignment, repeated cross-validation, or bootstrapping.
In Monte Carlo simulations with synthetic data sets, the alternative methods
are substantially more accurate than resubstitution. Breiman's method
modified to use a repeated cross-validation estimate of the global
misclassification rate is most accurate overall. An exception is that, for
data sets with low Bayes' error (less than 0.1), either a local bootstrap
0.632 estimate or Breiman's method modified to use a bootstrap estimate of the
global misclassification rate is most accurate, although the Breiman estimate
using repeated cross-validation is quite competitive for these distributions.
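
As a rough illustration of why a resubstitution estimate tends to be optimistic, the sketch below compares it with a bootstrap 0.632 estimate of the misclassification rate. Everything here is an assumption made for illustration and is not the report's implementation: the one-split "tree" (a decision stump), the synthetic two-Gaussian data, and the function names are all hypothetical. The 0.632 weighting is the standard form (0.368 times the resubstitution error plus 0.632 times the out-of-bag error), applied here to a global error rate rather than to the local, per-partition estimates discussed in the abstract.

```python
import random


def fit_stump(xs, ys):
    """Choose the threshold t (from the observed x values) that minimises
    training error for the rule: predict class 1 when x > t."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def error_rate(t, xs, ys):
    """Fraction of points misclassified by the rule 'predict 1 when x > t'."""
    return sum((x > t) != bool(y) for x, y in zip(xs, ys)) / len(xs)


def resubstitution_error(xs, ys):
    """Fit and evaluate on the same data (the optimistic estimate)."""
    return error_rate(fit_stump(xs, ys), xs, ys)


def bootstrap_632_error(xs, ys, n_boot=200, seed=0):
    """Standard 0.632 bootstrap estimate of the misclassification rate."""
    rng = random.Random(seed)
    n = len(xs)
    oob_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]  # points left out
        if not oob:
            continue
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        oob_errors.append(
            error_rate(t, [xs[i] for i in oob], [ys[i] for i in oob])
        )
    oob_mean = sum(oob_errors) / len(oob_errors)
    return 0.368 * resubstitution_error(xs, ys) + 0.632 * oob_mean


# Synthetic data (assumed for the sketch): two overlapping Gaussian classes.
rng = random.Random(1)
ys = [i % 2 for i in range(40)]
xs = [rng.gauss(float(y), 1.0) for y in ys]

resub = resubstitution_error(xs, ys)
b632 = bootstrap_632_error(xs, ys)
print(f"resubstitution error: {resub:.3f}")
print(f"bootstrap 0.632 error: {b632:.3f}")
```

The stump is fit on each bootstrap sample and scored only on the points that sample left out, so the second number is not computed from the same points that selected the threshold.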

