As I’m working on a Decision Tree tutorial, I picked up the foundational text: Classification and Regression Trees by Breiman, Friedman, Olshen, and Stone. In a nutshell, this book is a math-heavy history lesson on the invention of decision tree algorithms.
|  |  |
| --- | --- |
| Get this book if… | You want to learn more about the invention of decision trees. |
| Don’t get this book if… | You’re looking for an easy intro to decision trees. |
I’m already familiar with how a decision tree works, but I wanted to really master the concepts and understand some (though far from all) of the theoretical side. The CART book has theory in spades. If you wanted to implement a decision tree algorithm yourself, this book would help you understand the how and why of each step.
In fact, the rpart R package closely follows the methodology laid out in this book! One of my professors at DePaul encouraged us to fully understand what our software is actually doing, and I appreciate that advice all the more now that I’ve read through the CART book.
My Takeaways
Post-Pruning is Better than Pre-Pruning: The authors emphasize that it’s better to grow a large tree and then prune branches away, rather than stop the tree’s growth early. That pruning can include (see the rpart sketch after this list):
- Using cross-validation or test-sample estimates to remove branches that reduce model accuracy.
- Removing redundant branches.
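Here’s a minimal sketch of that grow-then-prune workflow using rpart. The iris data and the control settings are illustrative choices on my part, not anything prescribed by the book:

```r
library(rpart)

# Grow a deliberately oversized tree: cp = 0 and a tiny minsplit
# disable early stopping; xval = 10 requests 10-fold cross-validation.
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, minsplit = 2, xval = 10))

printcp(fit)  # cross-validated error (xerror) for each candidate subtree

# Prune back to the subtree with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```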
Original Algo Had More Options: Very cool to see that the authors explored splitting on linear combinations of variables (e.g. 0.75 * VarX + 1.32 * VarY >= c) as part of the search for split criteria. There is also an option for combining categorical variables in one step. However, these ideas were never implemented in the rpart package.
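To make the linear-combination idea concrete, here’s a toy sketch (this is not rpart functionality; the weights, cutpoint, and simulated data are all made up) that scores a hypothetical combined split against a single-variable split using Gini impurity:

```r
# Gini impurity of a set of class labels
gini <- function(labels) {
  p <- prop.table(table(labels))
  1 - sum(p^2)
}

# Weighted impurity of the two children produced by splitting on score >= cut
split_impurity <- function(score, labels, cut) {
  left <- score < cut
  (sum(left) * gini(labels[left]) +
     sum(!left) * gini(labels[!left])) / length(labels)
}

set.seed(42)
x1 <- rnorm(200)
x2 <- rnorm(200)
cl <- factor(0.75 * x1 + 1.32 * x2 + rnorm(200, sd = 0.5) >= 0)

split_impurity(x1, cl, 0)                     # split on a single variable
split_impurity(0.75 * x1 + 1.32 * x2, cl, 0)  # linear-combination split
```

By construction the classes here line up with the combined score, so the second split should come out purer, which is exactly why the authors considered the idea.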
Cost Complexity: This is something I have not seen in many other Decision Tree references. The cost-complexity measure is a tree’s misclassification rate plus a penalty for size: the number of terminal nodes times a scale parameter. In the book this measure drives the post-pruning sequence, and rpart’s cp parameter also uses it for pre-pruning (i.e. a split isn’t attempted unless it improves the fit by at least a factor of cp, which stops the tree before it gets too complex).
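For reference, the book’s measure for a tree T is

```latex
R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|
```

where R(T) is the tree’s misclassification cost, |T̃| is its number of terminal nodes, and α ≥ 0 is the scale (complexity) parameter; rpart’s cp parameter is a rescaled version of this α.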
Regression Trees Should Have Large Terminal Nodes: I thought this was interesting! If your terminal nodes are small, you’re more likely to end up with a node whose prediction is skewed by a single outlier.
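In rpart the knob for this is minbucket, the minimum number of observations allowed in any terminal node. A quick illustrative sketch (the mtcars formula and the value 10 are arbitrary choices, not a recommendation from the book):

```r
library(rpart)

# Regression tree (method = "anova") that requires at least
# 10 observations in every terminal node.
reg_fit <- rpart(mpg ~ ., data = mtcars, method = "anova",
                 control = rpart.control(minbucket = 10))
```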
Bottom Line: An insightful book if you’re already familiar with decision trees, want to understand how and why they work under the hood, and are comfortable interpreting mathematical notation.