Definition:
- Each internal node tests a feature (attribute); each branch corresponds to a value of that feature and leads to a child node
- Leaf nodes give the final decision (class label)
- Given all the features, a hypothesis h∈H is a tree that minimizes the training error
Top-Down Induction of Decision Trees:
- We find F, the best decision feature for CurrentNode, by choosing the attribute with:
- Best classification accuracy:
- Pick the feature with the highest accuracy out of all features
- accuracy of feature A’s edge i = (# of examples with the majority label on edge i) / (total # of examples on edge i)
- accuracy of feature A = ∑i accuracy of feature A’s edge i ∗ P(edge i)
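The weighted-accuracy criterion above can be sketched as follows; the data and names (`edge_accuracy`, `feature_accuracy`, the `outlook` feature) are illustrative, not from the notes:

```python
from collections import Counter

def edge_accuracy(labels):
    """Accuracy on one edge: count of the majority label / total labels."""
    if not labels:
        return 0.0
    return Counter(labels).most_common(1)[0][1] / len(labels)

def feature_accuracy(examples, labels, feature):
    """accuracy(A) = sum_i accuracy(edge i) * P(edge i),
    where P(edge i) is the fraction of examples routed down edge i."""
    n = len(labels)
    # group the labels by the value the feature takes (one group per edge)
    groups = {}
    for x, y in zip(examples, labels):
        groups.setdefault(x[feature], []).append(y)
    return sum(edge_accuracy(ys) * len(ys) / n for ys in groups.values())

# toy data: one feature, binary labels
X = [{"outlook": "sunny"}, {"outlook": "sunny"},
     {"outlook": "rain"}, {"outlook": "rain"}]
y = ["no", "yes", "yes", "yes"]
print(feature_accuracy(X, y, "outlook"))  # 0.5*0.5 + 1.0*0.5 = 0.75
```

At a node, this score would be computed for every candidate feature and the highest-scoring one chosen.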
- ID3: highest information gain, computed with conditional entropy
- At each iteration, pick the feature Xi with the most information gain when conditioning on it (max IG(Xi) = H(Y) − H(Y | Xi)) and split the examples on that feature
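The ID3 criterion can be sketched like this; the example data and the `wind` feature are made up for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_c p(c) log2 p(c) over the class counts."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    """IG(Xi) = H(Y) - H(Y | Xi), where the conditional entropy is the
    edge-probability-weighted entropy of the labels on each edge."""
    n = len(labels)
    groups = {}
    for x, y in zip(examples, labels):
        groups.setdefault(x[feature], []).append(y)
    cond_entropy = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - cond_entropy

X = [{"wind": "weak"}, {"wind": "weak"},
     {"wind": "strong"}, {"wind": "strong"}]
y = ["yes", "yes", "no", "yes"]
# ID3 splits on the feature maximizing this gain at each node
print(information_gain(X, y, "wind"))  # ~0.311
```

Here H(Y) ≈ 0.811 and H(Y | wind) = 0.5 (the "weak" edge is pure, the "strong" edge has entropy 1), so the gain is ≈ 0.311.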
Inductive bias in decision tree learning
- The learning algorithm performs heuristic search through the space of all decision trees
- It stops at the smallest acceptable tree
- Occam’s Razor: prefer the simplest hypothesis that fits the data
- Justification: there are fewer short hypotheses than long ones, so a short hypothesis that fits the data is less likely to be a statistical coincidence