Definition:

  • Each internal node tests a feature (attribute); each outgoing branch, one per value, leads to a child node where another attribute is tested
  • Leaf nodes give the final decision (label)
  • Given all the features, a hypothesis is a shape of the tree that minimizes the training error
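  • A minimal Python sketch of this structure (the Leaf/Node names and the one-split example are illustrative assumptions, not from the notes):

        from dataclasses import dataclass, field

        @dataclass
        class Leaf:
            label: str          # final decision for examples reaching this leaf

        @dataclass
        class Node:
            feature: str        # attribute tested at this internal node
            children: dict = field(default_factory=dict)   # value -> Node | Leaf

        # A one-split tree that decides by the value of "outlook"
        tree = Node(feature="outlook",
                    children={"sunny": Leaf("no"), "overcast": Leaf("yes")})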

Top-Down Induction of Decision Trees:

  • DTtrain(examples for CurrentNode, features at CurrentNode):
        Find F, the best decision feature for CurrentNode
        For each value of F, create a new child node
            Assign the matching training examples to the child node
            If the training examples are perfectly classified
                Stop
            Else
                Recursively apply DTtrain to this new child node
  • We find F, the best decision feature for CurrentNode, by choosing the attribute with either of the following criteria (both are sketched in runnable code after this list):
    • Best classification accuracy:
      • Pick the feature with the highest accuracy out of all features, where each edge (branch) predicts its majority label
        • accuracy of one edge of feature A = nb of majority-label examples on that edge / total nb of examples on that edge
        • accuracy of feature A = (sum over all edges of the nb of majority-label examples) / total nb of examples
    • ID3: highest information gain, computed with conditional entropy
      • At each iteration, split the examples on the feature A with the highest information gain IG(Y; A) = H(Y) − H(Y | A), where H(Y | A) is the conditional entropy of the labels Y given A
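  • A runnable Python sketch of both selection criteria and the DTtrain loop above (the function names, the (feature-dict, label) example format, and the nested-dict tree encoding are assumptions for illustration, not from the notes):

        import math
        from collections import Counter, defaultdict

        def entropy(labels):
            # H(Y) = -sum_y p(y) log2 p(y)
            n = len(labels)
            return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

        def group_by(examples, feature):
            # Partition the examples into one branch per value of the feature
            branches = defaultdict(list)
            for feats, label in examples:
                branches[feats[feature]].append((feats, label))
            return branches

        def split_accuracy(examples, feature):
            # Each edge predicts its majority label, so:
            # accuracy(A) = (sum of per-edge majority counts) / total nb of examples
            branches = group_by(examples, feature)
            majority = sum(max(Counter(l for _, l in subset).values())
                           for subset in branches.values())
            return majority / len(examples)

        def information_gain(examples, feature):
            # IG(Y; A) = H(Y) - H(Y | A), where H(Y | A) is the weighted
            # average label entropy over the branches of A
            n = len(examples)
            branches = group_by(examples, feature)
            h_cond = sum(len(s) / n * entropy([l for _, l in s])
                         for s in branches.values())
            return entropy([l for _, l in examples]) - h_cond

        def dt_train(examples, features, score=information_gain):
            # Stop when the examples are perfectly classified (all one label)
            # or no features remain; otherwise split on the best feature and recurse
            labels = [l for _, l in examples]
            if len(set(labels)) == 1 or not features:
                return Counter(labels).most_common(1)[0][0]   # leaf: majority label
            best = max(features, key=lambda f: score(examples, f))
            rest = [f for f in features if f != best]
            return {best: {value: dt_train(subset, rest, score)
                           for value, subset in group_by(examples, best).items()}}

    Calling dt_train(examples, feature_names) on examples of the form ({"outlook": "sunny", ...}, "no") returns a nested dict mirroring the tree; passing score=split_accuracy swaps in the accuracy criterion instead of ID3's information gain.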

Inductive bias in decision tree learning

  • The learning algorithm performs heuristic search through the space of all decision trees
  • It stops at the smallest acceptable tree
  • Occam’s Razor: prefer the simplest hypothesis that fits the data
  • This is justified because a short hypothesis that fits the data is less likely to be a statistical coincidence than a long one (there are far fewer short hypotheses)