Decision Tree Interview Q&A


1. Disadvantages of Decision Trees:

Overfitting: a common problem with Decision Trees.

  • a DT often overfits the training data, which ultimately leads to wrong predictions on unseen test points
  • the tree keeps generating new nodes to fit the data, including noisy points, until it becomes too complex to interpret
  • in this way it loses its generalisation capability: it performs well on the training dataset but makes many mistakes on the test dataset

High variance: because of overfitting, the model's output tends to have high variance, which leads to many errors in the final predictions and poor accuracy on new data. Pushing the training bias towards zero (i.e., overfitting) drives the variance up, as described by the bias-variance tradeoff.

Unstable: adding new data points can cause the whole tree to be regenerated, so all nodes may need to be recalculated and reconstructed.

Not good if the dataset is large: a single tree may grow very complex and overfit. In this case a Random Forest, an ensemble of many Decision Trees, is usually the better choice.


2. Advantages of Decision Trees:

Easy to understand & clear visualisation: a DT is simple to interpret and visualise; the same idea of branching decisions is something we use in daily life.

  • the output of a Decision Tree can easily be interpreted
  • a DT works in the same manner as simple if-else statements, which are very easy to understand
  • a DT can be used for both classification and regression problems
  • a DT can handle both continuous and categorical variables

Feature scaling not required: standardisation and normalisation are not needed for a Decision Tree, as a DT uses a rule-based approach instead of distance calculations.

Handles non-linear relationships efficiently: unlike curve-based algorithms, the performance of a Decision Tree is not strongly affected by non-linear relationships in the data.

  • if there is high non-linearity between the independent variables and the target, Decision Trees may outperform other, curve-based algorithms

Can handle missing values: many Decision Tree implementations deal with missing values automatically.

Handles outliers automatically: Decision Trees are usually robust to outliers.

Less training time: the training period of a single Decision Tree is shorter than that of ensemble techniques like Random Forest, because only one tree is generated instead of a whole forest of trees.


3. Cases where Decision Trees are most suitable

  • DTs are most suitable for tabular data
  • when the output is discrete
  • when explainability is required
  • when the training data may contain errors or noisy data (outliers)
  • when the training data may contain missing values

In the healthcare industry:

  • predicting whether a patient is suffering from a disease, based on conditions such as age, weight, sex and other factors
  • deciding the effect of a medicine based on factors such as its composition and period of manufacture
  • diagnosis from medical reports

In the banking sector:

  • loan eligibility based on financial status, family members and salary of an individual
  • detecting credit card fraud
  • predicting loan defaults

In the educational sector:

  • shortlisting a student based on merit scores, attendance and overall score


4. Decision Tree handling continuous (numerical) features

  • a DT handles a continuous feature by converting it into a threshold-based boolean feature
  • to decide the threshold value, the Information Gain criterion is used: the threshold that maximises information gain is chosen (see the sketch below)
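
A minimal sketch of this idea, assuming a from-scratch NumPy implementation (the entropy helper, the best_threshold function and the toy age/label data below are all invented for illustration):

    import numpy as np

    def entropy(y):
        # Shannon entropy of a label array
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def best_threshold(feature, y):
        # try the midpoints between consecutive sorted feature values and
        # keep the threshold that gives the highest information gain
        base = entropy(y)
        values = np.sort(np.unique(feature))
        best_gain, best_t = 0.0, None
        for t in (values[:-1] + values[1:]) / 2:
            left, right = y[feature <= t], y[feature > t]
            gain = base - (len(left) / len(y) * entropy(left)
                           + len(right) / len(y) * entropy(right))
            if gain > best_gain:
                best_gain, best_t = gain, t
        return best_t, best_gain

    # toy example: 'age' as a continuous feature with binary labels
    age = np.array([22, 25, 30, 35, 40, 45, 50, 55])
    label = np.array([0, 0, 0, 1, 1, 1, 1, 1])
    print(best_threshold(age, label))  # the boolean feature then becomes: age <= chosen threshold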


5. Feature Selection using the Information Gain / Entropy technique

  • the aim of feature selection while building a DT is to pick the attribute (decision node) whose split produces child nodes whose combined (weighted) entropy is lower than the entropy of the data segment before the split; a lower combined entropy means a higher information gain (see the sketch below)
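
A rough illustration of that criterion, comparing two hypothetical categorical attributes on a tiny made-up dataset (the names 'outlook' and 'windy' and all values are invented for the example):

    import numpy as np

    def entropy(y):
        # Shannon entropy of a label array
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(attribute, y):
        # parent entropy minus the weighted entropy of the child segments
        # produced by splitting on each value of the attribute
        parent = entropy(y)
        children = sum(
            (attribute == v).mean() * entropy(y[attribute == v])
            for v in np.unique(attribute)
        )
        return parent - children

    # hypothetical data: which attribute gives the better decision node?
    outlook = np.array(["sunny", "sunny", "rain", "rain", "rain", "sunny"])
    windy = np.array(["yes", "no", "yes", "no", "no", "yes"])
    play = np.array([0, 0, 1, 1, 1, 0])

    print(information_gain(outlook, play))  # higher gain -> better split
    print(information_gain(windy, play))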


6. Attribute selection measures

Information Gain: biased towards multivalued attributes (attributes with many distinct values)

Gain Ratio: tends to prefer unbalanced splits in which one data segment is much smaller than the other

Gini Index: also biased towards multivalued attributes, has difficulty when the number of classes is large, and tends to favour tests that result in equal-sized, pure partitions (a sketch comparing the three measures follows below)
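
The sketch below computes all three measures for a single candidate split, using small hand-written helpers rather than any library function (the attribute values and labels are made up):

    import numpy as np

    def entropy(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def gini(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def split_measures(attribute, y):
        # information gain, gain ratio and weighted Gini index of one split
        child_entropy = child_gini = split_info = 0.0
        for v in np.unique(attribute):
            w = (attribute == v).mean()          # fraction of samples with this value
            child_entropy += w * entropy(y[attribute == v])
            child_gini += w * gini(y[attribute == v])
            split_info -= w * np.log2(w)         # grows with the number of distinct values
        info_gain = entropy(y) - child_entropy
        gain_ratio = info_gain / split_info if split_info > 0 else 0.0
        return info_gain, gain_ratio, child_gini

    attr = np.array(["a", "a", "b", "b", "c", "c"])  # made-up multivalued attribute
    y = np.array([0, 1, 0, 0, 1, 1])
    print(split_measures(attr, y))

Because split_info grows with the number of distinct attribute values, the gain ratio penalises exactly the multivalued attributes that plain information gain favours.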


7. Need for Pruning in Decision Trees

  • after growing a Decision Tree we usually observe that the leaf nodes have very high homogeneity, i.e., the data is classified almost perfectly; however, this also leads to overfitting. On the other hand, if not enough partitioning is carried out, the tree under-fits
  • the major challenge is therefore to find the optimal tree that gives an appropriate classification with acceptable accuracy. To handle this, the Decision Tree is first grown fully and then error rates are used to prune it back appropriately


8. Types of Pruning in a Decision Tree

Pruning is the process of removing the sub-nodes of a decision node; it is the opposite of splitting.

Two techniques are widely used for pruning: post-pruning and pre-pruning.

Post Pruning:

  • applied after the Decision Tree has been constructed
  • used when the tree has grown to a very large depth and the model shows overfitting
  • also known as backward pruning
  • typically applied to a fully grown Decision Tree

Pre Pruning:

  • applied before or while the Decision Tree is constructed
  • can be applied through hyperparameter tuning, e.g. limiting tree depth or the minimum number of samples per split (see the sketch below)
  • helps to overcome the overfitting issue
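
A small sklearn-based sketch of pre-pruning through hyperparameters (the synthetic dataset and the specific hyperparameter values below are arbitrary choices for illustration):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # synthetic data just for the demonstration
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # unrestricted tree: grows until the leaves are pure and tends to overfit
    full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

    # pre-pruned tree: growth is stopped early via hyperparameters
    pruned = DecisionTreeClassifier(
        max_depth=4,           # limit the depth of the tree
        min_samples_split=20,  # require enough samples before splitting a node
        min_samples_leaf=10,   # require enough samples in every leaf
        random_state=0,
    ).fit(X_train, y_train)

    print("full tree :", full.score(X_train, y_train), full.score(X_test, y_test))
    print("pre-pruned:", pruned.score(X_train, y_train), pruned.score(X_test, y_test))

Typically the unrestricted tree scores close to 100% on the training set but lower on the test set, while the pre-pruned tree trades some training accuracy for better generalisation.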


9. Properties of Gini Impurity

Let X be a discrete random variable that takes two class values, +ve and -ve, with proportions p+ and p-. The Gini impurity is Gini(X) = 1 - (p+^2 + p-^2). Now consider three cases:

Case 1: when 100% of observations belong to +ve, the Gini impurity of the system is Gini(X) = 1 - (1^2 + 0^2) = 0

Case 2: when 50% of observations belong to +ve, the Gini impurity of the system is Gini(X) = 1 - (0.5^2 + 0.5^2) = 0.5

Case 3: when 0% of observations belong to +ve, the Gini impurity of the system is Gini(X) = 1 - (0^2 + 1^2) = 0
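
The three cases can be reproduced in a few lines (the gini helper below is written just for this illustration):

    def gini(p_pos):
        # Gini impurity of a two-class system, given the fraction of +ve observations
        p_neg = 1.0 - p_pos
        return 1.0 - (p_pos ** 2 + p_neg ** 2)

    print(gini(1.0))  # Case 1: 100% +ve -> 0.0 (pure node)
    print(gini(0.5))  # Case 2: 50/50    -> 0.5 (maximum impurity for two classes)
    print(gini(0.0))  # Case 3: 0% +ve   -> 0.0 (pure node)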


10. Disadvantages of Information Gain

  • Information gain is defined as the reduction in entropy achieved by splitting on a particular attribute. It biases the Decision Tree towards attributes with a large number of distinct values, which can lead to overfitting
  • to solve this problem, the Information Gain Ratio is used instead


11. Decision Tree handling missing values

Decision Trees can handle missing values in the following ways:

  • the missing attribute value is filled in with the most common value of that attribute
  • the missing value is filled in by assigning a probability to each possible value of the attribute, based on the other samples (a sketch of the first strategy follows below)
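
In libraries whose tree implementation does not handle missing values natively, a common workaround is explicit imputation before training; below is a minimal sketch of the first strategy (most common value) using sklearn's SimpleImputer on a made-up single-column feature:

    import numpy as np
    from sklearn.impute import SimpleImputer

    # one attribute column with a missing entry (np.nan)
    X = np.array([[1.0], [1.0], [2.0], [np.nan], [2.0], [2.0]])

    # strategy 1 above: replace the missing value with the most common value
    imputer = SimpleImputer(strategy="most_frequent")
    print(imputer.fit_transform(X).ravel())  # the nan is replaced by 2.0, the modal value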


12. Inductive Bias of Decision Trees

  • the ID3 algorithm prefers shorter trees over longer trees
  • in a DT, attributes with high information gain are placed close to the root and are preferred over those with lower gain


13. Difference between CART and ID3

  • the CART algorithm produces only binary trees: non-leaf nodes always have exactly two children
  • in contrast, other tree algorithms such as ID3 can produce Decision Trees whose nodes have more than two children


14. Gini impurity or Entropy: which one to prefer?

  • most of the time it does not make a big difference
  • both lead to very similar trees
  • Gini impurity is a good default when implementing with sklearn, since it is slightly faster to compute. When they do behave differently, Gini impurity tends to isolate the most frequent class in its own branch of the tree, while entropy tends to produce slightly more balanced trees (see the comparison sketch below)
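
A quick way to see how small the difference usually is: train the same tree with both criteria and compare cross-validated accuracy (the Iris dataset and 5-fold CV are arbitrary choices for illustration):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    for criterion in ("gini", "entropy"):
        tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
        score = cross_val_score(tree, X, y, cv=5).mean()
        print(criterion, round(score, 3))  # the two scores are usually very close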


15. Reasons why Decision Tree accuracy may be low

Bad data: it is very important to feed machine learning algorithms correct, representative data.

Randomness: sometimes the system being modelled is so complex that it is impossible to predict what will happen in the future; in such cases the accuracy of a Decision Tree drops as well.

Overfitting: if the same data is used both to grow and to adjust the tree, the tree can over-fit that data instead of capturing a generalisable pattern.


16. Improving a Decision Tree

  • ensure that the stopping criterion is always explicit
  • when the stopping criterion is not explicit, it leaves one wondering whether further exploration is necessary and creates doubt about whether to stop or not
  • the DT should also be constructed in such a way that it is easy to follow and not confusing for the business


17. Linear Regression and Decision Trees comparison

  • Linear Regression --> used to predict continuous outputs when there is a linear relationship between the features of the dataset and the output variable
  • Decision Trees --> work by splitting the dataset, in a tree-like structure, into smaller and smaller subsets and then making predictions based on which subset a new example falls into
  • Linear Regression --> used for regression problems, where the target has infinitely many possible values, such as the price of a car
  • Decision Trees --> can be used for both regression and classification problems
  • Linear Regression --> prone to under-fitting the data; switching to polynomial regression sometimes helps to counter this
  • Decision Trees --> prone to over-fitting the data; pruning helps with the overfitting problem (see the comparison sketch below)
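
A brief sketch of that contrast on deliberately non-linear, made-up data (the sine-shaped target, the noise level and max_depth=4 are arbitrary choices):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    # invented non-linear relationship: y = sin(x) plus a little noise
    rng = np.random.RandomState(0)
    X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

    linear = LinearRegression().fit(X, y)                                # a straight line under-fits the curve
    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)  # piecewise-constant fit captures the shape

    print("linear R^2:", round(linear.score(X, y), 3))
    print("tree   R^2:", round(tree.score(X, y), 3))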


18. Greedy Splitting (Recursive Binary Splitting) procedure

  • all features are considered and different split points are tried and tested using a cost function; the split with the best (lowest) cost is selected
  • all input variables and all possible split points are evaluated and chosen in a greedy manner, i.e. the split with the lowest possible cost is taken at each step (a sketch follows below)
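
A compact sketch of one greedy split for a regression target, using sum of squared errors as the cost function (the sse helper, the feature matrix and the target rule are all invented for the example):

    import numpy as np

    def sse(y):
        # cost of a segment: sum of squared errors around its mean
        return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

    def greedy_split(X, y):
        # evaluate every feature and every candidate split point,
        # and return the (feature, threshold) pair with the lowest total cost
        best = (None, None, np.inf)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                cost = sse(left) + sse(right)
                if cost < best[2]:
                    best = (j, t, cost)
        return best

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 10, size=(100, 3))  # three made-up features
    y = np.where(X[:, 1] > 5, 10.0, 0.0)   # the target really depends only on feature 1
    print(greedy_split(X, y))              # picks feature index 1 with a threshold near 5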


19. Post-pruning & Pre-pruning (early-stopping) methods

Pruning involves cutting the tree back. After a tree has been built, it may overfit the data. There are many ways to prune a tree, for example:

  • Minimum error: the tree is pruned back to the point where the cross-validation error is at its minimum
  • Smallest tree: the tree is pruned back slightly further than the minimum-error point

Pre-pruning is also known as early stopping. Overfitting may also be prevented by stopping the tree-building process early (before it produces leaf nodes with very small samples). However, pre-pruning can under-fit the data by stopping too early. One method to pre-prune a tree:

  • at each stage of splitting the tree, the cross-validation error can be checked; if the error does not decrease significantly, the process is stopped
  • post-pruning and pre-pruning can be used together, separately, or not at all
  • post-pruning is more mathematically rigorous than pre-pruning (a minimum-error pruning sketch follows below)
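
A sketch of minimum-error post-pruning using sklearn's cost-complexity pruning path, with cross-validation to pick the pruning strength (the breast-cancer dataset and 5-fold CV are arbitrary choices for illustration):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # candidate pruning strengths (alphas) from the cost-complexity pruning path
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

    # minimum-error pruning: keep the alpha whose cross-validation score is highest
    scores = [
        (cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean(), a)
        for a in path.ccp_alphas
    ]
    best_score, best_alpha = max(scores)
    print("best alpha:", best_alpha, "cv accuracy:", round(best_score, 3))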


🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆🏆

Happy Learning😊
