High Number of Unique Values and Tree-Based Models

By Medium - 2021-03-12

Description

Columns of data with high cardinality can adversely affect the performance of your models. The idea for this article stemmed from my personal experience of employing tree-based solutions in…

Summary

  • How high cardinality affects your CART performance and interpretation: columns of data with high cardinality can adversely affect the performance of your models.
  • Extracting every node that splits on occupation shows, in the table on the right, the threshold values at which the data was split.
  • The split thresholds were 0.5, 2.5, 3.5, 5.5, 8.5, 9.5, and 10.5. Mapping them to the encoding table shows the decision boundaries that have been used throughout the tree to classify the data points.
  • Try different methods of encoding.
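
The mapping from split thresholds back to the encoding table can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code. The thresholds are the ones quoted in the summary, but the encoding table is hypothetical: the placeholder names `occupation_0` to `occupation_11` and the count of 12 categories are assumptions, since the real table is not reproduced here. With scikit-learn, thresholds like these could be read from a fitted tree's `tree_.feature` and `tree_.threshold` arrays.

```python
from bisect import bisect_right

# Hypothetical encoding table (assumption): placeholder names stand in for
# the real occupation labels, and 12 categories is a guess based on the
# largest threshold (10.5).
encoding = {f"occupation_{i}": i for i in range(12)}

# Split thresholds quoted in the summary for the occupation column.
thresholds = [0.5, 2.5, 3.5, 5.5, 8.5, 9.5, 10.5]


def groups_from_thresholds(encoding, thresholds):
    """Partition categories into the intervals cut out by the thresholds."""
    cuts = sorted(thresholds)
    groups = [[] for _ in range(len(cuts) + 1)]
    for name, code in sorted(encoding.items(), key=lambda kv: kv[1]):
        # bisect_right counts the thresholds below this integer code,
        # which is the index of the interval the code falls into.
        groups[bisect_right(cuts, code)].append(name)
    return groups


groups = groups_from_thresholds(encoding, thresholds)
for index, members in enumerate(groups):
    print(index, members)
```

Categories that land in the same group are never separated by any split on this column, so the groups depend on the arbitrary order of the integer codes. That is one reason trying a different encoding can change both the performance and the interpretation of the tree.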

 

Topics

  1. Backend (0.25)
  2. Machine_Learning (0.22)
  3. Database (0.12)

Similar Articles

Regression for Imbalanced Data with Application

By Medium - 2020-07-17

Imbalanced data describes the situation where the less-represented observations are the ones of main interest. In some contexts, they are described as “outliers”, which is rather more dangerous. As a…