Naive Data Preparation

[Two images of code, not reproduced here]

One of the most common mistakes students make in my machine learning courses happens in the data preprocessing stage, where they usually neglect an important concept. Do you see a difference between these two pieces of code? I would be happy to hear your opinion. I will give an explanation in the comments section soon.

Arash Bakhtiary

Data Science | Applied AI | Machine Learning | LLMs & Generative AI | Expert in Python, MLOps, Docker and Kubernetes


In the second piece of code, the minimum and maximum values of each input variable are computed from the training dataset only, not from the entire dataset. This avoids data leakage. When the test data is allowed to influence these statistics, the model's evaluation can give overly optimistic results that do not replicate on future data points.
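Since the two original code images are not available here, the following is only a minimal sketch of the contrast described above, not the code from the post. It assumes plain min-max scaling written in pure Python; the helper names `fit_min_max` and `apply_min_max` and the toy data are hypothetical and chosen for illustration.

```python
def fit_min_max(rows):
    """Return per-column (min, max) pairs computed from the given rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, stats):
    """Scale each column with previously fitted (min, max) statistics."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi != lo else 0.0
            for value, (lo, hi) in zip(row, stats)
        ])
    return scaled


# Hypothetical toy data: one feature, the last two rows act as the test set.
data = [[1.0], [2.0], [3.0], [4.0], [10.0], [0.0]]
train, test = data[:4], data[4:]

# Naive: statistics come from the entire dataset, so the test rows leak in.
naive_stats = fit_min_max(data)
naive_train = apply_min_max(train, naive_stats)

# Leakage-free: statistics come from the training rows only,
# and the same statistics are then reused for the test rows.
train_stats = fit_min_max(train)
clean_train = apply_min_max(train, train_stats)
clean_test = apply_min_max(test, train_stats)
```

In the leakage-free version the scaled test values can fall outside the range 0 to 1, which is expected: the test set is treated like unseen future data. With scikit-learn's `MinMaxScaler`, the same pattern is to call `fit` on the training split and `transform` on both splits.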
