Naive Data Preparation
A common mistake students make in my machine learning courses happens in the data preprocessing stage, where they often neglect an important concept. Can you spot the difference between these two pieces of code? I would be happy to hear your opinion. I will post an explanation in the comments section soon.
In the second piece of code, the minimum and maximum values for each input variable are calculated using only the training dataset rather than the entire dataset. This avoids data leakage, which can lead to overly optimistic results that do not replicate on future data points.
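To make the difference concrete, here is a minimal sketch in plain Python. It is not the original pair of snippets: the toy data and the helper names `fit_min_max` and `apply_min_max` are hypothetical and chosen only for illustration, and a single input variable is assumed.

```python
# Hypothetical single-feature toy data; the last two values act as the test set.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0]
train, test = values[:4], values[4:]


def fit_min_max(column):
    """Return the (min, max) statistics learned from a column."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Scale a column using previously learned statistics."""
    return [(x - lo) / (hi - lo) for x in column]


# Naive: statistics come from the entire dataset, so the test rows
# influence how the training rows are scaled (data leakage).
lo_all, hi_all = fit_min_max(values)
train_naive = apply_min_max(train, lo_all, hi_all)

# Correct: statistics come from the training split only and are then
# reused, unchanged, on the test split.
lo_train, hi_train = fit_min_max(train)
train_scaled = apply_min_max(train, lo_train, hi_train)
test_scaled = apply_min_max(test, lo_train, hi_train)
```

With training-only statistics, test values can fall outside the [0, 1] range. That is expected: the model sees unseen data on the scale it would meet in production. In scikit-learn the same pattern is usually written by calling `fit` on the training split and `transform` on both splits.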