Key Pitfalls in Machine Learning
While performing EDA (exploratory data analysis), most of us are aware of the basic checks that need to be performed, but we often neglect to thoroughly investigate imbalanced datasets, which can significantly impact the performance of our machine learning models.
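A quick way to act on this during EDA is to look at the class distribution before any modeling. The sketch below is a minimal, standard-library-only Python example; the function names `class_distribution` and `imbalance_ratio` and the 95/5 label split are hypothetical, chosen only for illustration.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(labels).most_common()
    return counts[0][1] / counts[-1][1]


# Hypothetical binary labels: 95 negatives, 5 positives
labels = [0] * 95 + [1] * 5
print(class_distribution(labels))  # {0: 0.95, 1: 0.05}
print(imbalance_ratio(labels))     # 19.0
```

A ratio far above 1 is a signal to consider stratified splits, resampling, or metrics that are robust to imbalance, rather than proceeding as if all classes were equally represented.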
Below are a few important pitfalls in machine learning, grouped by the stage of the machine learning process in which they occur: DATA, MODEL and DEPLOYMENT.
DATA: Your data should be free of the following problems:
1. Insufficient data: Too few examples for the model to learn patterns that generalize, leading to overfitting or underfitting.
2. Noisy data: Incorrect or inconsistent values or labels, which distort the model's ability to learn patterns and make accurate predictions.
3. Biased data: Unequal representation of different classes or groups, leading to biased predictions (Domingos, 2012).
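Some of these problems can be screened for with simple checks. The following sketch assumes tabular rows given as tuples, plus a separate list of group attributes (for example, a region per row); `conflicting_labels`, `underrepresented_groups` and the 10% `min_share` threshold are hypothetical choices, not a standard API. Identical inputs with different labels are only one symptom of noise, and a low group share is only one signal of bias.

```python
from collections import Counter, defaultdict


def conflicting_labels(rows, labels):
    """Find identical feature rows that were given different labels,
    a common symptom of noisy (inconsistent) labeling."""
    seen = defaultdict(set)
    for row, label in zip(rows, labels):
        seen[tuple(row)].add(label)
    return {row: labs for row, labs in seen.items() if len(labs) > 1}


def underrepresented_groups(groups, min_share=0.1):
    """Return groups whose share of the data falls below min_share
    (min_share is an assumed, illustrative threshold)."""
    counts = Counter(groups)
    total = sum(counts.values())
    return sorted(g for g, n in counts.items() if n / total < min_share)


# Hypothetical data: the same input (1, "a") is labeled both 0 and 1
rows = [(1, "a"), (1, "a"), (2, "b")]
labels = [0, 1, 1]
print(conflicting_labels(rows, labels))  # {(1, 'a'): {0, 1}}

# Hypothetical group column: "rural" makes up only 5% of the rows
groups = ["urban"] * 19 + ["rural"]
print(underrepresented_groups(groups))  # ['rural']
```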
MODEL:
Your model should be simple enough to avoid overfitting, but complex enough to capture the underlying patterns in the data (and so avoid underfitting). In short, we should make the model's job as easy as possible. Common pitfalls at this stage include:
1. Choosing the wrong model: Selecting a model that is not suitable for the data and task at hand, which can result in poor performance.
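2. Improper model evaluation: Using evaluation metrics that don't fit the problem (for example, accuracy on a heavily imbalanced dataset) or skipping proper validation (for example, a held-out test set or cross-validation) can lead to inaccurate performance estimates and to failures in real-world scenarios.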
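The metric pitfall is easy to reproduce. In this minimal sketch (hypothetical data: 95 negatives and 5 positives), a "model" that always predicts the majority class scores 95% accuracy while finding none of the positive cases. The `accuracy` and `recall` helpers are written out by hand for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model correctly identified."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0


y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a trivial "model" that always predicts the majority class
print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

In practice, scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `recall_score` and `balanced_accuracy_score`, which are usually preferable to hand-rolled versions; comparing against such a trivial baseline is a cheap sanity check for any evaluation setup.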
DEPLOYMENT:
Even with a good model and clean data, pitfalls remain in the deployment phase.
Thorough testing, monitoring and continuous improvement of the deployed machine learning model are crucial. They help keep the model's performance from degrading over time (for example, when incoming data drifts away from the distribution the model was trained on) and catch issues or errors in its predictions early.
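One common way to monitor a deployed model is to check for data drift, i.e. whether the distribution of incoming data has moved away from the training data. The Population Stability Index (PSI) compares the distribution of one feature in live data against the training data. The sketch below is one illustrative way to compute it, not the only one: it assumes non-empty numeric samples and uses equal-width bins over the training range (quantile bins are also common); `psi`, `bins` and `eps` are hypothetical names.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a training sample (expected)
    and a live sample (actual) of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so empty bins don't cause log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train = [i / 100 for i in range(100)]  # hypothetical training values
shifted = [v + 0.5 for v in train]     # hypothetical drifted live values
print(psi(train, train))           # 0.0
print(psi(train, shifted) > 0.25)  # True
```

A commonly cited rule of thumb treats PSI below 0.1 as little change, 0.1 to 0.25 as moderate, and above 0.25 as a significant shift worth investigating; these thresholds are conventions rather than guarantees and should be tuned to the application.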