How to select a machine learning model for your particular task
Today I want to talk about how to select a machine learning model for your particular task, and why simple models come in handy much more often than complex ones. Most problems solved with machine learning fall into one of two types: classification and regression (I'm not going to dive into segmentation or extraction problems today).
My personal rule of thumb is to start with the simplest models: linear regression for regression problems, and naive Bayes or logistic regression for classification problems. Why? Model complexity does not correlate strongly with better performance. Simple models often outperform even deep learning models, especially with good fine-tuning and boosting techniques. Starting with a simple model also saves a great deal of time: simple models usually don't need large datasets to train and validate, take little time to fine-tune, and are much easier to implement.
Take linear regression as an example. Say you have some features and need to predict a continuous dependent variable (price, income, etc.) as a function of them. The model is very simple: you need to find coefficients, call them Q, such that the dot product of your features X with Q gives a straight line that fits the real values of your dependent variable Y well. All that training a linear regression does is find these coefficients Q. You can find them with gradient descent or, if you don't have many features, with the normal equation. This is its formula:
Multiply X transposed by X, invert the result, and then multiply that by the product of X transposed and Y (the real values of the dependent variable):
$$Q = (X^T X)^{-1} X^T Y$$
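To make the two ways of finding Q concrete, here is a minimal NumPy sketch, not a definitive implementation. Everything in it is an assumption for illustration: the function names `fit_normal_equation` and `fit_gradient_descent` are hypothetical, the data is synthetic, a column of ones is added so Q includes an intercept, and the learning rate and iteration count are simply values that work for this toy data. The normal equation is solved with `np.linalg.solve` rather than an explicit inverse, which gives the same Q but is numerically safer.

```python
import numpy as np


def add_intercept(X):
    # Prepend a column of ones so the first coefficient acts as the intercept.
    return np.column_stack([np.ones(len(X)), X])


def fit_normal_equation(X, y):
    # Solves (X^T X) Q = X^T y, which is equivalent to Q = (X^T X)^{-1} X^T y.
    Xb = add_intercept(X)
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)


def fit_gradient_descent(X, y, lr=0.1, n_iters=5000):
    # Minimizes the mean squared error by repeatedly stepping against its gradient.
    Xb = add_intercept(X)
    Q = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residual = Xb @ Q - y
        grad = 2.0 / len(y) * (Xb.T @ residual)
        Q -= lr * grad
    return Q


# Synthetic data (assumed for the example): y = 3 + 1.5*x1 - 2*x2 + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

Q_exact = fit_normal_equation(X, y)
Q_gd = fit_gradient_descent(X, y)

print("normal equation: ", Q_exact)
print("gradient descent:", Q_gd)
```

On well-scaled data like this, both approaches should land on nearly the same coefficients. The normal equation needs no tuning but has to solve a system whose size grows with the number of features, which is why it suits problems with few features; gradient descent avoids that cost but needs a sensible learning rate.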
A great read. I do this stuff similarly. However, I personally prefer to go 'a little' extra and check the model complexity. It often proves beneficial.