From the course: Recommendation Systems: A Practical Hands-On Introduction

The machine learning lifecycle

- During this course, we have drawn a beautiful puzzle. Now let's put all the pieces together. Imagine you are tasked with creating a recommendation system from scratch. I'm going to give you a recipe for what you can do.

The first thing I would do, before any coding, is set the business objective and review the dependencies. The business objective defines the value you expect to deliver to the business. For example, it could be increasing the conversion rate by 10%, or increasing monthly active users by 1%. Remember, you'll be rewarded for providing value to your team, not for building a complex recommendation system. Then I would check the dependencies, both technical and human. Technical dependencies are all the requirements needed to successfully deploy the reco. For example, do you have access to usage data? Do you have the compute to run experiments? Do you have a way to deploy your solution and inject recommendations into the front end of your business? Do you have a way to measure the impact of your recommender? Human dependencies are all the teams and people you rely on to build the solution. Make sure they are aligned with your project. You know, the most common mistake smart engineers make is building something that is never used. Don't make that mistake.

Now, the first reco I would build is a minimalistic, LightGBM-based solution. Let me explain. What most people do is spend two to four weeks building a data pipeline, then another two to four weeks trying different algorithms, and then another two weeks doing the deployment. There is a better approach, which is working in end-to-end iterations. You want to build a complete version of a simple system. That first reco is a very simple data pipeline: create some basic features, train LightGBM on them (maybe don't use the whole dataset if you have gigabytes of data), and then productionize it using the batch architecture. The deployment is an A/B test with your recommender in one of the treatments and a random model in the other. That's it. You test that in production, and that is your version one. Developing this minimalistic reco should take you less than a week; aligning your whole team and actually testing the system in production will take longer, depending on your dependencies. In less than a month, you should have your first reco in production, and maybe start generating measurable value for your company.

The next thing I would do is create a machine learning lifecycle solution. You add tests to the data pipeline and to the deployment. You create Python libraries with utilities that support your code, and use Jupyter notebooks as the canvas for your experiments. Do not deploy the notebooks. Instead, extract the functions and create scripts for deployment. Jupyter notebooks are great tools for experimentation and data visualization, but not for deployment. A strong ML lifecycle is one where your solution can run daily with minimal errors. That allows you to test new algorithms very quickly. This is version two of your system: a complete MLOps solution with a simple algorithm.

The next thing I would try is different algorithms. SAR is the best candidate to try if you have a lot of usage data. That will be your version three. After that, you can keep trying new algorithms. In the Recommenders repository, we have a ton of recommendation system algorithms you can try. You can also add more complex data pipelines and deployment architectures.
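As a minimal sketch of what that version one could look like: the file name (usage_sample.csv) and column names (user_id, item_id, clicked) are hypothetical, and the features are just per-user and per-item counts; adapt everything to your own usage data.

```python
# Minimal "version one" sketch: simple data pipeline + basic features + LightGBM.
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Simple data pipeline: read a sample of the usage logs, not the full dataset.
interactions = pd.read_csv("usage_sample.csv")  # hypothetical columns: user_id, item_id, clicked

# Basic features: per-user and per-item activity counts joined back onto each row.
interactions["user_activity"] = interactions.groupby("user_id")["item_id"].transform("count")
interactions["item_popularity"] = interactions.groupby("item_id")["user_id"].transform("count")

features = ["user_activity", "item_popularity"]
X_train, X_valid, y_train, y_valid = train_test_split(
    interactions[features], interactions["clicked"], test_size=0.2, random_state=42
)

# Train a small gradient-boosted model that predicts the probability of a click.
model = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X_train, y_train)

print("Validation AUC:", roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1]))
# In the batch architecture, a daily job would score candidate items per user and
# write the top-N recommendations to a store the front end can read. In production,
# this model runs in one A/B test treatment against a random model in the other.
```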
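For the "extract functions, don't deploy the notebooks" advice, one possible shape (shown as a sketch; the module and file names are hypothetical) is a small utility library that both the notebooks and a thin, schedulable script import, so the deployed code path never runs a notebook.

```python
# reco_utils.py -- hypothetical utility module shared by notebooks and scripts.
import pandas as pd


def add_basic_features(interactions: pd.DataFrame) -> pd.DataFrame:
    """Add the same per-user and per-item count features used in the baseline."""
    out = interactions.copy()
    out["user_activity"] = out.groupby("user_id")["item_id"].transform("count")
    out["item_popularity"] = out.groupby("item_id")["user_id"].transform("count")
    return out


# run_daily_features.py -- thin script the scheduler runs daily; no notebook involved.
if __name__ == "__main__":
    interactions = pd.read_csv("usage_sample.csv")       # hypothetical input
    features = add_basic_features(interactions)
    features.to_parquet("daily_features.parquet")        # consumed by the batch scoring job
```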
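And for version three, a sketch of trying SAR via the Recommenders repository's Python package. The import path and argument names may differ between library versions, and the column names again refer to the hypothetical usage data above.

```python
# "Version three" sketch: swap the algorithm for SAR from the recommenders package.
import pandas as pd
from recommenders.models.sar import SAR

train = pd.read_csv("usage_sample.csv")  # hypothetical columns: user_id, item_id, rating, timestamp

model = SAR(
    col_user="user_id",
    col_item="item_id",
    col_rating="rating",
    col_timestamp="timestamp",
    similarity_type="jaccard",
    time_decay_coefficient=30,
    timedecay_formula=True,
)
model.fit(train)

# Top-10 recommendations per user, excluding items they have already interacted with.
top_k = model.recommend_k_items(train, top_k=10, remove_seen=True)
print(top_k.head())
```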
Have you noticed what we have done? First, we created a minimalistic end-to-end solution and put it in production, trying to generate value from month one. Then we created the basic MLOps that lets you quickly experiment with and deploy new solutions. And then we started the experimentation process across algorithms, data, and architecture. Work in iterations and you'll be successful.
