Our MLOps journey
This article is a continuation of the previous article, Overview of MLOps. Here we will go through the details of how we at CredAvenue started our journey with MLOps.
Quick recap
As a quick recap, the typical development of ML models involves two phases: model development (experimentation and training) and model deployment (operations). MLOps attempts to solve the different issues that occur in both of these phases.
Where to start?
As we saw in the previous article, there are different areas to cover, different components to add to the ecosystem, and various options available to set up each component. Given the complexity involved, what is the ideal place to get started, and how?
In this section we will discuss precisely where we as a team started this journey at CredAvenue.
To jump straight into the details: we started by exploring a tool that can help us with tracking and versioning of our models. After a careful evaluation and comparison against a few other options, we narrowed down on MLFlow as the tool to start with.
Before discussing MLFlow in detail, let's try to understand why it makes sense to start here. For this purpose, let's look at the kind of challenges and issues the lack of such a tool would create.
As can be seen, some of these become really crucial and critical issues. If not addressed with the right set of tools, they will leave a huge gap in the entire ecosystem.
MLFlow Overview
MLFlow is an open source platform for managing the machine learning lifecycle. It provides the following capabilities: experiment tracking of parameters, metrics and artifacts (MLFlow Tracking), reproducible packaging of training code (MLFlow Projects), a standard format for packaging and deploying models (MLFlow Models), and a central model store (Model Registry).
As can be seen in the picture above, MLFlow has a centralized tracking server backed by data stores: a backend store for run metadata such as parameters and metrics, and an artifact store for model binaries.
From the model development environment, with very simple configuration, we can start pointing to an MLFlow tracking server, as can be seen below.
import mlflow

mlflow.set_tracking_uri("http://YOUR-SERVER:4040")
mlflow.set_experiment("my-experiment")
Different iterations and runs of models are organized in the form of experiments. Every run or iteration of a given model is tracked through a UUID that serves as its unique version.
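To make this concrete, here is a minimal sketch (assuming the same experiment name configured above): every call to mlflow.start_run() creates a new tracked run, and its run_id is that UUID.

import mlflow

mlflow.set_experiment("my-experiment")

# Each run under the experiment is assigned its own UUID (the run_id)
with mlflow.start_run() as run:
    print(run.info.run_id)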
All the parameters that are configured to be logged will be stored in the tracking server for reporting purposes. As can be seen in the picture above, these can be either the hyperparameters or the metrics that measure the accuracy of the models. With simple log statements like the ones below, we can make them part of the tracking reports automatically.
mlflow.log_param("alpha", alpha
mlflow.log_param("l1_ratio", l1_ratio)
mlflow.log_metric("rmse", rmse)
mlflow.log_metric("r2", r2)
mlflow.log_metric("mae", mae)
mlflow.log_metric("mape", mape))
The model binaries are stored in the tracking server's artifact store. A stored model artifact will look like the following; it will typically contain an MLmodel descriptor file, a conda.yaml describing the environment, and the serialized model itself (for example, model.pkl).
As can be seen, the content of the yaml file lists the dependencies required to deploy and run the model.
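This whole artifact structure is produced automatically when the model is logged from the training code. For instance, here is a minimal sketch assuming a trained scikit-learn model object named model:

import mlflow.sklearn

# Writes the serialized model along with the MLmodel descriptor and
# conda.yaml environment file under the current run's artifacts
mlflow.sklearn.log_model(model, artifact_path="model")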
This model artifact can be deployed using the mlflow serve component, or it can be exported as a container and run on any container runtime. With mlflow serve, deployment is a very simple command like the following:
mlflow models serve -m file:///<path-to-model-artifact>/mlruns/0/7f48936c8f4e440a9955af91a3f215db/artifacts/model
With all the dependencies packaged into a container, the same model can be deployed on any container runtime like Docker, Kubernetes, ECS, etc.
When the model is deployed using this approach, it comes with a REST API wrapper automatically, without any hand coding to load the pkl file and build a REST API on top of it. This is a huge benefit in terms of avoiding repetitive work for every model that we deploy in production.
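To illustrate what that wrapper gives us, here is a sketch of scoring the served model over plain HTTP. The port, feature names and values are placeholders, and the exact payload format varies across MLFlow versions (newer versions expect the data wrapped under a dataframe_split key):

import requests

payload = {
    "columns": ["feature_1", "feature_2"],  # placeholder feature names
    "data": [[0.5, 0.1]],                   # placeholder feature values
}
resp = requests.post(
    "http://127.0.0.1:5000/invocations",
    json=payload,
    headers={"Content-Type": "application/json; format=pandas-split"},
)
print(resp.json())  # model predictions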
Now, to solve the problem of choosing which version of the model to deploy, we can refer to the tracking reports as seen above. We can pick the model version with the best values of mae, rmse, etc. and deploy it using this approach.
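This selection can even be scripted. Here is a sketch using MLFlow's search API to pick the run with the lowest rmse (note: recent MLFlow versions accept experiment_names, while older ones take experiment_ids instead):

import mlflow

# Returns a pandas DataFrame of runs, sorted so the best run comes first
best = mlflow.search_runs(
    experiment_names=["my-experiment"],
    order_by=["metrics.rmse ASC"],
    max_results=1,
)
print(best.loc[0, "run_id"], best.loc[0, "metrics.rmse"])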
Advantages of MLFlow
As can be seen in the previous section, MLFlow acts as a platform that can manage the end-to-end lifecycle of ML model development: developing the models, versioning, tracking, exporting model binaries and deployment.
Without the use of such a tool, all of the problems discussed earlier would remain open.
MLFlow as a tool helps us solve all of these problems.
Team collaboration
As can be seen below, several developers can collaborate on a common experiment. The moment all of them point to a common centralized tracking repository, their work becomes available to be shared among them and the rest of the team.
Supported libraries & flavours
On top of that, MLFlow supports models developed using several popular libraries, as can be seen in the documentation here. That means, as a developer, you won't be restricted to any specific library when you train models.
Similarly, it supports a variety of data stores to which model artefacts can be saved and from which they can be loaded.
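These per-library integrations are called flavors in MLFlow. As a small sketch (assuming already-trained model objects sk_model and xgb_model), logging follows the same pattern regardless of the framework:

import mlflow.sklearn
import mlflow.xgboost

# Same logging pattern, different flavor modules
mlflow.sklearn.log_model(sk_model, artifact_path="sklearn-model")
mlflow.xgboost.log_model(xgb_model, artifact_path="xgboost-model")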
Easier integration and deployment
It also has support for integration with several deployment environments.
Using the build-docker option as follows, the model binaries can be exported as Docker containers and deployed on top of any Docker-compatible container runtime.
mlflow models build-docker -m "runs:/some-run-uuid/my-model" -n "my-image-name"
docker run -p 5001:8080 "my-image-name"
It also has options like the following to build containers that are compatible with Sagemaker, for example:
mlflow sagemaker build-and-push-container
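The deployment itself can also be driven from Python through MLFlow's deployments API. The following is a sketch, not our exact setup: the endpoint name is hypothetical, and region and instance details would normally be passed via the config argument.

from mlflow.deployments import get_deploy_client

# Create a Sagemaker endpoint from a tracked model
client = get_deploy_client("sagemaker")
client.create_deployment(
    name="my-model-endpoint",  # hypothetical endpoint name
    model_uri="runs:/some-run-uuid/my-model",
)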
These options make the integration and deployment of the models with different tools really seamless.
Extension for non-standard ML models
Another important feature that MLFlow provides: along with standard ML models like regression, classification, etc., any simple or complex statistical function can also be exported as a model binary and deployed like any other standard ML model.
This is a really useful feature when a lot of statistical models are used alongside the standard ML models. As can be seen below, any model class with an implementation of a predict function can easily be exported and exposed through the same pipelines.
import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):
    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)
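Saving and using such a model then follows the same path as any other flavor. A minimal sketch (the path and the value of n are arbitrary):

import pandas as pd
import mlflow.pyfunc

# Save the custom model, then load and score it like any other MLFlow model
mlflow.pyfunc.save_model(path="add_n_model", python_model=AddN(n=5))
loaded = mlflow.pyfunc.load_model("add_n_model")
print(loaded.predict(pd.DataFrame([range(10)])))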
Sagemaker for deployment
Amazon Sagemaker is a fully managed service from AWS that supports building, training and deploying ML models. Models can be deployed and scaled automatically on a need basis, as the underlying container runtime is fully managed by Sagemaker.
On top of this, Sagemaker also provides many other capabilities, such as managed notebooks, training jobs, automatic scaling of endpoints and model monitoring.
In our case, we have made use of Sagemaker as the option to deploy our models through an automated pipeline for now. While model development, tracking, versioning, etc. are handled through MLFlow, the deployment of model binaries is handled through Sagemaker.
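For completeness, here is a sketch of how an application can score against such a Sagemaker endpoint with boto3 (the endpoint name and payload are placeholders):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-model-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"columns": ["feature_1"], "data": [[0.5]]}),
)
print(response["Body"].read())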
Integrated ML Pipeline
Our objective when we started this journey was to get to this fully automated and integrated ML pipeline, with very minimal manual intervention required.
As can be seen in the picture below, we have achieved precisely that with tools like MLFlow, Sagemaker and Jenkins. As a result, we have removed several manual touch points that were previously required to take a model and integrate it into a production system.
The following are the benefits we get out of this integrated pipeline: a single automated path from training to a production endpoint, consistent versioning and tracking of every deployed model, no hand-coded serving wrappers, and minimal manual intervention.
What next?
While we have tackled several aspects of the MLOps pipeline, there are a few more areas where we need to add further capabilities. The following are some of the areas we will be exploring next.