Our MLOps journey

This article is a continuation of the previous article on Overview of MLOps. Here we will go through the details of how we at CredAvenue have started our journey with MLOps.

Quick recap

As a quick recap, the typical development of ML models involves two phases,

  • Training phase: where the model learns from the data
  • Scoring phase: where the model is used for prediction on unseen data

MLOps attempts to solve different issues that occur in both phases,

  • Model versioning & tracking: during the training phase to compare different versions of the models getting trained with the data
  • Model deployment: to streamline the process of exporting the model from the training environment and moving into the runtime for integrating the scoring or prediction with other applications
  • Post deployment monitoring: to monitor the model that is deployed in production and to handle redeployment of the models

Where to start?

As we have seen in the previous article there are different areas to cover, different components to add to the ecosystem and various options available to set up each component. Given the complexity involved, what is the ideal place to get started and how to get started?

In this section we will discuss precisely where we as a team have started this journey at CredAvenue.

To jump straight into the details, we started by exploring a tool that could help us with tracking and versioning of our models. After a careful evaluation and comparison against a few other options, we narrowed down on MLFlow as the tool to start with.

Before discussing MLFlow in detail, let’s try to understand why it makes sense to start here. For this purpose, let’s look at the kind of challenges and issues the lack of such a tool would create.

  • With different teams involved in developing models, the lack of a common repository of model binaries would result in duplication of effort.
  • Tracking different versions of the models and their performance would become a huge challenge.
  • With no standardization around model export and binaries, there would be repetitive work involved in getting the models deployed.

As can be seen, some of these would become really crucial and critical issues. If not addressed with the right set of tools, they would leave a huge gap in the entire ecosystem.

MLFlow Overview

MLFlow is an open source platform for managing the machine learning lifecycle. It provides the following capabilities,

  • Ability to track different experiments with different data points, features, hyperparameters, configuration, etc.
  • Ability to package and export the models into deployable binaries
  • Centralized repository for storing and tracking different versions of the model binaries
  • Ability to track and report different performance indicators and parameters of different versions of the models

MLFlow Overview

As can be seen in the picture above MLFlow has a centralized tracking server backed by some data stores,

  • Object stores like S3, minio, etc. to store the model artefacts and binaries
  • RDBMS like MySQL to store some of the model parameters and performance indicators for reporting

From the model development environment, with a very simple configuration, we can start pointing to an MLFlow tracking server as can be seen below.

mlflow.set_tracking_uri("http://YOUR-SERVER:4040")
mlflow.set_experiment("my-experiment")

Different iterations and runs of models are organized in the form of experiments. Every run or iteration of a given model is tracked through a uuid as a unique version. 
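For illustration, a minimal sketch of how a run could be opened under the experiment configured above; apart from the tracking URI and experiment name from the earlier snippet, the details here are placeholders:

import mlflow

mlflow.set_tracking_uri("http://YOUR-SERVER:4040")
mlflow.set_experiment("my-experiment")

# each run gets its own unique run id (uuid) on the tracking server
with mlflow.start_run() as run:
    print("run id:", run.info.run_id)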

All the parameters that are configured to be logged will be stored as part of the tracking server for reporting purposes. As can be seen in the picture above these can be either the hyperparameters or the metrics that measure the accuracy of the models. With simple log statements as can be seen below, we can make them part of the tracking reports automatically.

mlflow.log_param("alpha", alpha

mlflow.log_param("l1_ratio", l1_ratio)

mlflow.log_metric("rmse", rmse)

mlflow.log_metric("r2", r2)

mlflow.log_metric("mae", mae)

mlflow.log_metric("mape", mape))        

The model binaries will be stored inside the tracking server. The model artifact stored there will look like the following. It will contain,

  • The model binary, i.e. model.pkl
  • Couple of configuration files


As can be seen, the content of the yaml file lists the dependencies required to deploy and run the model.
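As a rough sketch of how such an artifact gets created in the first place, a scikit-learn model could be logged to the tracking server as below; MLFlow then writes model.pkl along with the configuration and environment files automatically (the model object here is hypothetical):

import mlflow.sklearn

with mlflow.start_run():
    # stores model.pkl and its environment files under the run's artifacts
    mlflow.sklearn.log_model(sk_model=model, artifact_path="model")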

This model artifact can be deployed using the mlflow serve component. It can also be exported as a container and run on any container runtime.

With mlflow serve it can be deployed with a very simple command like the following,

mlflow models serve -m file:///<path-to-model-artifact>/mlruns/0/7f48936c8f4e440a9955af91a3f215db/artifacts/model        

With all the dependencies packaged under a container the same model can be deployed under any container runtime like docker, kubernetes, ECS, etc.

When the model is deployed using this approach, it comes with a REST API wrapper automatically, without any hand coding to load the pkl file and create a REST API wrapper on top of it. This is a huge benefit in terms of avoiding a lot of repetitive work for every model that we deploy in production.
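As a rough sketch (the exact JSON payload format depends on the MLFlow version), the served model could be invoked over its REST endpoint like this; the host, port and input columns are assumptions:

import requests

# pandas "split" style input accepted by older MLFlow scoring servers
payload = {"columns": ["feature_1", "feature_2"], "data": [[0.5, 1.2]]}

resp = requests.post("http://localhost:5000/invocations",
                     json=payload,
                     headers={"Content-Type": "application/json"})
print(resp.json())   # model predictions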

Now, to decide which version of the model should be deployed, we can refer to the tracking reports as seen above. We can choose the model version with the best values of mae, rmse, etc. and deploy it using this approach.
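Programmatically, a hedged sketch of picking the best run could use the tracking API as below; the experiment id and the choice of rmse as the selection metric are assumptions:

import mlflow

# fetch runs of an experiment ordered by rmse, best first
runs = mlflow.search_runs(experiment_ids=["0"],
                          order_by=["metrics.rmse ASC"],
                          max_results=1)
best_run_id = runs.loc[0, "run_id"]
model_uri = f"runs:/{best_run_id}/model"   # usable with mlflow models serve / build-docker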

Advantages of MLFlow

As can be seen in the previous section, MLFlow acts as a platform that can manage the end-to-end lifecycle of ML model development, starting with developing the models, through versioning, tracking and exporting model binaries, to deployment.

Without the use of such a tool,

  • Tracking several versions of the models with different combinations of hyperparameters and features would be highly tedious, time consuming and error prone
  • There would be a lot of repetitive work involved in exporting the models into pickle files and deploying them with a hand coded REST API wrapper
  • Letting different developers from the team work together on a common problem and share their work would be an issue

MLFlow as a tool helps us solve all of these problems. 

Team collaboration

As can be seen below, several developers can collaborate on a common experiment. The moment all of them point to a common centralized tracking repository, their work becomes available to be shared among them and with other parts of the team.

Team Collaboration with MLFlow

Supported libraries & flavours

On top of that, MLFlow supports models developed using several libraries, as can be seen below and as per the documentation. That means as a developer you won’t be restricted to the use of any specific library when you train models.

Model flavors
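As a small illustration of this flavour support (the model objects below are hypothetical), the same logging pattern applies whichever library the model comes from:

import mlflow.sklearn
import mlflow.xgboost

with mlflow.start_run():
    mlflow.sklearn.log_model(sk_model=sk_pipeline, artifact_path="sklearn-model")

with mlflow.start_run():
    mlflow.xgboost.log_model(xgb_model=booster, artifact_path="xgb-model")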

Similarly, it supports a variety of data stores to which the model artefacts can be stored and from which they can be loaded.


Easier integration and deployment

It also has support for integration with several deployment environments.

  • Deploying standalone using mlflow serve component
  • Deploying on top of AzureML
  • Deploying on top of Amazon Sagemaker
  • Deploying as a Spark UDF (a minimal sketch follows after this list)
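For the Spark UDF option above, a minimal sketch assuming an existing SparkSession and an input_df dataframe; the run id and column names are placeholders:

import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# wrap the tracked model as a Spark UDF and score a dataframe
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri="runs:/<run-uuid>/model")
scored_df = input_df.withColumn("prediction", predict_udf("feature_1", "feature_2"))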

Using the build docker option as follows, the model binaries can be exported as docker containers and deployed on top of any docker compatible container runtime.

mlflow models build-docker -m "runs:/some-run-uuid/my-model" -n "my-image-name"

docker run -p 5001:8080 "my-image-name"
        

It also has options like the following to build containers that are compatible with Sagemaker for example.

mlflow sagemaker build-and-push-container        

These options make the integration and deployment of the models with different tools really seamless.

Extension for non-standard ML models

Another important feature that MLFlow provides is that, along with standard ML models like regression, classification, etc., any simple or complex statistical function can also be exported as a model binary. The same can be deployed like any other standard ML model.

This is a really useful feature when a lot of statistical models are used along with the standard ML models. As can be seen below, any model class with an implementation of a predict function can be easily exported and exposed through the same pipelines.

import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):

    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)
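For completeness, a hedged usage sketch of the AddN example above, saving it like any other model and loading it back for prediction; the local path is arbitrary:

import pandas as pd

# save the custom python_function model and load it back
mlflow.pyfunc.save_model(path="add_n_model", python_model=AddN(n=5))
loaded_model = mlflow.pyfunc.load_model("add_n_model")

print(loaded_model.predict(pd.DataFrame([range(10)])))   # each value incremented by 5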

Sagemaker for deployment

Amazon Sagemaker is a fully managed service from AWS that supports building, training and deploying ML models. The models can be deployed and scaled automatically on a need basis, as the underlying container runtime is fully managed by Sagemaker.

On top of this, Sagemaker also provides many other capabilities like,

  • Seamless integration with CI/CD environment 
  • Autoscaling of the models deployed to manage the incoming load of scoring traffic
  • Support for different experimentation strategies to support A/B testing, shadow testing, etc.
  • Post deployment monitoring of the models to detect bias, model/data drift, etc.
  • Integrated ML model development environment to standardize the model development environment
  • Fully managed feature store that lets features be shared among different models

In our case we have made use of Sagemaker as an option to deploy our models with an automated pipeline for now. 

While model development, tracking & versioning, etc. are handled through MLFlow, the deployment of model binaries is handled through Sagemaker.
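As a rough sketch of that hand-off, and assuming the MLFlow 1.x-era sagemaker module (newer releases expose the same through mlflow.deployments), a tracked model could be pushed to a Sagemaker endpoint like this; the app name, run id, region and IAM role are placeholders:

import mlflow.sagemaker

mlflow.sagemaker.deploy(
    app_name="my-model-endpoint",                # Sagemaker endpoint name (placeholder)
    model_uri="runs:/<run-uuid>/model",          # model tracked in MLFlow
    region_name="ap-south-1",
    mode="create",                               # or "replace" to update an existing endpoint
    execution_role_arn="arn:aws:iam::<account>:role/<sagemaker-role>",
)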

Integrated ML Pipeline

Our objective when we started this journey was to get to a fully automated and integrated ML pipeline, with very minimal manual intervention required.

As can be seen in the picture below, we have achieved precisely that with the use of tools like MLFlow, Sagemaker and Jenkins. As a result, we have taken away several touch points of manual intervention that were required to take a model and integrate it into a production system.

Integrated Pipeline

Following are the benefits we get out of this integrated pipeline,

  • Faster turnaround time for taking models into production
  • Productivity improvement for the team developing models
  • Opportunity to iterate through several possible techniques, hyperparameters and feature combinations
  • Reduced manual intervention and dependencies among different teams
  • Centralized tracking of all the models with no experiments lost over time

What next?

While we have tackled several aspects of the MLOps pipeline, there are a few more areas where we need to add further capabilities.

Following are some of the areas we will be exploring further,

  • Standardized model development environment with all required integrations with the data pipelines, tools & libraries, compute power required for model training, etc.
  • Common feature store that can be shared across different models
  • To provide support for different experimentation strategies like A/B testing, shadow models, etc.
  • Post deployment monitoring capabilities of models to monitor bias, drift, etc.

