Yet ANOther MAchine Learning OPerations Article (YANO MALOPA)
But a subset MADness

In this article, I will describe the Machine Learning Operations (MLOps) aspect of the project Aleksei Luchinsky and I built, called "Brain Tumor Detection with Topological Data Analysis and Machine Learning Operations". If you want a more in-depth discussion of Topological Data Analysis (TDA), ask Aleksei Luchinsky to write that article.

For my part, your first question is probably: What is MLOps? I find it helpful to break MLOps into its two pieces: 1) Machine Learning and 2) Operations.

Machine Learning

Let's first contrast machine learning with traditional programming/learning.

Traditional programming takes in data and rules, regulations, steps, and procedures to produce an outcome. In other words:

Traditional Programming: Data + Program → Outcomes.

Machine learning takes in data and outcomes to produce rules, regulations, steps, and procedures. In other words:

Machine Learning: Data + Outcomes → Program.

Hopefully, this disambiguation demystifies machine learning algorithms. When you hear a data scientist mention a random forest or neural network, remember: the program/model is really a collection of rules, regulations, steps, and procedures describing the best mapping from the data to their respective outcomes.
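To make the contrast concrete, here is a minimal sketch (the feature, threshold, and data are made up purely for illustration): the traditional program hard-codes the rule, while the machine learning model infers an equivalent rule from labeled examples.

    from sklearn.tree import DecisionTreeClassifier

    # Traditional programming: we write the rule (program) ourselves.
    def rule_based_label(brightness):
        return "tumor" if brightness > 0.7 else "no tumor"   # hand-chosen threshold

    # Machine learning: we supply data + outcomes and let the model infer the rule.
    X = [[0.2], [0.4], [0.75], [0.9]]                        # data (a single toy feature)
    y = ["no tumor", "no tumor", "tumor", "tumor"]           # outcomes
    model = DecisionTreeClassifier().fit(X, y)               # the learned "program"

    print(rule_based_label(0.8))       # rule we wrote
    print(model.predict([[0.8]])[0])   # rule the model discovered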

[Image: visualization of the contrast above. The earliest version of the image I could find is linked from the image itself.]

(More can be said about machine learning, such as "What criteria are used to define 'best mapping'?", or "What tuning can be done for different algorithms?", or "How does the program explain the mapping between data and output?" All good questions, none answered here.)

Operations

"But these people have never heard of Our Ford, and they aren't civilized," a quote from Brave New World. Aldous Huxley used Henry Ford as a metaphor for mass consumption, religious zeal, and technocracy. I include the quote as a homage to Fordism which I'll use, in turn, as an allusion to operations.

The evolution from traditional machine learning practices to MLOps mirrors the historical shift from artisanal/craft production to Fordism in manufacturing. Just as Fordism revolutionized industrial production with its emphasis on efficiency, standardization, and scalability, MLOps transforms machine learning by embedding similar principles into the development and operation of ML models.

[Image: ChatGPT doing its creative thing.]

Let's review assembly line production. An assembly line, in an MLOps context, is where different stages of the machine learning lifecycle are streamlined into a continuous integration and continuous deployment (CI/CD) pipeline. Each phase of the pipeline—from data collection and preprocessing to training, validating, and monitoring models—is optimized and automated.

Similar to the assembly line function, MLOps breaks down the machine learning lifecycle into distinct, standardized phases: data ingestion, model training, model tracking & registration, deployment, and monitoring. Each phase is designed to be repeatable and scalable, supported by automated pipelines.

Furthermore, through automation and the use of scalable technologies, models can be retrained and redeployed to handle growing data volumes or to adapt to new conditions without extensive manual effort. The operational efficiency of MLOps, with continuous monitoring and automated adjustments, ensures that models maintain high performance even as operational conditions change.
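As a toy illustration of that assembly-line structure (every function below is a made-up stand-in, not our actual pipeline code), each phase is a small, standardized step and the pipeline simply chains them in a fixed order:

    # Hypothetical "assembly line": each lifecycle phase is a standardized,
    # repeatable function, and the pipeline chains them together.
    def ingest():
        return [[0.21], [0.44], [0.78], [0.93]]              # stand-in raw data

    def preprocess(rows):
        return [[round(v, 1) for v in row] for row in rows]  # stand-in cleaning

    def train(rows):
        # Stand-in "training": pick a threshold over the toy feature.
        return lambda row: "tumor" if row[0] > 0.5 else "no tumor"

    def monitor(model, rows):
        return [model(row) for row in rows]                  # stand-in scoring/monitoring

    def run_pipeline():
        rows = preprocess(ingest())
        model = train(rows)
        return monitor(model, rows)

    print(run_pipeline())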

MLOps in Brain Tumor Detection

The architecture of our MLOps implementation aims to use only open-source technologies. I believe many readers will be familiar with the technologies in the bottom-center and bottom-left sections, so I will skip describing those.

[Image: our open-source stack, a subset of the full MAD landscape (linked from the image).]

  • Docker is a tool to build, package, and deploy applications in a consistent, isolated environment.
  • Docker Compose is a tool that lets you define and run multi-container Docker applications: you describe everything your app needs in a simple YAML file and start it all with one command.
  • MLflow is a platform that helps you manage the whole machine learning lifecycle, including experiment tracking, model deployment, and workflow automation.
  • Evidently is a tool that generates detailed reports to monitor your machine learning models in training and production, helping you keep an eye on things like data drift and model performance.
  • Apache Airflow is the king of technologies in our particular MLOps setup. Airflow helps you plan, schedule, and monitor complex workflows: you define tasks and dependencies in a clear, logical sequence, and Airflow manages the execution for you.

Airflow is called an orchestrator. In Airflow, you write pipelines that each perform a particular task. These pipelines are called Directed Acyclic Graphs, or DAGs. Three DAGs compose our MLOps pipeline (a minimal skeleton of one such DAG is sketched after the list):

  1. Data Ingestion and Preprocessing
  2. Model Training and Experimentation
  3. New Data Triggering Model Monitoring
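To ground the idea, here is a minimal Airflow skeleton of the first DAG; the DAG id, task names, and bodies are simplified stand-ins, not our actual code:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest_kaggle_images():
        ...  # stand-in: pull the raw MRI images

    def preprocess_images():
        ...  # stand-in: trim and standardize color, shape, and extension

    def extract_tda_features():
        ...  # stand-in: build persistence diagrams and vectorize them

    with DAG(
        dag_id="data_ingestion_and_preprocessing",   # hypothetical dag_id
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,                      # triggered manually or by another DAG
        catchup=False,
    ) as dag:
        ingest = PythonOperator(task_id="ingest", python_callable=ingest_kaggle_images)
        preprocess = PythonOperator(task_id="preprocess", python_callable=preprocess_images)
        extract = PythonOperator(task_id="extract", python_callable=extract_tda_features)

        # Tasks run in sequence, like stations on an assembly line.
        ingest >> preprocess >> extract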

[Image: the three DAGs and how they connect.]

1) Data Ingestion and Preprocessing

Sequentially, the first DAG takes in a Kaggle dataset of MRI images. The images undergo preprocessing, viz. trimming the images and standardizing color, shape, and file extension. Then, the DAG creates each MRI's Persistence Diagram (PD), on which it performs TDA extraction techniques such as Persistence Landscape, Persistence Blocks, Functional Data Analysis, and a summary technique. This PD-to-feature extraction converts each MRI image into a structured tabular format conducive to a set of machine learning algorithms.

(You see why you should nudge Aleksei Luchinsky to write a post?)
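For a rough sense of what that step looks like in code, here is a hedged sketch assuming the giotto-tda package (gtda) and using persistence landscapes as the example vectorization; our actual extraction code and parameters differ:

    # Sketch only: compute persistence diagrams from grayscale images and
    # vectorize them into a fixed-length feature row per image.
    import numpy as np
    from gtda.homology import CubicalPersistence
    from gtda.diagrams import PersistenceLandscape

    images = np.random.rand(4, 64, 64)            # stand-in for preprocessed MRIs

    # Persistence diagrams via a cubical filtration of each image.
    diagrams = CubicalPersistence(homology_dimensions=(0, 1)).fit_transform(images)

    # One possible vectorization: persistence landscapes, flattened to a table.
    landscapes = PersistenceLandscape().fit_transform(diagrams)
    features = landscapes.reshape(len(images), -1)
    print(features.shape)                         # (n_images, n_features)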

2) Model Training and Experimentation

Once the data is ingested and preprocessed, our second DAG begins. This stage involves training various machine learning models on the extracted data, specifically XGBoost, Gradient Boosting, Random Forest, Logistic Regression, and Support Vector Machine. The DAG manages the train/test split (80/20), K-fold (k=5) cross-validation, feature (regressor) standardization, and a grid search per algorithm. MLflow is used to track each experiment, capture performance metrics, handle model versioning, and serve predictions.
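Here is a minimal sketch of the training stage for one of those algorithms, using scikit-learn and MLflow; the experiment name, hyperparameter grid, and toy data are illustrative only:

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Stand-in for the TDA feature table and its tumor / no-tumor labels.
    features, labels = make_classification(n_samples=200, n_features=20, random_state=0)

    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42)         # 80/20 split

    pipe = Pipeline([("scale", StandardScaler()),                 # feature standardization
                     ("clf", RandomForestClassifier(random_state=0))])
    grid = GridSearchCV(pipe,
                        {"clf__n_estimators": [100, 300],         # illustrative grid
                         "clf__max_depth": [None, 10]},
                        cv=5, scoring="f1")                       # 5-fold cross-validation

    mlflow.set_experiment("brain-tumor-tda")                      # hypothetical name
    with mlflow.start_run(run_name="random_forest"):
        grid.fit(X_train, y_train)
        mlflow.log_params(grid.best_params_)
        mlflow.log_metric("f1_test", f1_score(y_test, grid.predict(X_test)))
        mlflow.sklearn.log_model(grid.best_estimator_, "model")   # versioned artifact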

3) New Data Triggering Model Monitoring

The third DAG triggers when new data is placed into one of two folders. These folders represent a doctor recording whether she found an MRI to contain a tumor. Once an image is added, the DAG sets off a series of events (a code sketch of the first and last steps follows the list):

  • Search the MLflow model registry for the registered model corresponding to the best results from previous experiments;
  • Run the TDA extraction that matches the selected model on the data from the triggered folder;
  • Pass the extracted MRIs, along with the test data of the respective extraction, to MLflow to serve predictions; and
  • Produce an Evidently report on data drift and target drift, classification metrics, and dummy metrics from those predictions.
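Below is a hedged sketch of the model-loading and reporting steps, assuming MLflow's model registry and the Report/metric-preset API of recent Evidently releases; the model name, stage, and file paths are illustrative stand-ins:

    import mlflow.pyfunc
    import pandas as pd
    from evidently.report import Report
    from evidently.metric_preset import ClassificationPreset, DataDriftPreset

    # Load the best registered model (name and stage are hypothetical).
    model = mlflow.pyfunc.load_model("models:/brain-tumor-best/Production")

    # Reference = test split of the winning extraction; current = extraction of the new MRIs.
    reference = pd.read_csv("reference_extraction.csv")   # stand-in path
    current = pd.read_csv("new_data_extraction.csv")      # stand-in path

    reference["prediction"] = model.predict(reference.drop(columns=["target"]))
    current["prediction"] = model.predict(current.drop(columns=["target"]))

    # Drift, classification, and dummy metrics in one HTML report.
    report = Report(metrics=[DataDriftPreset(), ClassificationPreset()])
    report.run(reference_data=reference, current_data=current)
    report.save_html("monitoring_report.html")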

[Image: architecture diagram.]

The image above represents the architecture just described. The blue blocks represent the first DAG, the red blocks the second DAG, and the yellow block the third DAG.

A final and important note, not able to be depicted in the image: when new data is added, each extraction is run and appended to its respective extraction's master table. In effect, when the Model Training and Experimentation DAG is rerun (it can be triggered after a certain amount of new data arrives, when data drift becomes unwieldy, etc.), the data being fed into the machine learning models includes all the Kaggle data in addition to every new image the doctor has added. This is part of the continuous improvement aspect of MLOps.
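A minimal sketch of that append-and-retrain bookkeeping; the file name and threshold are hypothetical, and the drift check is left as a comment:

    import os
    import pandas as pd

    def append_to_master(new_rows: pd.DataFrame,
                         master_path: str = "persistence_block_master.csv") -> pd.DataFrame:
        # Append the newly extracted rows to the extraction's master table.
        if os.path.exists(master_path):
            master = pd.concat([pd.read_csv(master_path), new_rows], ignore_index=True)
        else:
            master = new_rows
        master.to_csv(master_path, index=False)
        return master

    def should_retrain(master: pd.DataFrame, last_train_size: int,
                       min_new_rows: int = 100) -> bool:
        # Retrain once enough new rows have accumulated; a data-drift check
        # could be added here as a second trigger.
        return len(master) - last_train_size >= min_new_rows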

Results

Results for an MLOps process must demonstrate how the system performs when new data is introduced. Our new data comes from another Kaggle MRI dataset, which was meant as a multi-class classification problem: MRIs are labeled as no-tumor, glioma, meningioma, or pituitary tumor. For reference, the original dataset contained only no-tumor and meningioma images. We introduced "yes" and "no" images into their respective folders in four different batches.

Experiment 1: Subset of Sames

In the first experiment, we added a subset of the training images from the new data: 13 meningioma MRIs and 12 non-tumorous MRIs. Results are shown below:

[Image: Batch 1 results.]

"Dummy (by reference)" is a naive labeler according to the distribution of "yes" and "no" labels in the test data. "Dummy (by current)" is a naive labeler according to the distribution of "yes" and "no" labels in the new data. "Model" is how the model performed on the new data.

Experiment 2: Glioma Test

In the second experiment, two things were different.

  1. Combined data, train and test, was used as the reference data, and
  2. For the 9 "yes" images, glioma tumors were used.

In general, glioma does not look like meningioma: meningiomas have bright spots that indicate a tumor, while gliomas are outlined by a mostly visible circle rather than by brightness. Results are shown below:

[Image: Batch 2 results.]

Given that the "yes"-folder MRIs are completely unseen by the model, unsurprisingly, the model fit was worse. I take this as a good sign, if only because the results line up with the intuition that the model won't perform as well on a tumor type it has never seen.

Experiment 3: Triple Training

In the third experiment, we added the full training folder from the new data. This time, in the "yes" folder, we kept only meningioma instead of adding either glioma or pituitary tumors. In all, there are 252 "No" images and 214 "Yes" images, nearly tripling our original data. Results are shown below:

[Image: Batch 3 results.]

With experiment 3, we added many more images to our dataset. We did this to prepare for experiment 4. Before monitoring experiment 4, we reran the Model Training and Experimentation DAG (which can be triggered after a certain amount of new data arrives, when data drift becomes unwieldy, etc.). Recall what the Training and Experimentation DAG effectively does: 1) split each extraction into train and test sets, and 2) run, fit, and register each extraction's best model.

Experiment 4: CI/CD Re-train Re-monitor

In the fourth experiment, we added the full testing folder from the new data. Here again, in the "yes" folder, we kept only meningioma instead of adding glioma or pituitary tumors. In all, there are 89 "No" images and 79 "Yes" images. For reference data, we switched back to using the conventional testing data. Recall that the test split comprises 20% of the data; after all previous batches were added to the original data, this 20% is split between 86 "No" and 60 "Yes" images.

The evaluative tables below are slightly different from the previous three. I wanted these tables to show both 1) how the best model fits the new data's testing folder (current), and 2) how the newly trained best model fits the newly partitioned test data.

[Image: Batch 4 results.]

It turns out that, even after nearly tripling our data, the best model still came from the Persistence Block extraction. Had it come from another extraction, the Monitoring DAG would have evaluated that extraction's best model on the new data's extraction of the same type.

Overall, the results are exciting: on the relatively small test data, we get an F1 score of 0.762; more importantly, on the new incoming data, the F1 score is 0.883. This is very good and, I think, a well-crafted test case for MLOps in the real world.

Summary

In conclusion, the successful implementation of MLOps was an exciting process. Not only was it exciting, but we also demonstrated the effectiveness of our pipelines by introducing new data in batches and evaluating the performance of the best model on that data. The results showed that our models maintained a high F1 score across many batches. We demonstrated the continuous improvement aspect of our MLOps system by retraining models on all the new data the doctor inputted. Overall, this implementation showcases the power of MLOps to automate machine learning workflows, ensure reproducibility, and adapt to new data in real-world applications.

If you have any questions about the project or process, feel free to write me; I'm happy to answer. Thanks for reading.
