Introduction to MLOps with Microsoft Azure

August 23, 2022

 

Teams in a Machine Learning Project

In medium to large organizations with a mature IT infrastructure, a machine learning project cannot be delivered by a single team of data scientists; many teams have to be involved. Usually, you will see several teams getting engaged in an ML project.

Challenges for Successful Machine Learning Projects

In its report “Predicts 2020: Artificial Intelligence — the Road to Production”, Gartner noted: “A litmus test of organizations’ maturity is how quickly and repeatedly they can get these AI systems into production. Our surveys are showing that organizations are not managing to do this as quickly as they had hoped.”

1. Lack of Team Collaboration

Onboarding many teams onto a project can sometimes become a recipe for failure. Data scientists are sometimes multi-skilled enough to engineer basic data pipelines for data collection and to do in-depth testing, but they still cannot do away with the involvement of the IT and Operations teams.

2. Inability to Iterate on Improvements

Just like any software, the initial versions of ML models are usually far from stable. In traditional software, new iterations are comparatively easy because they only require code changes. In a machine learning system, a new version requires not only coding but also data collection for training the model, which lengthens the timeline for developing each new version.

Figure: Life Cycle of a Machine Learning Project

Meet MLOps: The DevOps for Machine Learning Projects

Not long ago, the software industry was grappling with many similar issues while executing software projects. It came up with the practice of DevOps to streamline development and operations, and to automate several manual touchpoints with what is known as a DevOps pipeline.

MLOps Pipelines

Much is said about DevOps pipelines, but behind the hype a pipeline is simply a series of predefined steps that run automatically from start to end every time you introduce a change.
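To make the idea of "a series of predefined steps" concrete, here is a minimal sketch in plain Python. It is not how Azure or any other tool implements pipelines; the `run_pipeline` helper and the three step names are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair. Each function receives the previous
# step's output and returns the input for the next step.
Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: List[Step], initial: Any = None) -> Any:
    """Run every step in order, passing each result on to the next step."""
    result = initial
    for name, func in steps:
        print(f"Running step: {name}")
        # An exception raised here stops the run, so later steps never execute.
        result = func(result)
    return result


# Hypothetical steps, for illustration only.
steps = [
    ("collect", lambda _: [3, 1, 2]),
    ("clean", lambda data: sorted(data)),
    ("summarize", lambda data: {"count": len(data), "max": max(data)}),
]

summary = run_pipeline(steps)
print(summary)
```

Because the steps are fixed in advance and always run in the same order, every change goes through exactly the same checks, which is the property the pipelines described next rely on.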

1) Data Pipeline

To train any ML model, you need to collect data from one or more sources through a process known as ETL, which stands for Extract, Transform, and Load. In this process, data is extracted from the source system(s), transformed for cleaning and preprocessing, and then loaded into a target data store (data lake). This entire process is encapsulated in the data pipeline, which is usually invoked in the initial stage of the other MLOps pipelines for data collection.
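The three ETL stages can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the inline CSV stands in for a source system, the cleaning rules are invented, and an in-memory buffer stands in for the data lake.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical raw export from a source system (inline to keep the sketch self-contained).
RAW_CSV = """id,age,city
1, 34 ,Pune
2,,Delhi
3, 29 ,mumbai
"""


def extract(raw: str) -> List[Dict[str, str]]:
    """Extract: read rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Transform: drop rows with a missing age, strip whitespace, normalize city case."""
    cleaned = []
    for row in rows:
        age = row["age"].strip()
        if not age:
            continue  # skip incomplete records
        cleaned.append(
            {"id": int(row["id"]), "age": int(age), "city": row["city"].strip().title()}
        )
    return cleaned


def load(rows: List[Dict[str, object]], target: io.StringIO) -> None:
    """Load: write the cleaned rows to the target store as JSON lines."""
    for row in rows:
        target.write(json.dumps(row) + "\n")


data_lake = io.StringIO()  # stand-in for a real data lake
load(transform(extract(RAW_CSV)), data_lake)
print(data_lake.getvalue())
```

In a real project each stage would talk to actual systems, but the shape stays the same: three small functions chained in a fixed order.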

2) Environment Pipeline

Your development environment may require not only external libraries but also custom libraries and dependencies. If these libraries are not available, the code will not run. An environment pipeline ensures that all of these dependencies are loaded properly and consistently every time before proceeding further in the MLOps pipeline.
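One simple way to picture an environment check is a script that fails fast when a required package is missing. The sketch below uses Python's standard `importlib.metadata` module; the helper names and the package name in the usage line are hypothetical, and a real environment pipeline would typically also install and pin the dependencies rather than only checking for them.

```python
from importlib import metadata
from typing import Iterable, List


def missing_packages(required: Iterable[str]) -> List[str]:
    """Return the names in `required` that are not installed in this environment."""
    missing = []
    for name in required:
        try:
            metadata.version(name)  # raises PackageNotFoundError if the package is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_environment(required: Iterable[str]) -> None:
    """Fail fast so later pipeline stages never run with missing dependencies."""
    missing = missing_packages(required)
    if missing:
        raise RuntimeError(f"Missing dependencies: {', '.join(missing)}")


# "my-custom-lib" is a hypothetical package name used only for illustration.
print(missing_packages(["my-custom-lib"]))
```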

3) Training Pipeline

This pipeline is responsible for the actual training of the ML model. Before it runs, you will usually call the data pipeline and the environment pipeline to provision the data store and load dependencies. Once training is completed, the weights of the trained model are saved so that they can be used in the subsequent pipelines.
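As a rough illustration of "train, then save the weights", the sketch below fits a one-variable linear model by gradient descent and writes its weights to a JSON file. The toy data, the hyperparameters, and the file name `weights.json` are all assumptions; a real training pipeline would use an ML framework and its own model format.

```python
import json
import os
import tempfile
from typing import List, Tuple


def train_linear_model(
    xs: List[float], ys: List[float], lr: float = 0.05, epochs: int = 2000
) -> Tuple[float, float]:
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def save_weights(w: float, b: float, path: str) -> None:
    """Persist the weights so that later pipelines can load the trained model."""
    with open(path, "w") as f:
        json.dump({"w": w, "b": b}, f)


# Toy data generated from y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_linear_model(xs, ys)
weights_path = os.path.join(tempfile.mkdtemp(), "weights.json")
save_weights(w, b, weights_path)
print(f"Saved weights to {weights_path}")
```

The testing and deployment stages then read the saved file instead of retraining, which keeps the expensive step in one place.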

4) Continuous Integration Pipeline

This pipeline enforces the continuous integration culture of DevOps, in which small changes are committed to the code repository frequently instead of in one big-bang commit. Each of these small commits has to pass code quality checks and unit tests so that any issue is revealed to the project team early in the development phase. In Python, we can use linting libraries such as Pylint and Flake8 for code quality.
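The unit-testing half of this stage can be sketched with Python's built-in `unittest` module. The function under test, `scale_to_unit_range`, is a hypothetical preprocessing helper invented for this example, and `run_checks` is one assumed way for a CI step to turn test results into a pass or fail signal.

```python
import unittest
from typing import List


def scale_to_unit_range(values: List[float]) -> List[float]:
    """Hypothetical preprocessing helper: rescale values to the range [0, 1]."""
    low, high = min(values), max(values)
    if low == high:
        return [0.0 for _ in values]  # avoid division by zero on constant input
    return [(v - low) / (high - low) for v in values]


class ScaleToUnitRangeTest(unittest.TestCase):
    def test_bounds(self):
        self.assertEqual(scale_to_unit_range([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(scale_to_unit_range([5, 5]), [0.0, 0.0])


def run_checks() -> bool:
    """Run the test suite and report whether the commit may proceed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScaleToUnitRangeTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    # A CI job would treat a non-zero exit code as a failed build.
    raise SystemExit(0 if run_checks() else 1)
```

A linter such as Flake8 or Pylint would normally run as a separate command in the same stage, so that style and quality problems fail the build just as a broken test does.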

5) Testing Pipeline

The trained model should be tested properly before it is deployed to production, because it will have to work with the existing production code of the application. Unless the model passes all testing and validation, it is not allowed through this pipeline: the overall execution of the MLOps pipeline is aborted and a failure report is sent to the data scientist for review. Only if testing succeeds is the model passed on to the deployment pipeline.
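The "pass or abort" behavior can be pictured as a validation gate that raises an exception when the model falls short. The accuracy metric, the 0.9 threshold, and the toy threshold classifier below are all assumptions chosen for illustration; real projects pick metrics and thresholds that match their own requirements.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, int]  # (feature, expected label)


class ValidationFailed(Exception):
    """Raised to abort the pipeline when the model is not fit for deployment."""


def accuracy(predict: Callable[[float], int], samples: Sequence[Sample]) -> float:
    """Fraction of samples for which the prediction matches the expected label."""
    correct = sum(1 for feature, label in samples if predict(feature) == label)
    return correct / len(samples)


def validation_gate(
    predict: Callable[[float], int],
    samples: Sequence[Sample],
    min_accuracy: float = 0.9,
) -> float:
    """Return the score if the model passes, otherwise raise ValidationFailed."""
    score = accuracy(predict, samples)
    if score < min_accuracy:
        raise ValidationFailed(
            f"accuracy {score:.2f} is below the required {min_accuracy:.2f}"
        )
    return score


# Hypothetical held-out samples and a toy threshold classifier.
holdout = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]


def toy_model(x: float) -> int:
    return int(x > 0.5)


score = validation_gate(toy_model, holdout)
print(f"Model passed validation with accuracy {score:.2f}")
```

Raising an exception, rather than returning a flag, means the failure cannot be silently ignored: whatever runs the pipeline stops there and can forward the message as the failure report.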

6) Continuous Deployment Pipeline

This pipeline takes care of the deployment process and is usually invoked last, once the earlier pipelines have run and the model has been given the green light for deployment. Depending on the deployment strategy and architecture, the model can be deployed directly onto a server, packaged in a container such as Docker, or orchestrated across clusters with Kubernetes.

MLOps with Microsoft Azure

Microsoft Azure provides many robust services in its ecosystem for creating an end-to-end MLOps pipeline through its product Azure DevOps. It was launched under that name in 2018 but has existed in earlier forms, starting with Visual Studio Team System, since 2006. Azure DevOps comes with several features and services, including Azure Boards, Azure Repos, Azure Pipelines, Azure Test Plans, and Azure Artifacts.

Creating Pipelines with Azure DevOps

Returning to the earlier discussion of MLOps pipelines, we can create these pipelines by using the Azure Pipelines service in Azure DevOps.

Deployment Pipeline for Docker in Azure DevOps

We mentioned earlier that the strategy for the deployment pipeline may include Docker, which is used to create containers that package all your code and dependencies. Let us see how we can do this in Azure DevOps.

Conclusion

In this article, we looked at why it is so difficult to deliver an ML project into production successfully and how MLOps pipelines, a concept derived from DevOps, can help. We discussed the various types of pipelines possible in MLOps and also saw how these pipelines can be created on the Azure DevOps platform.


Credits: Pradeep Natarajan

Reference

https://www.slideshare.net/zobeide/mlops-how-to-bring-your-data-science-to-production

