MLOps, which stands for Machine Learning Operations, deals with managing machine learning models from training through production and maintenance, and helps data scientists and development teams collaborate.
In this blog, we will discuss the differences between MLOps and DevOps, the MLOps workflow, and the tools that are available in the market.
What is the difference between MLOps and DevOps?
DevOps is a set of practices used in the software development life cycle that helps IT teams build, integrate, and maintain new features with continuous delivery while keeping software quality high.
MLOps automates the machine learning process from training to production, along with other ML workflows. It can be seen as a subset of DevOps specialized for machine learning applications.
To conclude, DevOps and MLOps are both software development workflows that help teams build solutions with continuous delivery, high software quality, and smooth maintenance.
MLOps Workflow
The MLOps workflow can be segmented into three main parts:
- Building
- Deployment
- Monitoring
Building
The steps involved in building and training a model include the following (a minimal end-to-end code sketch follows this list):
- Data Ingestion: This is the first step, where you build pipelines to fetch data from different data sources such as a data lake, a data warehouse, or a database. This process usually follows the ETL (extract, transform, load) approach. For example, loading users' order data from a database.
- EDA and Data Preparation: Once the data is available, we can perform exploratory analysis and prepare the data for model training. For example, getting a basic understanding of the order data and splitting it into training and test sets.
- Model Training: The training data prepared in the previous steps is used to build supervised or unsupervised models. For example, building a classification model to classify users based on the orders they have made.
- Model Testing: Once the model is trained, its performance is evaluated on the test/validation data.
- Model Registry: A model registry is a place where a containerized version of the model can be saved along with all the dependency files required to run it successfully.
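As a rough illustration of these steps, here is a minimal sketch in Python using pandas and scikit-learn. The file name orders.csv, the feature columns, and the target column segment are hypothetical placeholders, and any real pipeline will look different:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data ingestion: load users' order data (hypothetical CSV extracted from a database)
orders = pd.read_csv("orders.csv")

# Data preparation: pick feature columns and the target label (hypothetical names)
X = orders[["order_count", "avg_order_value", "days_since_last_order"]]
y = orders["segment"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training: a simple classifier over the prepared features
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Model testing: evaluate on the held-out test set
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Model registry: persist the trained model so it can be packaged and versioned
joblib.dump(model, "orders_classifier.joblib")
```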
Deployment
The containerized model can be used for deployment. Initially, the model will be deployed in a development environment for testing.
The model can be deployed to any of the following environments:
- Scalable Kubernetes cluster
- Edge device
- Virtual machine instance
- Container instance
The model in the development environment is exposed as an API or streaming endpoint on one of the environments above and, after testing, released to the production environment for practical usage.
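To show what an API endpoint around the model might look like, here is a minimal sketch using FastAPI. The framework choice, the model file name, and the request fields are assumptions for illustration, not a prescribed setup:

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("orders_classifier.joblib")  # hypothetical model file from the build step

class OrderFeatures(BaseModel):
    order_count: int
    avg_order_value: float
    days_since_last_order: int

@app.post("/predict")
def predict(features: OrderFeatures):
    # Run the model on one request and return the predicted user segment
    prediction = model.predict([[features.order_count,
                                 features.avg_order_value,
                                 features.days_since_last_order]])
    return {"segment": str(prediction[0])}
```

Served with a command like `uvicorn main:app`, this endpoint can then be containerized and promoted from development to production.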
Monitoring
Once the model is deployed for practical usage, it is important to monitor and evaluate its performance with some pre-defined metrics.
- Monitoring: Monitoring the performance of the model deployed in the production environment, so that the business and AI teams can identify any potential risk before it has an impact (a simple drift-check sketch follows this list).
- Analyzing: As we monitor the model, it is also important to analyze its characteristics such as fairness, bias, trust, and transparency.
- Governance: We monitor and analyze performance to ensure the model works well for the purpose it was built for. It is also important to have a governing framework and rules, such as compliance with local and international legal standards.
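One common pre-defined metric is data drift between the training data and the live requests. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy as one possible drift check; the feature, the generated example data, and the alert threshold are hypothetical choices, not a standard:

```python
import numpy as np
from scipy.stats import ks_2samp

def check_drift(train_values, live_values, threshold=0.05):
    """Flag drift on a single numeric feature using a two-sample KS test."""
    statistic, p_value = ks_2samp(train_values, live_values)
    drifted = p_value < threshold  # a small p-value means the two samples look different
    return drifted, p_value

# Hypothetical example: order counts seen during training vs. in production traffic
train_order_counts = np.random.poisson(lam=5, size=1000)
live_order_counts = np.random.poisson(lam=8, size=1000)

drifted, p = check_drift(train_order_counts, live_order_counts)
print(f"Drift detected: {drifted} (p-value={p:.4f})")
```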
Tools that are used for MLOps
Let's see some of the MLOps tools available in the market to help you get started.
- MLflow: MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. MLflow is library-agnostic: you can use it with any machine learning library and in any programming language, since all functions are accessible through a REST API and CLI. For convenience, the project also includes Python, R, and Java APIs (see the tracking sketch after this list).
- Weights & Biases: Weights & Biases makes it easy to track your experiments, manage and version your data, and collaborate with your team so you can focus on building the best models. Results can be visualized with relevant context and explored through drag-and-drop analysis.
- Prefect: Prefect is a modern workflow orchestration tool for coordinating all of your data tools. You can orchestrate and observe your dataflows using Prefect's open-source Python library, the glue of the modern data stack, which makes scheduling, executing, and visualizing your data workflows straightforward (see the flow sketch after this list).
- Data Version Control (DVC): DVC is a tool for data science that takes advantage of existing software engineering toolsets. It helps machine learning teams manage large datasets, make projects reproducible, and collaborate better.
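To ground the MLflow bullet above, here is a minimal tracking sketch that logs a parameter, a metric, and a trained model. The experiment name and the generated stand-in data are hypothetical, and MLflow's registry and deployment features go well beyond this:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; in practice this would be the prepared orders dataset
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("orders-classifier")  # hypothetical experiment name
with mlflow.start_run():
    n_estimators = 100
    model = RandomForestClassifier(n_estimators=n_estimators).fit(X_train, y_train)

    mlflow.log_param("n_estimators", n_estimators)                    # record the hyperparameter
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))   # record the evaluation metric
    mlflow.sklearn.log_model(model, "model")                          # save the model artifact
```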
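And as a small taste of Prefect, the sketch below wires two steps of a hypothetical ingestion flow together with the @task and @flow decorators (assuming Prefect 2.x; the step names and logic are placeholders):

```python
from prefect import flow, task

@task
def extract_orders():
    # Placeholder for fetching raw order data from a source system
    return [{"user_id": 1, "order_count": 5}, {"user_id": 2, "order_count": 12}]

@task
def transform_orders(rows):
    # Placeholder for cleaning and feature engineering
    return [r for r in rows if r["order_count"] > 0]

@flow
def orders_pipeline():
    rows = extract_orders()
    clean = transform_orders(rows)
    print(f"Prepared {len(clean)} rows")

if __name__ == "__main__":
    orders_pipeline()
```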
Conclusion
Finally, we are at the end of this blog. To conclude, you should now have a basic understanding of what MLOps is, its workflow, and the tools that are used in the market.
Hope you all enjoyed this blog; if you want more content like this, please let me know in the comment section.