MLOps uses automation and high-level orchestration to eliminate the manual toil and mistakes that come with juggling multiple versions of data preprocessing and feature engineering pipelines, datasets, the corresponding train/validation/inference source code, and models.

Why would you want to automate ML operations?

If your team works on several DS/AI directions and approaches at once, it will face a chaos of data preprocessing versions, datasets, features, and models in no time. You will also discover that data scientists are slow at infrastructure work – for instance, deploying model training and operationalization on a cloud Kubernetes cluster. Automating ML operations is a long-term investment that pays off in the following ways:

  • Fast and reliable iterations of experiments.
    No need to rewrite source code on every iteration of an ML experiment. Set up the train and validation workflows once, automate them, and from then on only toggle hyperparameters until you reach the desired results (see the sketch after this list).
  • Flexible scaling of computational resources.
    There are times when computational resources are in high demand, but your team doesn’t need them spinning overnight or on weekends. Flexible scaling releases unused resources, automatically cutting infrastructure costs. And there is always somebody who forgot to shut down that expensive GPU instance – unless the shutdown was automated, of course. Do you like paying for something nobody actually used? Of course you don’t.
  • Consistent track of experiments’ results.
    Hyperparameters are aligned with validation results and logged automatically.
  • Seamless integration and deployment.
    Your service requires iterative updates. Believe us, you don’t want to roll them out manually.
  • Risk mitigation.
    Compare new ML models to the existing ones, detect and avoid input drift. Eliminate human mistakes at all stages of the ML roadmap.
  • Automatic monitoring and feedback loop.
    The live model scoring environment and the model training environment are distinct, so scores obtained in the test environment are likely to differ from those seen in reality.
    To mitigate this risk, the model is first exposed to the actual environment through a canary deployment. Observing and recording real-world performance metrics and predictions makes it possible to detect corner cases, iterate, and develop a better model – without human intervention.
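
To make this concrete, here is a minimal sketch of such an automated iteration loop, assuming scikit-learn and a plain JSON-lines log; the run_experiment function and the experiments.jsonl file name are illustrative, not part of any particular platform:

```python
# Minimal sketch: one parameterized training workflow, re-run with different
# hyperparameters instead of rewriting source code for every experiment.
import json
from datetime import datetime, timezone

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_experiment(params: dict, log_path: str = "experiments.jsonl") -> float:
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = GradientBoostingClassifier(**params)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_val, model.predict(X_val))

    # Hyperparameters and validation results are logged together, automatically.
    with open(log_path, "a") as log:
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "params": params,
            "val_accuracy": accuracy,
        }
        log.write(json.dumps(record) + "\n")
    return accuracy


# Toggle hyperparameters between iterations; the workflow itself never changes.
run_experiment({"n_estimators": 100, "learning_rate": 0.1, "max_depth": 3})
run_experiment({"n_estimators": 300, "learning_rate": 0.05, "max_depth": 2})
```
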
[Figure: the MLOps pipeline – Data Warehouse → Model Development → Runtime Environment Engineering → Deployment and Integration → Monitoring and Feedback, with predictions, features, and metrics flowing back for model upgrades. Deliverables per stage: data groomed for analytics and AI; the best ML model trained and selected; the model ready for production; the ML service integrated and deployed to production; service performance and environment feedback tracked.]

Data Warehouse

A data warehouse is persistent, secure, enterprise-grade storage for the groomed data that has been extracted and transformed at the ETL stage. The data is ready for analytics and AI tasks.
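
For illustration only, a minimal sketch of pulling an analytics-ready table out of such a warehouse, assuming pandas and SQLAlchemy; the connection string, driver, and table name are placeholders:

```python
# Minimal sketch: reading a groomed, analytics-ready table from the warehouse.
import pandas as pd
from sqlalchemy import create_engine

# The connection string and table are illustrative placeholders.
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")
features = pd.read_sql("SELECT * FROM customer_features", engine)
```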

Model Development

The outcomes of the Model Development stage are:

  • the best-performing trained model, stored as a binary file
  • supporting source code pushed to the repository
  • recorded results of data exploration and model evaluation metrics

The following ML routines are automated:

  1. Data Exploration
    At this stage, the following questions are answered:
    How does your data look?
    Does it allow you to solve your problem effectively?
    What gaps does it have?
  2. Feature Engineering
    Record the current list of features and how they were computed, so they can be replayed or analyzed later.
  3. Recording Architectures, Hyperparameters, and Performance Metrics
    During the ML trials stage, engineers test multiple architectures, hyperparameters, and metrics. It is best to track all outcomes automatically, since people tend to overlook things. A comprehensive log of trials simplifies choosing the best-performing model.
  4. Model Evaluation
    Before facing the destination environment, the model performance is evaluated on a separate data subset that was not used in training. Usually, this subset is the best possible approximation of the real-world inputs. Obtained metrics are a significant indicator of the model’s generalization capabilities and set expectations for the model behavior in the desired environment.
  5. Selection of the best model
    Once the performance metrics and predictions of all experiments have been recorded, selecting the best model becomes trivial and requires no human effort (see the sketch after this list).
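
As a rough illustration of steps 3–5, a minimal sketch that records every trial and picks the best model automatically, assuming scikit-learn and joblib; the hyperparameter grid, the F1 metric, and the file names are illustrative:

```python
# Minimal sketch: record each trial's hyperparameters and validation metrics,
# then select the best model without any manual bookkeeping.
import json

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# A fixed random_state keeps the train/validation split reproducible.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

trials = []
for C in (0.01, 0.1, 1.0, 10.0):  # the hyperparameter grid under test
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)
    # Evaluate on a held-out subset the model never saw during training.
    score = f1_score(y_val, model.predict(X_val))
    trials.append({"params": {"C": C}, "val_f1": score, "model": model})

# With every trial recorded, "select the best model" is a one-liner.
best = max(trials, key=lambda t: t["val_f1"])
joblib.dump(best["model"], "best_model.joblib")  # the binary model artifact
with open("trials.json", "w") as f:  # the experiment log
    json.dump([{"params": t["params"], "val_f1": t["val_f1"]} for t in trials], f, indent=2)
```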

Runtime Environment Engineering

The main deliverable of the Runtime Environment Engineering stage is a standalone service in which the binary file of the trained ML model, the supporting source code, and other required artifacts are wrapped up into an executable runtime. The service is ready for integration into the system infrastructure (a minimal serving sketch follows the list below).

The following steps are automated:

  1. Runtime Environment building
    Whether it is a custom-built service, a data science platform, a dedicated serving system like TensorFlow Serving, low-level infrastructure like a Kubernetes cluster, a JVM on an embedded system, or multiple heterogeneous production environments coexisting.
  2. Quality Assurance
    Elimination of bugs and errors in data preparation as well as in model design, training, and evaluation.
    Validation of the technical compatibility of the model and its runtime environment.
    Verification of the origins of all input datasets, pre-trained models, and other assets, as they could be subject to regulations or copyrights.
    Automation of these validation operations to ensure their appropriateness and consistency while maintaining the ability to deploy quickly.
  3. Reproducibility and auditability
    Provide the ability to easily rerun the same experiment and get the same results: the model architecture and hyperparameters are kept aligned with the data used for training and testing, the metrics reached, and the full specification of the training environment.
  4. Security
    Machine learning introduces a new range of potential threats where an attacker intentionally provides malicious data designed to cause the model to make a mistake.
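
As a rough illustration of that deliverable, a minimal sketch that wraps a trained model binary into a standalone HTTP service, assuming Flask and joblib; the routes, port, and artifact name are illustrative:

```python
# Minimal sketch: the trained model binary wrapped into an executable runtime
# (a small HTTP service) ready to be integrated into the infrastructure.
import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("best_model.joblib")  # artifact produced at Model Development


@app.route("/health", methods=["GET"])
def health():
    # Liveness endpoint for the orchestrator (e.g. Kubernetes probes).
    return jsonify({"status": "ok"})


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    features = np.asarray(payload["features"], dtype=float).reshape(1, -1)
    prediction = model.predict(features)
    return jsonify({"prediction": prediction.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```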

Deployment and Integration

After successful model development, a data scientist pushes the code, metadata, and documentation to a central repository. This action triggers an integration and deployment workflow.

During the Deployment and Integration stage, the following routines are automated:

  1. Building the model
    Building the model artifacts.
    Pushing the artifacts to a container image repository.
    Running basic checks (smoke tests and sanity checks).
    Generating fairness and explainability reports.
  2. Deploy to a test environment
    Validation of the metrics and computational performance.
  3. Deployment and integration into the production environment
    Deploying the model as a canary.
    Verifying correct behavior and metrics (see the promotion-gate sketch below).
    Rolling out the full production deployment.
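
A minimal sketch of an automated promotion gate that could back the canary step, assuming scikit-learn and joblib; the artifact paths, the recent labelled data slice, and the tolerance value are illustrative:

```python
# Minimal sketch: compare the candidate model against the current production
# model on a recent labelled data slice and promote it only if it is not worse.
import joblib
from sklearn.metrics import f1_score


def promotion_gate(candidate_path, production_path, X_recent, y_recent, tolerance=0.01):
    candidate = joblib.load(candidate_path)
    production = joblib.load(production_path)

    # Smoke check: the candidate must at least produce predictions of the right shape.
    assert candidate.predict(X_recent[:1]).shape == (1,)

    candidate_f1 = f1_score(y_recent, candidate.predict(X_recent))
    production_f1 = f1_score(y_recent, production.predict(X_recent))

    # Promote only if the candidate is not meaningfully worse than production.
    return candidate_f1 >= production_f1 - tolerance
```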

Monitoring and Feedback Loop

Production machine learning models can degrade in quality fast and without warning – often unnoticed until it’s too late and the business impact is already negative. That’s why model monitoring is a crucial step in the ML model life cycle and a critical piece of MLOps.

Monitoring aims to address two major concerns:

  1. Technical
    Is the system alive?
    Are the CPU, RAM, network usage, and disk space as expected?
    Are requests being processed at the expected rate?
  2. Performance
    Is the model still accurate?
    Is it performing as well as it did during the design phase?
    Automatic detection of production data distribution drift (see the drift-check sketch below).
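
A minimal sketch of one common approach to drift detection – a per-feature two-sample Kolmogorov–Smirnov test comparing training data with a window of recent production inputs – assuming SciPy; the significance threshold is illustrative:

```python
# Minimal sketch: flag features whose production distribution has drifted
# away from the distribution seen at training time.
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(train_features: np.ndarray, live_features: np.ndarray,
                 p_threshold: float = 0.01) -> list:
    """Return indices of features whose production distribution drifted."""
    drifted = []
    for i in range(train_features.shape[1]):
        statistic, p_value = ks_2samp(train_features[:, i], live_features[:, i])
        if p_value < p_threshold:  # the two distributions differ significantly
            drifted.append(i)
    return drifted
```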

The Feedback Loop is the information flow from the production environment back to the model training environment for further improvement.

The production feedback is constantly recorded. It is used both to detect degradation of the model’s performance and to augment the training dataset. Once degradation is detected, an update is triggered: the model is retrained on the augmented dataset, or a new model with additional features is developed.
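
A minimal sketch of the degradation trigger that closes the loop, assuming scikit-learn; the expected accuracy, window size, and drop threshold are illustrative:

```python
# Minimal sketch: record production feedback, recompute the live metric over a
# rolling window, and raise a retraining flag when it degrades past a threshold.
from collections import deque

from sklearn.metrics import accuracy_score

EXPECTED_ACCURACY = 0.92  # expectation set during offline model evaluation
MAX_DROP = 0.05
recent_feedback = deque(maxlen=1000)  # rolling window of (prediction, true_label)


def record_feedback(prediction, true_label):
    """Record one labelled production outcome; return True if retraining is due."""
    recent_feedback.append((prediction, true_label))
    if len(recent_feedback) < 100:  # wait for a meaningful sample first
        return False
    predictions, labels = zip(*recent_feedback)
    live_accuracy = accuracy_score(labels, predictions)
    # The recorded feedback also doubles as new training data for the retrain.
    return live_accuracy < EXPECTED_ACCURACY - MAX_DROP
```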