MLOps uses automation and high-level orchestration to eliminate the manual toil and mistakes involved in working with multiple versions of data preprocessing and feature engineering pipelines, datasets, the corresponding train/validation/inference source code, and models.

Why would you want to do MLOps?

If your team works on multiple DS/AI directions or approaches, it will face a chaos of versions of data preprocessing code, datasets, features, and models in no time. You may also be surprised to discover that Data Scientists are not as strong at infrastructure work as you would like, for instance when it comes to creating Kubernetes deployments for training and operationalizing ML models.

MLOps is a long-term investment, too.
The returns include the following:

  • Fast and reliable iterations of experiments.
    No need to rewrite Python/Java/R code on every iteration.
    Set up training and validation workflows once, automate them with MLOps, and only tune hyperparameters from then on.
  • Flexible scaling of computational resources.
    There are times when computational resources are in high demand.
    But your team doesn’t need them running overnight or over the weekend, for example.
    And there is always somebody who forgets to shut down that expensive GPU instance, unless the shutdown is automated.
    Do you like paying bills for something that was never meant to be used? Hopefully not.
  • Consistent tracking of experiment results.
    Hyperparameters are aligned with validation results and logged automatically.
    People always forget to record them otherwise, as you probably know.
  • Seamless integration and deployment.
    Your service requires iterative updates.
    Believe us, you don’t want to do those updates manually.
  • Risk mitigation.
    Compare new ML models to the existing ones, detect and avoid input drift.
    Eliminate human mistakes at all stages of the ML roadmap.
  • Automatic monitoring and feedback loop.
    The live model scoring environment and the model training environment are distinct. 
    As a result, test environment scores are likely to be different from those in reality.
    To mitigate this risk, the model is first exposed to the actual environment as a canary deployment. Observing and recording real-world performance metrics and predictions makes it possible to detect corner cases, iterate, and develop a better model, all without human intervention.
[Figure: The MLOps lifecycle. Stages: Data Warehouse, Model Development, Runtime Environment Engineering, Deployment and Integration, Monitoring and Feedback. Deliverables per stage: data groomed for analytics and AI; the best ML model trained and selected; the model ready for production; the ML service integrated and deployed to production; tracking of service performance and environment feedback. Monitoring feeds predictions, features, and metrics back for model upgrades.]

Data Warehouse

The data warehouse is persistent, secure, enterprise-grade storage for the groomed data that has been extracted and transformed at the ETL stage. The data is ready for analytics and AI tasks.

Model Development

The outcomes of the Model Development stage are:

  • the best-performing trained model stored as a binary file
  • supporting source code pushed to the repository
  • recorded results of data exploration and model evaluation metrics

The following ML routines are automated:

  1. Data Exploration
    At this stage, the following questions are answered: What does your data look like? Does it allow you to solve your problem effectively? What gaps does it have?
  2. Feature Engineering
    Record the current list of features and how they were computed to replay or analyze them later.
  3. Recording Architectures, Hyperparameters, and Performance Metrics
    During the ML trials stage, engineers test multiple architectures, hyperparameters, and metrics. It is best to keep track of all outcomes automatically, since people tend to overlook things. Having a comprehensive log of trials simplifies the choice of the best-performing model.
  4. Model Evaluation
    Before facing the destination environment, the model performance is evaluated on a separate data subset that was not used in training. Usually, this subset is the best possible approximation of the real-world inputs. Obtained metrics are a significant indicator of the model’s generalization capabilities and set expectations for the model behavior in the desired environment.
  5. Selection of the best model
    Once all experiments’ performance metrics and predictions have been recorded, selecting the best model becomes trivial and requires no manual effort. A minimal sketch of this kind of automation follows this list.
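
As a rough illustration of points 3 and 5, here is a minimal sketch of automated trial logging and model selection. The file name, metric name, and helper functions are assumptions made for the example; in practice a dedicated experiment tracker (MLflow, Weights & Biases, etc.) plays this role.

```python
import json
from pathlib import Path

TRIALS_LOG = Path("trials.jsonl")  # hypothetical location of the experiment log


def log_trial(architecture: str, hyperparameters: dict, metrics: dict) -> None:
    """Append one trial record so no experiment result is lost or overwritten."""
    record = {
        "architecture": architecture,
        "hyperparameters": hyperparameters,
        "metrics": metrics,
    }
    with TRIALS_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")


def select_best_trial(metric: str = "val_f1") -> dict:
    """Pick the trial with the highest validation metric, with no manual comparison."""
    trials = [json.loads(line) for line in TRIALS_LOG.read_text().splitlines()]
    return max(trials, key=lambda t: t["metrics"][metric])


# Example: log two trials and let the pipeline pick the winner automatically.
log_trial("gradient_boosting", {"n_estimators": 300, "max_depth": 6}, {"val_f1": 0.87})
log_trial("logistic_regression", {"C": 1.0}, {"val_f1": 0.81})
best = select_best_trial()
print(best["architecture"], best["hyperparameters"])
```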

Preparing for Production

The main deliverable of the Preparing for Production step is a standalone service in which the binary file of the trained ML model, the supporting source code, and other required artifacts are wrapped up. The service does not face the production environment yet, but it is ready to.
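
For illustration only, a minimal sketch of such a standalone scoring service. The model path, payload shape, and the choice of Flask and joblib are assumptions for the example; the actual runtime (TensorFlow Serving, a Kubernetes deployment, an embedded JVM, etc.) is chosen in the next step.

```python
# Minimal standalone scoring service: a sketch, not a production design.
# Assumes a scikit-learn-style model serialized with joblib at model/model.joblib.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model/model.joblib")  # the trained model binary from Model Development


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
    prediction = model.predict(payload["features"]).tolist()
    return jsonify({"prediction": prediction})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```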

The following steps are automated:

  1. Runtime Environment establishment
    This could be a custom-built service, a data science platform, a dedicated serving system such as TensorFlow Serving, low-level infrastructure such as a Kubernetes cluster, a JVM on an embedded system, or several heterogeneous production environments coexisting.
  2. Quality Assurance
    Elimination of bugs and errors in data preparation as well as in model design, training, and evaluation. Validation of the technical compatibility of the model with its runtime environment. Verification of the origins of all input datasets, pre-trained models, and other assets, as they may be subject to regulations or copyrights. Automation of the validation operations to ensure their appropriateness and consistency while maintaining the ability to deploy quickly.
  3. Reproducibility and auditability
    Provide the ability to easily rerun the same experiment and get the same results. Record the model architecture and hyperparameters together with the data used for training and testing, the metrics reached, and the full specification of the training environment (see the sketch after this list).
  4. Security
    Machine learning introduces a new range of potential threats where an attacker intentionally provides malicious data designed to cause the model to make a mistake.
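
A sketch of the kind of run manifest that supports reproducibility and auditability. The field names, file layout, and hashing scheme are assumptions for the example, not a prescribed format.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def file_sha256(path: str) -> str:
    """Fingerprint a dataset file so the exact training data can be identified later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_run(architecture: str, hyperparameters: dict, metrics: dict,
               train_data: str, test_data: str,
               out: str = "run_manifest.json") -> None:
    """Write one manifest that ties model, data, metrics, and environment together."""
    manifest = {
        "architecture": architecture,
        "hyperparameters": hyperparameters,
        "metrics": metrics,
        "data": {
            "train_sha256": file_sha256(train_data),
            "test_sha256": file_sha256(test_data),
        },
        "environment": {"python": sys.version, "platform": platform.platform()},
    }
    Path(out).write_text(json.dumps(manifest, indent=2))
```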

Deployment and Integration

After successful model development, a data scientist pushes the code, metadata, and documentation to a central repository. This action triggers an integration and deployment workflow.

During the Deploying to Production stage, the following routines are automated:

  1. Build the model
    Build the model artifacts
    Send the artifacts to long-term storage
    Run basic checks (smoke tests and/or sanity checks)
    Generate fairness and explainability reports
  2. Deploy to a test environment
    Run tests to validate ML performance, computational performance
  3. Deploy to the production environment
    Deploy the model as a canary (a gating sketch follows this list)
    Verify correct behavior and metrics
    Fully deploy the model
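
A sketch of the canary gate from item 3: compare the canary's live metrics against the current production model before promoting it. The metric names, thresholds, and promote/rollback actions are illustrative assumptions.

```python
def canary_is_healthy(canary_metrics: dict, baseline_metrics: dict,
                      max_accuracy_drop: float = 0.02,
                      max_latency_ratio: float = 1.2) -> bool:
    """Gate the full rollout: the canary must not be noticeably worse than the baseline."""
    accuracy_ok = (canary_metrics["accuracy"]
                   >= baseline_metrics["accuracy"] - max_accuracy_drop)
    latency_ok = (canary_metrics["p95_latency_ms"]
                  <= baseline_metrics["p95_latency_ms"] * max_latency_ratio)
    return accuracy_ok and latency_ok


# Example decision taken automatically by the deployment workflow.
baseline = {"accuracy": 0.91, "p95_latency_ms": 40}
canary = {"accuracy": 0.92, "p95_latency_ms": 44}
if canary_is_healthy(canary, baseline):
    print("promote canary to full deployment")
else:
    print("roll back canary")
```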

Monitoring and Feedback Loop

Production machine learning models can degrade in quality quickly and without warning, often unnoticed until it is too late and the business impact is already negative. That’s why model monitoring is a crucial step in the ML model life cycle and a critical piece of MLOps.

Monitoring aims to address two major concerns:

  1. Technical
    Is the system alive?
    Are the CPU, RAM, network usage, and disk space as expected?
    Are requests being processed at the expected rate?
  2. Performance
    Is the model still accurate?
    Is it performing as well as it did during the design phase? (A drift-check sketch follows this list.)
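
One common performance check is input drift detection, for example via the Population Stability Index (PSI) between a feature's training distribution and its live distribution. The sketch below uses NumPy; the alert threshold of roughly 0.2 is a widely used rule of thumb, stated here as an assumption rather than a fixed standard.

```python
import numpy as np


def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference (training) sample and a live sample of one feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid log(0) and division by zero for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))


# Synthetic example: the live feature has shifted relative to training data.
training_feature = np.random.normal(0.0, 1.0, 10_000)
live_feature = np.random.normal(0.4, 1.0, 10_000)
if population_stability_index(training_feature, live_feature) > 0.2:
    print("input drift detected: investigate or trigger retraining")
```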

The Feedback Loop is the information flow from the production environment back to the model training environment for further improvement.

Production feedback is recorded continuously. It is used to detect degradation of the model’s performance and to augment the training dataset. Once degradation is detected, an update is triggered: the model is retrained on the augmented dataset, or a new model with additional features is developed.
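
A sketch of the automated trigger in this feedback loop, assuming hypothetical baseline and tolerance values and leaving the collection of production labels out of scope.

```python
def feedback_loop_step(live_accuracy: float,
                       baseline_accuracy: float,
                       tolerance: float = 0.03) -> str:
    """Decide automatically whether the recorded production feedback warrants a model update."""
    if live_accuracy < baseline_accuracy - tolerance:
        # Degradation detected: retrain on the training set augmented with recorded
        # production inputs and their ground-truth labels (collection not shown here).
        return "trigger retraining on augmented dataset"
    return "keep current model"


print(feedback_loop_step(live_accuracy=0.84, baseline_accuracy=0.91))
```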