Machine Learning Operations (MLOps)
A set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of “machine learning” and DevOps, the continuous delivery practice from the software field.
# Resources
- https://en.wikipedia.org/wiki/MLOps
- https://github.com/GokuMohandas/MadeWithML
- https://github.com/visenger/awesome-mlops
- https://github.com/EthicalML/awesome-production-machine-learning
- More Data, More Problems: Using DVC to handle data versioning for a computer vision problem
- What is MLOps and how to get started? | MLOps series
# Courses
- #COURSE Machine Learning Systems Design (CS 329S, Stanford)
- #COURSE Deploying Machine Learning Models in Production (Coursera, DeepLearning.AI)
- #COURSE Effective MLOps - Model development (Weights & Biases)
- #COURSE CI/CD for Machine Learning (GitOps) (Weights & Biases)
- #COURSE MLOps Course
# Code
- #CODE MUSE - Open-source Stable Diffusion production server that shows how to deploy diffusion models in a real production environment, with load balancing, GPU inference, performance testing, microservice orchestration and more, all handled by the Lightning Apps framework
# Experiment tracking
- See AI/Supervised Learning/Model selection and tuning
- https://neptune.ai/blog/best-ml-experiment-tracking-tools
- #CODE Weights & Biases - A tool for visualizing and tracking your machine learning experiments
- https://docs.wandb.com/
- Tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
- #CODE Aim - The open-source tool for ML experiment comparison
- #CODE ClearML
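
What the tools above do can be pictured with a minimal sketch: record each run's hyperparameters and metrics, then compare runs afterwards. This uses only the Python standard library and is not the API of Weights & Biases, Aim or ClearML; `RunLogger`, `best_run` and `demo` are hypothetical names invented for illustration.

```python
import json
import tempfile
from pathlib import Path


class RunLogger:
    """Hypothetical minimal tracker: stores one JSON file per run."""

    def __init__(self, root, name, params):
        self.path = Path(root) / f"{name}.json"
        self.record = {"name": name, "params": dict(params), "metrics": {}}

    def log(self, **metrics):
        # Append each value to that metric's history.
        for key, value in metrics.items():
            self.record["metrics"].setdefault(key, []).append(value)

    def finish(self):
        # Persist the run so it can be compared with others later.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.record, indent=2))


def best_run(root, metric, minimize=True):
    """Return the stored run whose final value of `metric` is best."""
    runs = [json.loads(p.read_text()) for p in sorted(Path(root).glob("*.json"))]
    runs = [r for r in runs if metric in r["metrics"]]
    pick = min if minimize else max
    return pick(runs, key=lambda r: r["metrics"][metric][-1])


def demo():
    # Two made-up runs with different learning rates and loss curves.
    root = tempfile.mkdtemp()
    for lr, losses in [(0.1, [0.9, 0.5]), (0.01, [0.9, 0.7])]:
        run = RunLogger(root, f"lr_{lr}", {"lr": lr})
        for loss in losses:
            run.log(loss=loss)
        run.finish()
    return best_run(root, "loss")
```

Calling `demo()` returns the record of the run with the lowest final loss. Hosted trackers add what this sketch leaves out, such as dashboards, artifact storage and sharing.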
# Visualization and UI
- #CODE kedro-viz - Visualise your Kedro data and machine-learning pipelines and track your experiments
- #CODE Gradio - Create UIs for your machine learning model in Python in 3 minutes
# Workflow managers
- #CODE Kedro - A Python framework for creating reproducible, maintainable and modular data science code
- #CODE MLRun - The Open-Source MLOps Orchestration Framework
- #CODE Metaflow - Human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects
- Originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning
- https://metaflow.org/
- #CODE metaflow-ui
- #CODE Flyte - Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale
- Battle-tested at Lyft, Spotify, Freenome, and others; fully open source
- https://flyte.org/
- #CODE MLflow
- #CODE Airflow - Apache Airflow is a platform to programmatically author, schedule, and monitor workflows
- #CODE Luigi (Spotify)
- #CODE Kubeflow - Cloud-native platform for machine learning operations: pipelines, training and deployment
- #CODE Azkaban
- #CODE PredictionIO (Apache)
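
The idea shared by the workflow managers above is to declare tasks and their dependencies as a directed acyclic graph, then run them in a valid order. A minimal sketch follows, assuming Python 3.9+ for the standard-library `graphlib`. It is not the API of any listed tool, and `run_pipeline` and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run each task after its dependencies, passing their results as inputs.

    tasks: maps task name -> callable
    deps:  maps task name -> list of upstream task names
    """
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = tasks[name](*inputs)
    return order, results


# Hypothetical three-step pipeline: load -> clean -> train.
tasks = {
    "load": lambda: [3, 1, 2],
    "clean": lambda rows: sorted(rows),
    "train": lambda rows: sum(rows) / len(rows),  # stand-in for model fitting
}
deps = {"clean": ["load"], "train": ["clean"]}

order, results = run_pipeline(tasks, deps)
```

Real workflow managers build scheduling, retries, caching, distributed execution and monitoring on top of this ordering step.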
# ML platforms
- Azure (Microsoft)
- Google Cloud Platform
- Vertex AI - Google Cloud’s unified ML platform
- Pick your AI/ML Path on Google Cloud
- https://codelabs.developers.google.com/
- https://cloud.google.com/products/ai/
- https://medium.com/google-cloud/jupyter-tensorflow-nvidia-gpu-docker-google-compute-engine-4a146f085f17
- Cloud AI building blocks
- Cloud ML Engine
- AI Hub
- Cloud AutoML
- Amazon Web Services (AWS)
- Watson (IBM)
- Dataiku DSS
- Domino Data Lab
- RapidMiner
- KNIME