Machine Learning Operations (MLOps)

Last updated Nov 14, 2024

A set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. The word is a compound of “machine learning” and DevOps, the continuous development practice from the software field. See LLM Ops

# Resources

# Courses

#COURSE MLOps Coding Course
- https://github.com/MLOps-Courses/mlops-coding-course
- https://github.com/fmind/mlops-python-package
#COURSE Machine Learning Systems Design (CS 329S, Stanford)
#COURSE Deploying Machine Learning Models in Production (Coursera, DeepLearning.AI)
#COURSE Effective MLOps - Model development (Weights & Biases)
#COURSE CI/CD for Machine Learning (GitOps) (Weights & Biases)
#COURSE MLOps Course
- https://madewithml.com/#mlops

# Code

#CODE MUSE
- Open-source Stable Diffusion production server that shows how to deploy diffusion models in a real production environment, with load balancing, GPU inference, performance testing, microservice orchestration and more, all handled with the Lightning Apps framework

# Experiment tracking

See AI/Supervised Learning/Model selection and tuning
https://neptune.ai/blog/best-ml-experiment-tracking-tools
#CODE Weights & Biases - A tool for visualizing and tracking your machine learning experiments
- https://docs.wandb.com/
- Tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
#CODE Aim - The open-source tool for ML experiment comparison
- https://aimstack.io/
#CODE ClearML
- https://clear.ml/
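
What the trackers above do (log hyperparameters and metrics per run, then compare runs) can be sketched with the standard library alone. This is a minimal illustration, not the API of Weights & Biases, Aim or ClearML: `RunLogger`, `best_run` and the made-up "loss" values are hypothetical names and data introduced only for this example.

```python
import json
import tempfile
from pathlib import Path


class RunLogger:
    """Hypothetical helper: stores one run's config and metrics as JSON."""

    def __init__(self, root: Path, name: str, config: dict):
        self.path = Path(root) / f"{name}.json"
        self.record = {"name": name, "config": config, "metrics": []}

    def log(self, step: int, **metrics: float) -> None:
        # Append one row of metrics for the given training step.
        self.record["metrics"].append({"step": step, **metrics})

    def finish(self) -> None:
        # Persist the whole run so it can be compared later.
        self.path.write_text(json.dumps(self.record, indent=2))


def best_run(root: Path, metric: str) -> dict:
    """Return the saved run whose last logged value of `metric` is lowest."""
    runs = [json.loads(p.read_text()) for p in sorted(Path(root).glob("*.json"))]
    return min(runs, key=lambda r: r["metrics"][-1][metric])


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for lr in (0.1, 0.01):
            run = RunLogger(root, f"lr_{lr}", {"lr": lr})
            for step in range(3):
                # Made-up "loss" so the example runs without a real model.
                run.log(step, loss=lr * (3 - step))
            run.finish()
        return best_run(root, "loss")


if __name__ == "__main__":
    print(demo()["name"])
```

Real trackers add a server, a UI and artifact storage on top of this idea; the sketch only shows the record-then-compare loop.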

# Visualization and UI

#CODE kedro-viz - Visualise your Kedro data and machine-learning pipelines and track your experiments
#CODE Gradio - Create UIs for your machine learning model in Python in 3 minutes
- https://gradio.app/

# Workflow managers

#CODE Kedro - A Python framework for creating reproducible, maintainable and modular data science code
- https://kedro.readthedocs.io/
#CODE MLRun - The Open-Source MLOps Orchestration Framework
- https://docs.mlrun.org/en/stable/
#CODE Metaflow - Human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects
- Originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning
- https://metaflow.org/
- #CODE metaflow-ui
  - https://netflixtechblog.com/open-sourcing-a-monitoring-gui-for-metaflow-75ff465f0d60
#CODE Flyte - Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale
- It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source
- https://flyte.org/
#CODE MLflow
- An open source platform for the machine learning lifecycle
#CODE Airflow - Apache Airflow is a platform to programmatically author, schedule, and monitor workflows
- http://nerds.airbnb.com/airflow/
- https://medium.com/datasd/why-data-automation-matters-4391d59e1952
#CODE Luigi (Spotify)
- https://luigi.readthedocs.io/en/latest/
#CODE Kubeflow - Machine Learning Toolkit for Kubernetes
- Cloud-native platform for machine learning operations - pipelines, training and deployment
- https://www.kubeflow.org/
#CODE Azkaban
#CODE PredictionIO (Apache)
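
Most of the workflow managers above share one core idea: tasks are declared in code together with their dependencies, and the manager runs them in dependency order. A minimal sketch of that idea, assuming Python 3.9+ for the standard-library `graphlib` module. `run_pipeline` and the toy `load` / `train` / `evaluate` tasks are hypothetical and do not reflect the API of Airflow, Luigi, Kedro or any other tool listed here.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after its upstream tasks; return (order, results)."""
    order = []
    results = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
        order.append(name)
    return order, results


# Toy tasks: each receives a dict with the outputs of its upstream tasks.
tasks = {
    "load": lambda up: [1.0, 2.0, 3.0, 4.0],
    "train": lambda up: sum(up["load"]) / len(up["load"]),  # "model" = mean
    "evaluate": lambda up: abs(up["train"] - 2.5),
}

# Maps each task to the tasks it depends on.
dependencies = {
    "load": (),
    "train": ("load",),
    "evaluate": ("train",),
}

order, results = run_pipeline(tasks, dependencies)
```

Scheduling, retries, distributed execution and monitoring are what the real tools add on top of this dependency graph.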

# ML platforms

See AI/DS and DataEng/Cloud platforms

CarlosGG's Knowledge Garden 🪴
