Machine Learning (ML)

Last updated Mar 30, 2023

Machine learning is the use and development of computer systems that can learn and adapt without following explicit instructions, using algorithms and statistical models to analyse and draw inferences from patterns in data.
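As a toy illustration of drawing inferences from patterns in data, the sketch below fits a straight line to a handful of examples by ordinary least squares instead of hard-coding the rule. The function names and the data are made up for this example; it is a minimal sketch, not a recommended implementation.

```python
# Toy illustration: "learn" a linear rule y ~ a*x + b from examples
# rather than writing the rule by hand. Names and data are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1, unknown to fit_line
model = fit_line(xs, ys)
print(model, predict(model, 10.0))
```

The fitted parameters come only from the examples, which is the sense in which the system "learns" here.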

# Resources

# Cheatsheets and notes

# Naive/homemade implementations

# Open datasets (for ML, DL and DS)

See AI/DS and DataEng/Open ML data

# Books

# Courses

# Code

See AI/DS and DataEng/ML Ops
#CODE Benchmarks of ML libraries
#CODE Ludwig - declarative machine learning framework
- https://medium.com/predibase/ludwig-automl-for-text-classification-7c1759f3b150
#CODE Scikit-learn
- http://scikit-learn.org/stable/
- Contrib packages
- #TALK PyData tutorial by Sebastian Raschka
- #CODE Lightning
  - Large-scale linear classification, regression (AI/Supervised Learning/Regression) and ranking (AI/Learning to rank) in Python
- #CODE MAPIE
  - #PAPER MAPIE: an open-source library for distribution-free uncertainty quantification
  - A scikit-learn-compatible module for estimating prediction intervals for single-output regression or multi-class classification settings
- #CODE scikit-learn-intelex
  - Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
  - https://intel.github.io/scikit-learn-intelex/
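To make the MAPIE entry above more concrete, here is a minimal sketch of split conformal prediction, one common distribution-free way to turn point predictions into prediction intervals. It uses only the standard library and does not use MAPIE's API; the model, the data, and all function names are assumptions made for illustration.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank in the sorted scores
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Return (lower, upper) around predict(x_new) from absolute residuals."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q


# Hypothetical pre-fitted model and synthetic held-out calibration data.
def model(x):
    return 2.0 * x + 1.0


rng = random.Random(0)
calib_x = [rng.uniform(0, 10) for _ in range(200)]
calib_y = [model(x) + rng.gauss(0, 1) for x in calib_x]

low, high = split_conformal_interval(model, calib_x, calib_y, x_new=5.0, alpha=0.1)
print(f"Interval at x=5 with alpha=0.1: [{low:.2f}, {high:.2f}]")
```

The interval width depends only on held-out residuals, so no distributional assumption about the noise is needed beyond exchangeability of calibration and test points.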
#CODE mlinsights
- http://www.xavierdupre.fr/app/mlinsights/helpsphinx/notebooks/piecewise_linear_regression.html
#CODE PyCaret
- https://pycaret.org/
- PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows
- PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more
- PyCaret >= 2.2 provides the option to use GPU for select model training and hyperparameter tuning
#CODE Hypertools - Python toolbox for visualizing and manipulating high-dimensional data
- http://hypertools.readthedocs.io/en/latest/
- https://github.com/ContextLab/hypertools-paper-notebooks
- http://blog.kaggle.com/2017/04/10/exploring-the-structure-of-high-dimensional-data-with-hypertools-in-kaggle-kernels/
#CODE PySAL - Python Spatial Analysis Library Meta-Package
- http://pysal.org/pysal/
#CODE MLxtend - Library of extension and helper modules for Python’s data analysis and machine learning libraries
#CODE H2O
- http://www.h2o.ai/
- https://github.com/h2oai/h2o-3
- https://github.com/h2oai/h2o4gpu
- https://github.com/h2oai/h2o-tutorials
- #TALK Getting started with H2O on Python (pydata)
- http://www.jowanza.com/post/156015716294/why-h2o-sparkling-water
- https://github.com/h2oai/h2o-3/tree/master/h2o-py/demos
#CODE Dlib (C++ with python interface)
#CODE Shogun
#CODE Vowpal Wabbit
- ML system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning
- http://hunch.net/~vw/
- https://github.com/JohnLangford/vowpal_wabbit/wiki
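The "hashing" technique mentioned in the Vowpal Wabbit entry refers to feature hashing (the hashing trick): string features are mapped directly to indices of a fixed-size vector, so no vocabulary needs to be stored. The sketch below is a generic, standard-library illustration and not Vowpal Wabbit's implementation; MD5 is used here only as a convenient deterministic hash, and the function name and bucket count are assumptions.

```python
import hashlib


def hash_features(tokens, n_buckets=16):
    """Map a bag of string features to a fixed-length signed count vector."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit integer from the digest
        index = h % n_buckets                  # bucket chosen from the low bits
        sign = 1.0 if (h >> 63) & 1 == 0 else -1.0  # top bit picks the sign
        vec[index] += sign
    return vec


doc = ["price=high", "color=red", "color=red", "brand=acme"]
print(hash_features(doc))
```

Different features can collide in the same bucket; the signed update is a common way to keep such collisions from biasing inner products, at the cost of some noise.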
#CODE DMTK (Microsoft)
- http://www.dmtk.io/
- Light LDA - Scalable, fast, and lightweight system for large-scale topic modeling
#CODE RAPIDS - GPU data science
- https://github.com/rapidsai
- https://rapids.ai/
- #CODE cuML - RAPIDS Machine Learning Library
- #CODE cuspatial - CUDA-accelerated GIS and spatiotemporal algorithms
- #CODE cuSignal - RAPIDS Signal Processing Library
- #CODE cuGraph - RAPIDS Graph Analytics Library
- #CODE cuDF - GPU DataFrame Library
- #CODE CuPy - NumPy-like API accelerated with CUDA
#CODE ArrayFire - High performance library for parallel computing with an easy-to-use API
- It enables users to write scientific computing code that is portable across CUDA, OpenCL and CPU devices. This project provides Python bindings for the ArrayFire library.
- https://arrayfire.com/
#CODE ThunderSVM - A Fast SVM Library on GPUs and CPUs
#CODE PyGAM - Generalized Additive Models in Python
- https://pygam.readthedocs.io
#CODE SurPRISE - A Python scikit for building and analyzing recommender systems
#CODE Facets
- Visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive. The visualizations are implemented as Polymer web components, backed by TypeScript code, and can be easily embedded into Jupyter notebooks or webpages
- https://pair-code.github.io/facets/
#CODE PyCM
- PyCM is a multi-class confusion matrix library written in Python. It accepts either input data vectors or a confusion matrix directly, and serves as a post-classification model evaluation tool that reports most class-level and overall statistics
- http://www.pycm.ir/
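As a rough illustration of what a multi-class confusion matrix and its class-level and overall statistics look like, here is a standard-library sketch. It does not use PyCM's API; the function names and the example labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[actual_label][predicted_label] is a count."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    return labels, matrix


def class_stats(labels, matrix, label):
    """One-vs-rest statistics for a single class."""
    tp = matrix[label][label]
    fn = sum(matrix[label][p] for p in labels if p != label)
    fp = sum(matrix[a][label] for a in labels if a != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "precision": precision, "recall": recall}


def overall_accuracy(labels, matrix):
    """Fraction of samples on the diagonal of the matrix."""
    total = sum(matrix[a][p] for a in labels for p in labels)
    correct = sum(matrix[l][l] for l in labels)
    return correct / total if total else 0.0


actual = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]

labels, matrix = confusion_matrix(actual, predicted)
print(matrix)
print(class_stats(labels, matrix, "dog"))
print(overall_accuracy(labels, matrix))
```

Rows are actual classes and columns are predicted classes in this sketch; libraries differ on that convention, so it is worth checking before comparing outputs.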
#CODE Pycircular - Python module for circular data analysis
- https://towardsdatascience.com/introducing-pycircular-a-python-library-for-circular-data-analysis-bfd696a6a42b

# References

# Subtopics

CarlosGG's Knowledge Garden 🪴

# Feature selection

# Feature learning

# Anomaly and Outlier Detection

# Time Series analysis and forecasting

# AutoML

# Deep Learning

# Reinforcement learning

# Unsupervised learning

# Supervised learning

# Weakly-supervised learning

# One, few-shot learning

# Self-supervised learning

# Learning to rank and ordinal regression

# Multi task learning

# Generative modelling

# Explainable AI

# Federated learning

# Quantum ML
