Machine Learning (ML)
The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data
# Resources
- https://github.com/josephmisiti/awesome-machine-learning
- The Illustrated Machine Learning website
- Rules of ML (Google)
- Jason’s Machine Learning 101 (Google)
- Machine Learning Glossary (Google)
- ML Resources (MIT student)
- Machine Learning & Deep Learning Tutorials
- A visual introduction to machine learning
- ML Algorithms: Strengths and Weaknesses
- A friendly introduction to linear algebra for ML (ML Tech Talks)
- Best practices for ML engineering (Google)
- Recommendation System Algorithms
- Training Machine Learning Models More Efficiently with Dataset Distillation
- Codelabs - AI & ML
# Cheatsheets and notes
- https://github.com/afshinea/stanford-cs-229-machine-learning/blob/master/en/super-cheatsheet-machine-learning.pdf
- https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
- ML-AI guide
# Naive/homemade implementations
- https://github.com/trekhleb/homemade-machine-learning
- https://github.com/anhquan0412/basic_model_scratch
- https://github.com/rushter/MLAlgorithms
- https://github.com/ahmedbesbes/Neural-Network-from-scratch
- https://github.com/eriklindernoren/ML-From-Scratch
# Open datasets (for ML, DL and DS)
See AI/DS and DataEng/Open ML data
# Books
- #BOOK An Introduction to Statistical Learning (James 2013, SPRINGER)
- #BOOK The elements of statistical learning (Hastie 2015, SPRINGER)
- #BOOK Recommender Systems - The Textbook (Aggarwal 2016, SPRINGER)
- #BOOK Mathematics for ML (Deisenroth 2020, CAMBRIDGE)
- #BOOK Introduction to Machine Learning with Python - A Guide for Data Scientists (Müller 2016, O’REILLY) - https://github.com/amueller/introduction_to_ml_with_python
- #BOOK Machine Learning for Dummies (Hurwitz 2018, WILEY-IBM)
- #BOOK Python Machine Learning (Raschka 2019, PACKT)
- #BOOK Mastering Machine Learning with scikit-learn (Hackeling 2014, PACKT)
- #BOOK Designing Machine Learning Systems with Python (Julian 2016, PACKT)
- #BOOK Evaluating Machine Learning Models (Zheng 2015, O’REILLY)
- #BOOK Introduction to Machine Learning Interviews Book
# Courses
- #COURSE Machine Learning (CS229, Stanford)
- #COURSE Machine Learning (Coursera-Stanford)
- #COURSE Machine Learning Crash Course with TensorFlow APIs (Google)
- #COURSE Data Mining and Machine Learning (STAT 365/665, Yale)
- #COURSE Applied machine learning (U Columbia)
- #COURSE L’apprentissage face à la malédiction de la grande dimension (Learning in the face of the curse of dimensionality; Collège de France)
- #COURSE The Machine Learning Summer School, MLSS Tübingen 2020 (virtual)
# Code
- See AI/DS and DataEng/ML Ops and AI/DS and DataEng/Cloud platforms
- #CODE Benchmarks of ML libraries
- #CODE Ludwig - declarative machine learning framework
- #CODE Scikit-learn
- http://scikit-learn.org/stable/
- Contrib packages
- #TALK PyData tutorial by Sebastian Raschka
- #CODE Lightning
- Large-scale linear classification, regression (AI/Supervised Learning/Regression) and ranking (AI/Learning to rank) in Python
- #CODE MAPIE
- #PAPER MAPIE: an open-source library for distribution-free uncertainty quantification
- A scikit-learn-compatible module for estimating prediction intervals for single-output regression or multi-class classification settings
- #CODE scikit-learn-intelex
- Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
- https://intel.github.io/scikit-learn-intelex/
- #CODE mlinsights
- #CODE PyCaret
- https://pycaret.org/
- PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows
- PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more
- PyCaret >= 2.2 provides the option to use GPU for select model training and hyperparameter tuning
- #CODE Hypertools - Python toolbox for visualizing and manipulating high-dimensional data
- #CODE PySAL - Python Spatial Analysis Library Meta-Package
- #CODE MLxtend - Library of extension and helper modules for Python’s data analysis and machine learning libraries
- #CODE H2O
- #CODE Dlib (C++ with python interface)
- #CODE Shogun
- #CODE Vowpal Wabbit
- ML system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning
- http://hunch.net/~vw/
- https://github.com/JohnLangford/vowpal_wabbit/wiki
- #CODE DMTK (Microsoft)
- #CODE RAPIDS (https://github.com/rapidsai, https://rapids.ai/) - GPU data science
- #CODE ArrayFire - High performance library for parallel computing with an easy-to-use API
- It enables users to write scientific computing code that is portable across CUDA, OpenCL and CPU devices. This project provides Python bindings for the ArrayFire library.
- https://arrayfire.com/
- #CODE ThunderSVM - A Fast SVM Library on GPUs and CPUs
- #CODE PyGAM - Generalized Additive Models in Python
- #CODE SurPRISE - A Python scikit for building and analyzing recommender systems
- #CODE Facets
- Visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive. The visualizations are implemented as Polymer web components, backed by TypeScript code, and can be easily embedded into Jupyter notebooks or webpages
- https://pair-code.github.io/facets/
- #CODE PyCM
- PyCM is a multi-class confusion matrix library written in Python. It accepts either input data vectors or a confusion matrix directly, and serves as a post-classification model evaluation tool that reports most per-class and overall statistics
- http://www.pycm.ir/
- #CODE Pycircular - Python module for circular data analysis
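
As a rough illustration of what a confusion-matrix tool such as PyCM (listed above) computes, here is a minimal standard-library sketch. It does not use PyCM's API; the function names `confusion_matrix`, `per_class_stats` and `overall_accuracy`, and the toy labels, are hypothetical and chosen only for this example.

```python
def confusion_matrix(actual, predicted):
    """Return (labels, matrix); matrix[i][j] counts items whose actual
    label is labels[i] and whose predicted label is labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return labels, matrix


def per_class_stats(labels, matrix):
    """Compute one-vs-rest precision and recall for every class."""
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                   # rest of the row
        fp = sum(row[i] for row in matrix) - tp    # rest of the column
        stats[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return stats


def overall_accuracy(matrix):
    """Fraction of items on the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy data, invented for this example.
actual = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]

labels, matrix = confusion_matrix(actual, predicted)
stats = per_class_stats(labels, matrix)
accuracy = overall_accuracy(matrix)

print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
print(stats)
print(accuracy)
```

Rows are actual classes and columns are predicted classes here; some libraries use the opposite orientation, so check the convention before comparing numbers.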
# References
- See AI/AI-ML-DL for scientific discovery
- #PAPER Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence (Raschka 2020)
- #PAPER How to avoid machine learning pitfalls: a guide for academic researchers (Lones 2021)
- #PAPER Pen and Paper Exercises in Machine Learning (Gutmann 2022)
- #PAPER How to avoid machine learning pitfalls: a guide for academic researchers (Lones 2023)
- #PAPER Questionable practices in machine learning (2024)
- The paper categorizes the 43 questionable research practices (QRPs) in machine learning into several broad areas, such as:
- Data Handling: Issues like cherry-picking data, inappropriate data splits, and using test data in training.
- Model Evaluation: Inadequate baselines, selective reporting of results, and misuse of metrics.
- Experimental Design: Running multiple experiments and only reporting successful ones.
- Reproducibility: Lack of transparency in code and data sharing.
- Publication Practices: Hype-driven narratives and insufficient detail in methods sections.
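
Two of the data-handling issues listed for the questionable-practices paper above, inappropriate data splits and test data leaking into training, can be illustrated with a small standard-library sketch. It is not taken from the paper: the helper names (`train_test_split`, `fit_scaler`, `apply_scaler`) and the toy data are hypothetical. The point is only that preprocessing statistics are estimated on the training split and then reused unchanged on the test split.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return disjoint (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Estimate mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero
    return mean, std


def apply_scaler(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Toy one-dimensional feature, invented for this example.
values = [float(v) for v in range(20)]

# Split first, so that nothing computed later can see the test rows.
train, test = train_test_split(values)

# Fit on the training split only; the test split reuses those statistics.
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)

print(len(train), len(test))
print(round(statistics.fmean(train_scaled), 6))
```

Calling `fit_scaler(values)` on the full dataset before splitting would be the leaky variant, because the test rows would then influence the statistics used during training.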
# Subtopics
# Feature selection
See AI/Supervised Learning/Feature selection
# Feature learning
# Anomaly and Outlier Detection
See AI/Anomaly and Outlier Detection
# Time Series analysis and forecasting
See AI/Time Series analysis and AI/Forecasting
# AutoML
See AI/AutoML
# Deep Learning
# Reinforcement learning
# Unsupervised learning
See AI/Unsupervised learning/Unsupervised learning
# Supervised learning
See AI/Supervised Learning/Supervised learning
# Weakly-supervised learning
See AI/Weakly-supervised learning. It includes these topics: AI/Semi-supervised learning, AI/Active learning and AI/Transfer learning
# One-shot and few-shot learning
# Self-supervised learning
See AI/Self-supervised learning
# Learning to rank and ordinal regression
# Multi-task learning
# Generative modelling
# Explainable AI
See AI/XAI
# Federated learning
# Quantum ML
See AI/QML