Dimensionality reduction and low-rank modeling

Last updated Sep 7, 2022

# Resources and references

The Beginner’s Guide to Dimensionality Reduction
Distances, Neighborhoods, or Dimensions? Projection Literacy for the Analysis of Multivariate Data
Matrix Factorization: A Simple Tutorial and Implementation in Python
Sklearn - Decomposing signals in components (matrix factorization problems)
Projection techniques transform high-dimensional data into a lower-dimensional space while preserving its main structure. Often, the data is projected into two dimensions and visualized as a scatter plot to help analyze and understand it.
There are two categories: linear and non-linear projection techniques.

Linear projection techniques map the data dimensions into a lower-dimensional space through a linear transformation. Proximity between projected points indicates similarity: the more similar two data points are, the closer they lie to each other, and vice versa. Because a single linear map is applied to the whole dataset, these techniques preserve global structure, which is why they are also known as global techniques.
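
As a minimal sketch of a linear (global) projection, the snippet below implements PCA with NumPy's SVD. The function name `pca_project` and the toy data are illustrative assumptions, not taken from the references above. One projection matrix `W` is applied to every point, so the same linear map governs the whole dataset.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components (a linear map)."""
    X_centered = X - X.mean(axis=0)              # PCA assumes zero-mean features
    # The right singular vectors of the centered data are the principal axes
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:n_components].T                      # (n_features, n_components) projection matrix
    explained_var = S[:n_components] ** 2 / (X.shape[0] - 1)
    return X_centered @ W, W, explained_var

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points in 5-D that mostly vary along 2 latent directions
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

Y, W, var = pca_project(X, n_components=2)
print(Y.shape)  # (200, 2)
```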

https://en.wikipedia.org/wiki/Dynamic_mode_decomposition
Dynamic mode decomposition (DMD) is a linear dimensionality reduction technique for high-dimensional time series that originated in fluid dynamics. DMD combines the best of two worlds: PCA and the Fourier transform. Mathematically, it is related to a fundamental operator in dynamical systems theory known as the Koopman operator.
A case against PCA for time-series analysis
- Recent studies have shown that DMD behaves like a source separation algorithm (e.g. ICA), although the DMD framework can be more flexible
- For a similar computational cost, it also provides a far more interpretable model than PCA
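
A minimal sketch of the "exact DMD" variant, assuming the snapshots are stored as columns of a matrix sampled at a fixed time step. The SVD truncation is the PCA-like part, and the eigenvalues of the reduced operator give the oscillation frequencies (the Fourier-like part). The helper `dmd` and the two-mode complex toy signal are hypothetical illustrations, not taken from the linked references.

```python
import numpy as np

def dmd(X, rank):
    """Exact DMD of a snapshot matrix X (n_features x n_snapshots)."""
    X1, X2 = X[:, :-1], X[:, 1:]                       # consecutive snapshots: X2 ~ A @ X1
    U, S, Vh = np.linalg.svd(X1, full_matrices=False)
    U, S, V = U[:, :rank], S[:rank], Vh[:rank].conj().T  # truncate to `rank` (PCA-like step)
    A_tilde = U.conj().T @ X2 @ V / S                  # A projected onto the leading subspace
    eigvals, W = np.linalg.eig(A_tilde)                # eigenvalues encode frequency and growth
    modes = X2 @ V / S @ W                             # exact DMD modes (spatial structures)
    return eigvals, modes

# Hypothetical toy signal: two spatial patterns oscillating at 2.3 and 2.8 rad per time unit
x = np.linspace(-10, 10, 128)
t = np.linspace(0, 4 * np.pi, 200)
dt = t[1] - t[0]
Xg, Tg = np.meshgrid(x, t, indexing="ij")              # rows = space, columns = time
data = (1 / np.cosh(Xg + 3)) * np.exp(2.3j * Tg) \
    + 2 / np.cosh(Xg) * np.tanh(Xg) * np.exp(2.8j * Tg)

eigvals, modes = dmd(data, rank=2)
freqs = np.sort(np.log(eigvals).imag / dt)             # continuous-time angular frequencies
print(np.round(freqs, 3))  # [2.3 2.8]
```

Each discrete eigenvalue λ maps to a continuous-time frequency via Im(log λ)/Δt; a nonzero real part of log λ/Δt would indicate a growing or decaying mode, which PCA components alone cannot express.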

Non-linear projection techniques, also known as local projection techniques, aim to preserve local neighborhoods in the data. Here, proximity highlights differences and coherences between observations and should not be equated with similarity.

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. It refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction.
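
Classical (Torgerson) MDS shows how coordinates can be recovered from a distance matrix alone: double-center the squared distances and keep the top eigenvectors. The helper `classical_mds` and the random test points are illustrative assumptions.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: embed cases given only their distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram (inner-product) matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigh returns eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    L = np.sqrt(np.clip(eigvals[idx], 0, None))  # clip tiny negative values from round-off
    return eigvecs[:, idx] * L                   # coordinates, one row per case

# Hypothetical check: distances between random 2-D points are reproduced exactly
rng = np.random.default_rng(1)
P = rng.normal(size=(30, 2))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
Y = classical_mds(D)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_hat))  # True
```

With Euclidean input distances, classical MDS yields the same configuration as PCA (up to sign). The non-linear behavior comes from feeding in non-Euclidean distances or from metric and non-metric variants that minimize a stress function instead, such as scikit-learn's `sklearn.manifold.MDS`.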

https://en.wikipedia.org/wiki/Self-organizing_map
An unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data
https://stackabuse.com/self-organizing-maps-theory-and-implementation-in-python-with-numpy/
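
A minimal NumPy sketch of SOM training, assuming a rectangular grid, a Gaussian neighborhood, and linearly decaying learning rate and radius. These schedules, the helper names `train_som` and `map_to_grid`, and all parameter defaults are hypothetical choices; the linked tutorial may differ in detail. The winning node and its grid neighbors all move toward each sample, so nearby grid cells end up representing nearby regions of the data, which is how the topology is preserved.

```python
import numpy as np

def train_som(X, grid_h=10, grid_w=10, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a minimal SOM; returns weights of shape (grid_h, grid_w, n_features)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(X.min(0), X.max(0), size=(grid_h, grid_w, X.shape[1]))
    # Grid coordinates of every node, used for the topological neighborhood
    gy, gx = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                    # learning rate decays linearly (assumed schedule)
        sigma = sigma0 * (1 - frac) + 0.5        # neighborhood radius shrinks (assumed schedule)
        x = X[rng.integers(len(X))]
        # Best-matching unit (BMU): node whose weight vector is closest to x
        d = np.linalg.norm(W - x, axis=-1)
        by, bx = np.unravel_index(np.argmin(d), d.shape)
        # Gaussian neighborhood measured on the grid, not in feature space
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))
        W += lr * h[..., None] * (x - W)
    return W

def map_to_grid(X, W):
    """Return the (row, col) BMU of each sample, i.e. its 2-D position on the map."""
    d = np.linalg.norm(W[None] - X[:, None, None, :], axis=-1)  # (n, grid_h, grid_w)
    flat = d.reshape(len(X), -1).argmin(axis=1)
    return np.column_stack(np.unravel_index(flat, W.shape[:2]))

# Hypothetical data: three well-separated clusters in 3-D
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [0.0, 5.0, 0.0]])
X = np.vstack([c + 0.3 * rng.normal(size=(50, 3)) for c in centers])
W = train_som(X)
pos = map_to_grid(X, W)   # (150, 2) integer grid coordinates
```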

http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
t-SNE is a tool to visualize high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and those of the high-dimensional data. Its cost function is not convex, so different initializations can give different results.
t-SNE is a non-linear dimensionality reduction technique particularly well suited to embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point such that similar objects are modeled by nearby points and dissimilar objects by distant points. The technique can be implemented via Barnes-Hut approximations, which makes it applicable to large real-world datasets.
https://mark-borg.github.io/blog/2016/tsne/
https://blog.alookanalytics.com/2017/02/28/analytical-market-segmentation-with-t-sne-and-clustering-pipeline/
How to Use t-SNE Effectively (Interactive)
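
A short usage sketch with scikit-learn's `TSNE` (the class documented at the first link above), on hypothetical blob data. `method="barnes_hut"` selects the Barnes-Hut approximation, and the second run with `init="random"` and another seed illustrates that the non-convex cost can lead to a different layout.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Hypothetical toy data: 300 points in 10-D drawn from 3 Gaussian blobs
X, y = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# perplexity is roughly the effective number of neighbors; it must be below n_samples
tsne = TSNE(n_components=2, perplexity=30.0, init="pca",
            method="barnes_hut", random_state=0)
emb = tsne.fit_transform(X)
print(emb.shape)            # (300, 2)
print(tsne.kl_divergence_)  # KL divergence reached at the end of the optimization

# Non-convex cost: another random initialization generally gives a different layout
emb_other = TSNE(n_components=2, perplexity=30.0, init="random",
                 random_state=1).fit_transform(X)
```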

#PAPER UMAP - Uniform Manifold Approximation and Projection for Dimension Reduction (McInnes 2020)
- Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction
- #CODE Umap
- https://umap-learn.readthedocs.io/en/latest/
Understanding UMAP
- Nice interactive visualizations

#CODE Nimfa
#CODE Pymf
#CODE HyperSpy
- https://hyperspy.readthedocs.io/en/stable/user_guide/mva.html
- HyperSpy provides easy access to several “machine learning” algorithms that can be useful when analysing multi-dimensional data. In particular, decomposition algorithms, such as principal component analysis (PCA), or blind source separation (BSS) algorithms, such as independent component analysis (ICA), are available
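
To illustrate the difference between decomposition and blind source separation, the sketch below mixes two hypothetical signals and unmixes them with scikit-learn's `PCA` and `FastICA`. It uses scikit-learn directly rather than HyperSpy's own API; the signals and the mixing matrix are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # hypothetical source 1: sinusoid
s2 = np.sign(np.sin(3 * t))              # hypothetical source 2: square wave
S = np.column_stack([s1, s2])
A = np.array([[1.0, 0.5], [0.5, 2.0]])   # hypothetical mixing matrix
X = S @ A.T                              # observed mixtures, shape (2000, 2)

# PCA: orthogonal components ranked by variance (decorrelates, but does not unmix)
X_pca = PCA(n_components=2).fit_transform(X)
# ICA: statistically independent components (sources recovered up to sign, scale, order)
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
```

PCA only decorrelates the mixtures, whereas ICA recovers the original sources up to sign, scale and order, which is why it counts as a blind source separation method.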