Dimensionality Reduction

I want to go through the Wikipedia series on machine learning and data mining. Data mining is the process of extracting and discovering patterns in large data sets, using methods at the intersection of machine learning, statistics, and database systems.

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.

Working in high-dimensional spaces can be undesirable for many reasons: raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

Methods are commonly divided into linear and nonlinear approaches. They can also be divided into feature selection and feature projection (also called feature extraction).

Feature Selection

The process of feature selection aims to find a suitable subset of the input variables (features or attributes) for the task at hand. The three main strategies are: the filter strategy (e.g., information gain), the wrapper strategy (e.g., accuracy-guided search), and the embedded strategy (features are added or removed while building the model, based on prediction errors).
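As a rough illustration of the filter strategy, the sketch below scores each discrete feature by its information gain with respect to the class label and keeps the top k, without training any model. The function names (`entropy`, `information_gain`, `filter_select`) and the toy data are made up for illustration; real filters often use other scores as well, such as correlation or chi-squared statistics.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def filter_select(columns, labels, k):
    """Filter strategy: rank features by information gain and keep the top k names."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: "outlook" determines the label, "noise" barely relates to it.
columns = {
    "outlook": ["sun", "sun", "rain", "rain", "sun", "rain"],
    "noise": ["a", "b", "a", "b", "a", "b"],
}
labels = ["yes", "yes", "no", "no", "yes", "no"]
print(filter_select(columns, labels, k=1))  # ['outlook']
```

Because the score is computed per feature and independently of any downstream model, a filter is cheap, but it cannot detect features that are only useful in combination; that gap is what the wrapper and embedded strategies address.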

Feature Projection

Feature projection transforms the data from the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist. For multidimensional data, tensor representation can be used in dimensionality reduction through multilinear subspace learning.

A visual depiction of the resulting PCA projection for a set of 2D points.

A visual depiction of the resulting LDA projection for a set of 2D points.

Principal Component Analysis

The main linear technique for dimensionality reduction, principal component analysis, performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance matrix of the data is constructed and the eigenvectors of this matrix are computed. The eigenvectors that correspond to the largest eigenvalues (the principal components) can then be used to reconstruct a large fraction of the variance of the original data.
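The recipe above (center the data, build the covariance matrix, keep the eigenvectors with the largest eigenvalues) fits in a few lines. This is a minimal sketch assuming NumPy is available; the function name `pca` and the synthetic data are illustrative, and libraries often compute the same subspace through an SVD of the centered data instead.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components.

    Center the data, build the covariance matrix, and keep the
    eigenvectors with the largest eigenvalues.
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)      # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]              # (d, k) principal axes
    explained = eigvals[order] / eigvals.sum()  # fraction of total variance per component
    return X_centered @ components, explained


# Illustrative data: 2D points scattered tightly around the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

Z, explained = pca(X, n_components=1)
print(Z.shape)    # (200, 1)
print(explained)  # close to 1: almost all variance lies along one direction
```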

Non-negative matrix factorization (NMF)

NMF decomposes a non-negative matrix into the product of two non-negative matrices, which has made it a promising tool in fields where only non-negative signals exist, such as astronomy. NMF has been well known since the multiplicative update rule of Lee & Seung, which has been continuously developed: uncertainties have been included, missing data and parallel computation have been considered, sequential construction has led to the stability and linearity of NMF, and other updates handle missing data in digital image processing.
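A minimal sketch of the Lee & Seung multiplicative updates for the squared Frobenius error ||V - WH||^2, assuming NumPy. It leaves out the later extensions listed above (uncertainties, missing data, sequential construction); the function name, the iteration count, and the small `eps` guard against division by zero are illustrative choices.

```python
import numpy as np


def nmf(V, rank, n_iter=1000, eps=1e-10, seed=0):
    """Factor a non-negative (m x n) matrix V as W @ H with W, H >= 0.

    Multiplicative updates for the squared Frobenius error; eps only
    prevents division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))  # positive random initialization
    H = rng.random((rank, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Illustrative data: an exactly rank-2 non-negative matrix.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 15))

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape)  # (20, 2) (2, 15)
print(rel_err)           # relative reconstruction error
```

Because each update multiplies by a ratio of non-negative quantities, W and H stay non-negative without any explicit projection step.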

Kernel PCA

Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The resulting technique, called kernel PCA, can construct nonlinear mappings that maximize the variance in the data.
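To make the kernel trick concrete, the sketch below runs the PCA eigendecomposition on a centered kernel matrix instead of the covariance matrix, so the nonlinear feature map is never computed explicitly. It assumes NumPy and a Gaussian (RBF) kernel with a hand-picked `gamma`; the kernel choice, the helper names, and the concentric-circles data are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)


def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA on the training points X, returning their projections."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    # Center the (implicit) feature vectors by centering the kernel matrix.
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_c)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.maximum(eigvals[order], 0.0)  # clip tiny negative round-off
    # Projection of training point i on component k is sqrt(lambda_k) * v_ik.
    return eigvecs[:, order] * np.sqrt(eigvals)


# Illustrative data: two concentric circles (radius 1 and radius 3).
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=100)
radii = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

Z = kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)  # (100, 2)
```

Concentric circles are a standard case where linear PCA can only rotate the data; with kernel PCA the components depend on the kernel, so it is worth plotting `Z` for a few values of `gamma`.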

Graph-Based Kernel PCA

Other prominent nonlinear techniques include manifold learning techniques such as Isomap, locally linear embedding (LLE), Hessian LLE, Laplacian eigenmaps, and methods based on tangent space analysis. These techniques construct a low-dimensional data representation using a cost function that retains local properties of the data, and can be viewed as defining a graph-based kernel for kernel PCA.
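As one concrete member of this family, the sketch below implements a basic form of Laplacian eigenmaps, assuming NumPy: connect each point to its nearest neighbours, form the graph Laplacian L = D - W, and use the smallest non-trivial solutions of L y = λ D y as coordinates. It uses unweighted (0/1) edges rather than heat-kernel weights; the neighbour count and the helix data are illustrative assumptions.

```python
import numpy as np


def laplacian_eigenmaps(X, n_components=2, n_neighbors=5):
    """Embed the rows of X with Laplacian eigenmaps on a k-nearest-neighbour graph.

    Solves L y = lambda D y (L = D - W) through the equivalent symmetric
    problem for D^{-1/2} L D^{-1/2}, then drops the trivial constant solution.
    """
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        neighbours = np.argsort(dists[i])[1:n_neighbors + 1]  # index 0 is the point itself
        W[i, neighbours] = 1.0                                 # unweighted (0/1) edges
    W = np.maximum(W, W.T)                                     # make the graph undirected
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)                   # ascending eigenvalues
    # Column 0 (eigenvalue ~0) is trivial for a connected graph; map the rest back by D^{-1/2}.
    return d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]


# Illustrative data: points along a helix, i.e. a 1D curve living in 3D.
t = np.linspace(0, 4 * np.pi, 100)
X = np.column_stack([np.cos(t), np.sin(t), t])

Y = laplacian_eigenmaps(X, n_components=1, n_neighbors=5)
print(Y.shape)  # (100, 1)
```

Since the helix is a one-dimensional curve, its one-dimensional embedding should roughly follow the position along the curve (up to sign), because only local neighbourhood structure enters the cost.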

Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.
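In the two-class case, Fisher's criterion has a closed-form direction, w ∝ S_W^{-1}(m_1 - m_0), where S_W is the within-class scatter matrix and m_0, m_1 are the class means. The NumPy sketch below computes it on synthetic data whose largest variance lies along the first axis while the classes differ along the second, a setup where PCA and LDA pick different directions. The function and variable names are illustrative.

```python
import numpy as np


def fisher_direction(X, y):
    """Fisher's linear discriminant for two classes labelled 0 and 1.

    Returns a unit vector proportional to S_W^{-1} (m1 - m0).
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of each class's scatter around its own mean.
    S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(S_w, m1 - m0)
    return w / np.linalg.norm(w)


# Illustrative data: wide spread along x, classes separated along y.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=[3.0, 0.5], size=(100, 2))
X1 = rng.normal(loc=[0.0, 2.0], scale=[3.0, 0.5], size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

w = fisher_direction(X, y)
z = X @ w  # one-dimensional projection of every point
print(w)   # points almost entirely along the second axis
```

PCA on the same data would keep the first axis, where the variance is largest but the classes overlap; LDA uses the labels and keeps the axis that separates them.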

Generalized Discriminant Analysis

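Generalized discriminant analysis (GDA) deals with nonlinear discriminant analysis using a kernel function operator. The underlying theory is close to that of support vector machines (SVMs), insofar as GDA maps the input vectors into a high-dimensional feature space.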
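GDA is usually presented for several classes. As a smaller illustration of the same kernel idea, the sketch below implements the two-class kernel Fisher discriminant, a closely related method swapped in here for brevity, assuming NumPy and an RBF kernel. The regularization constant `reg`, `gamma`, the helper names, and the circles data are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)


def kernel_fisher(X, y, gamma=1.0, reg=1e-3):
    """Two-class kernel Fisher discriminant (classes labelled 0 and 1).

    Returns coefficients alpha so that a point x projects to
    sum_i alpha_i * k(x_i, x). `reg` is an assumed ridge term that keeps
    the within-class matrix invertible.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    means = []
    N = np.zeros((n, n))                 # within-class scatter in kernel form
    for c in (0, 1):
        K_c = K[:, y == c]               # kernel between all points and class c
        n_c = K_c.shape[1]
        means.append(K_c.mean(axis=1))   # class mean, expressed through the kernel
        centering = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)
        N += K_c @ centering @ K_c.T
    return np.linalg.solve(N + reg * np.eye(n), means[1] - means[0])


def project(alpha, X_train, X_new, gamma=1.0):
    """One-dimensional discriminant coordinate of each row of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


# Illustrative data: class 0 on a circle of radius 1, class 1 on a circle of radius 3.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=100)
radii = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.repeat([0, 1], 50)

alpha = kernel_fisher(X, y, gamma=0.5)
z = project(alpha, X, X, gamma=0.5)
threshold = (z[y == 0].mean() + z[y == 1].mean()) / 2
accuracy = np.mean((z > threshold) == (y == 1))
print(accuracy)  # training accuracy of a simple midpoint threshold
```

No straight line separates two concentric circles, so plain LDA fails here; the kernel lets the discriminant depend on distance from the center instead.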


t-Distributed Stochastic Neighbor Embedding

t-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction technique useful for visualizing high-dimensional datasets.
