What I've Learned Building Interactive Embedding Visualizations

I want to read a blog post about building interactive embedding visualizations because it is something that I want to do.

Note: When "I" is used in blockquotes / quotes in this note, it refers to the author of the article linked above.

After completing my most recent attempt, I believe I've come up with a solid process for building high-quality interactive embedding visualizations for a variety of different kinds of entity relationship data.

Embeddings are a way of representing entities as points in N-dimensional space. These entities can be things such as words, products, people, tweets - anything that can be related to something else. The idea is to pick coordinates for each entity such that similar/related entities are near each other and unrelated entities are far apart.
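To make the "near each other" idea concrete, here is a minimal sketch in Python. The entity names and the 2-D coordinates are made up by hand for illustration; they do not come from the blog post or from any real embedding.

```python
import math

# Hypothetical 2-D coordinates, chosen by hand for illustration only.
embedding = {
    "cat": (0.9, 1.1),
    "dog": (1.0, 1.0),
    "car": (-2.0, 0.5),
}


def distance(a, b):
    """Euclidean distance between the points assigned to two entities."""
    return math.dist(embedding[a], embedding[b])


def nearest(name):
    """Return the entity whose point is closest to the given entity's point."""
    others = (other for other in embedding if other != name)
    return min(others, key=lambda other: distance(name, other))
```

With these coordinates, `nearest("cat")` picks `"dog"`, because the two related entities were placed close together while `"car"` sits far away.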

This blog post suggests somewhere between 10k and 50k entities for creating embeddings. Each entity is an individual data value: one value per entity per instance. The foundation of the embedding-building process is a co-occurrence matrix created from the raw source data. This is a square matrix whose side length equals the number of entities you're embedding. The idea is that every time you find entity_n and entity_m in the same collection, you increment cooc_matrix[n][m]. For some types of entities, you may have additional data available that can be used to determine to what degree two entities are related. Since co-occurrence matrices are square, they grow quadratically with the number of entities being embedded (it might be a good idea to use a sparse matrix in Python). Generating the co-occurrence matrix can therefore be computationally intensive. Once you've built your co-occurrence matrix, you have all the data that you need to create the embedding.
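The counting step can be sketched in plain Python. This is only an illustration under a few assumptions of mine, not the blog author's code: entities are already mapped to integer indices, a "collection" is just a list of those indices, and a dictionary keyed by (n, m) pairs with n < m stands in for a sparse symmetric matrix. The names `build_cooc`, `cooc_count`, and `example_collections` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooc(collections):
    """Count how often each pair of entities appears in the same collection.

    Returns a sparse mapping {(n, m): count} with n < m, so only pairs that
    actually co-occur take up memory.
    """
    cooc = Counter()
    for collection in collections:
        # Deduplicate so an entity repeated within one collection counts once.
        unique = sorted(set(collection))
        for n, m in combinations(unique, 2):
            cooc[(n, m)] += 1
    return cooc


def cooc_count(cooc, n, m):
    """Symmetric lookup: the count for (n, m) equals the count for (m, n)."""
    key = (n, m) if n < m else (m, n)
    return cooc[key]  # Counter returns 0 for pairs that never co-occurred


# Hypothetical input: three collections over entities 0..3.
example_collections = [[0, 1, 2], [1, 2], [0, 2, 3]]
cooc = build_cooc(example_collections)
```

Storing only the pairs that occur keeps memory proportional to the number of observed pairs rather than to the full n x n grid. For larger datasets, a dedicated sparse matrix type such as those in `scipy.sparse` would be a natural replacement for the dictionary.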

PyMDE is a Python library that implements an algorithm called Minimum Distortion Embedding. It's the main workhorse of the embedding-generation process and is very powerful and versatile. It can embed high-dimensional vectors or graphs natively. embedding_dim is probably the most important parameter in pre-processing and perhaps the most important parameter in the whole embedding process. This blog post suggests embedding into higher dimensions as an intermediate step and then using a different algorithm to project that embedding down to 2 dimensions.

This blog post recommends UMAP and t-SNE for projecting embeddings down to 2-D and 3-D.

WebGL, or WebGL/WebGPU-powered libraries like Pixi.JS, are great choices for building these kinds of visualizations [visualizations for embeddings].
