What I've Learned Building Interactive Embedding Visualizations

I want to read this blog post about building interactive embedding visualizations because building one is something I want to do.

Date Created:
1 17

References



Notes


Note: When "I" appears in blockquotes or quotes in this note, it refers to the author of the article linked above.

After completing my most recent attempt, I believe I've come up with a solid process for building high-quality interactive embedding visualizations for a variety of different kinds of entity relationship data.

Embeddings are a way of representing entities as points in N-dimensional space. These entities can be things such as words, products, people, tweets - anything that can be related to something else. The idea is to pick coordinates for each entity such that similar/related entities are near each other and vice versa.
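To make "similar entities are near each other" concrete, here is a minimal sketch. The entity names and 2-D coordinates are made up for illustration (they are not from the blog post), and `nearest` is a hypothetical helper that ranks entities by Euclidean distance.

```python
import math

# Hypothetical 2-D coordinates, hand-picked for illustration only.
embedding = {
    "cat": (0.9, 1.1),
    "dog": (1.0, 1.0),
    "kitten": (0.85, 1.2),
    "car": (-1.0, -0.9),
    "truck": (-1.1, -1.0),
}


def nearest(entity, k=2):
    """Return the k entities closest to `entity` by Euclidean distance."""
    origin = embedding[entity]
    others = (name for name in embedding if name != entity)
    return sorted(others, key=lambda name: math.dist(origin, embedding[name]))[:k]


print(nearest("cat"))
print(nearest("car", k=1))
```

In a real embedding the coordinates come out of the embedding algorithm rather than being chosen by hand, but the lookup works the same way.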

This blog post suggests embedding somewhere between 10k and 50k entities. Each entity is an individual data value: one value per entity per instance. The foundation of the embedding-building process is a co-occurrence matrix created from the raw source data. This is a square matrix whose side length equals the number of entities you're embedding. The idea is that every time you find entity_n and entity_m in the same collection, you increment cooc_matrix[n][m]. For some types of entities, you may have additional data available that can be used to determine to what degree two entities are related.

Since co-occurrence matrices are square, they grow quadratically (not exponentially) with the number of entities being embedded, so it might be a good idea to use a sparse matrix in Python. The process of generating the co-occurrence matrix, since it is O(n^2) in the number of entities, can be computationally intensive. Once you've built your co-occurrence matrix, you have all the data that you need to create the embedding.
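The counting step can be sketched with only the standard library. This is an illustration under assumptions, not the article's code: `build_cooccurrence` and the playlist data are hypothetical, each collection is treated as a set (duplicates count once), and only the upper triangle (n < m) is stored because the matrix is symmetric. A `Counter` keyed by `(n, m)` acts as a simple sparse representation, since pairs that never co-occur take no memory.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(collections):
    """Count how often each pair of entities shares a collection.

    Returns (index, counts): `index` maps entity -> row/column number and
    `counts` maps (n, m) with n < m -> number of shared collections.
    """
    index = {}
    counts = Counter()
    for collection in collections:
        # Assign ids on first sight; the set drops duplicates within a collection.
        ids = sorted({index.setdefault(entity, len(index)) for entity in collection})
        for n, m in combinations(ids, 2):
            counts[(n, m)] += 1  # the sparse equivalent of cooc_matrix[n][m] += 1
    return index, counts


# Hypothetical source data: each inner list is one "collection".
playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b"],
    ["song_c", "song_d"],
]
index, cooc = build_cooccurrence(playlists)
print(cooc[(index["song_a"], index["song_b"])])
```

The inner loop is quadratic in the size of each collection, which is one reason this step can get expensive on large inputs.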

PyMDE is a Python library that implements an algorithm called Minimum Distortion Embedding. It's the main workhorse of the embedding-generation process and is very powerful and versatile. It can embed high-dimensional vectors or graphs natively. embedding_dim is probably the most important parameter in pre-processing and perhaps the most important parameter in the whole embedding process. This blog post suggests embedding into a higher dimension as an intermediate step and then using a different algorithm to project that embedding down to 2 dimensions.
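For context on what "distortion" means here, the optimization problem behind Minimum Distortion Embedding can be written roughly as below. This is a paraphrase of the general MDE formulation as I understand it, not something stated in the blog post, and the symbols are my own labels: X is the n-by-m matrix of coordinates (m plays the role of embedding_dim), x_i is the row for entity i, \mathcal{E} is the set of entity pairs being compared, f_{ij} is a distortion function for pair (i, j), and \mathcal{X} is a constraint set (for example, centered or standardized embeddings).

```latex
\begin{aligned}
\text{minimize}\quad & E(X) = \frac{1}{|\mathcal{E}|} \sum_{(i,j) \in \mathcal{E}} f_{ij}(d_{ij}),
\qquad d_{ij} = \lVert x_i - x_j \rVert_2 \\
\text{subject to}\quad & X \in \mathcal{X}
\end{aligned}
```

Related pairs get a distortion function that grows with distance, so minimizing the average pulls them together. A larger m gives the optimizer more room to satisfy many pairs at once, which is consistent with the post's advice to embed into a higher dimension first and project down afterwards.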

This blog post recommends UMAP and t-SNE for projecting embeddings down to 2-D and 3-D.

WebGL or WebGL/WebGPU-powered libraries like Pixi.JS are great choices for building these kinds of visualizations [visualizations for embeddings].

