Database model for hierarchical content

Reading this as I am having trouble implementing nested comments.

Date Created:
1 44

References



Notes


You’re probably familiar with multi-level comments sections such as on Facebook, Reddit, or Hackernews — a user can reply to a post, and the system allows multiple levels of nested replies. I recently needed to implement it myself for a side project, and I found a few ways of modeling the database, with different complexities and tradeoffs. In this article, I’m going to cover a few of them.

Current Schema

CREATE TABLE comments (
id serial PRIMARY KEY,
author text,
content text,
created_at timestamp,
updated_at timestamp
);

1-Level Depth Replies

1-Level Depth Replies

In the case of one-level depth comments, you need to extend the Comments table by adding a parent_id column to implement this scenario. The new column will have a foreign key relation to the id of the column. It will be a many-to-one relation, which means a comment can have multiple comments that point to it as a parent.

To fetch the comments, you could either:

  1. Fetch the top level comments then fetch their replies:
-- select base comments
SELECT * FROM "Comments" WHERE parent_id is NULL;
-- fetch the replies to the content
SELECT * FROM "Comments" WHERE parent_id = 'comment_id';
  1. Fetch all the comments at once with a bit more complex SQL query - which can be implemented in many different ways.
SELECT	c.*,	replies
FROM "Comments" c
LEFT JOIN (
SELECT
parent_id,
COALESCE(json_agg(row_to_json(replies)), '[]'::JSON) AS replies
FROM
"Comments" AS replies
GROUP BY
parent_id) AS replies ON c.id = replies.parent_id
WHERE c.parent_id IS NULL;

Multi-level Replies

Choosing the suitable data model and implementation depends not only on the ease of the backend implementation but also on the frontend. How the comments are shown can influence DB model decisions. DO you need all nested comments at once vs can you show more?

Nested Comment Implementations

Multiple approaches to nested comments:

  1. Fetch only the base comments and have a show replies button below each. After a user clicks the show replies button, we fetch the replies and then each reply will have its show replies button. And then it can be as nested as you want. You can use the same queries as we mentioned earlier.
  2. We can also bring a bit more data at first - the base comments and 1-level replies, and then we can have a button below each reply.
  3. We fetch the base comments and a maximum of one reply to each, with the show more button to get all the replies. It can also happen on a more nested level.

Using a Common Recursive Expression

Common table expressions allow you to create a helper query that you can use in the main query.

WITH cte_replies AS (
SELECT
parent_id,
COALESCE(json_agg(row_to_json(replies)),
'[]'::JSON) AS replies
FROM
"Comments" AS replies
GROUP BY
parent_id
)
SELECT
*
FROM
"Comments" c
LEFT JOIN cte_replies ON c.id = cte_replies.parent_id
WHERE
c.parent_id IS NULL;

It is the same query we used before getting comments and replies but written using a common table expression. A recursive common table expression can reference itself, which is helpful when querying hierarchical data.

Hierarchical Comments

If we ant to fetch the comments and all of their replies, then the tricky part is calculating the hierarchies. It sounds like a lot of mapping through the list to obtain the comments tree as an object. That's when a recursive table expression can help. We can construct a query to get results with a hierarchy collected as a path. The base comment will have an empty path as it doesn't have a parent, and the replies will have it constructed from the previous comments' ids. In the above diagram, comment 8 would have a path of value 5/6/7.

WITH RECURSIVE comments_cte (
id,
path,
content,
author
) AS (
SELECT
id,
'',
content,
author
FROM
"Comments"
WHERE
parent_id IS NULL
UNION ALL
SELECT
r.id,
concat(path, '/', r.parent_id),
r.content,
r.author
FROM
"Comments" r
JOIN comments_cte ON comments_cte.id = r.parent_id
)
SELECT
*
FROM
comments_cte;

The above query recursively constructs the comment's path by repeated joins with self. It returns the comments' id, path, content, and author. The result from the diagram above would be:

id

path

1


2


3

/1

4

/1

5


6

/5

7

/5/6

8

/5/6/7

9


This approach can get out of hand when adding pagination on top of that.

Using an additional path column

Another solution that doesn't require calculating them while fetching comments includes adding a new path column.

ALTER TABLE comments ADD COLUMN path TEXT;

The new schema means changing the way comments are added to the database. Regarding fetching the comments, you can query all of the entries if you need all of them. However, if you need pagination, this solution can be a good fit.

Query to select ten entries with an empty path and all entries where the path starts with the selected base comments:

WITH base_comments AS (
SELECT
*
FROM
"Comments"
WHERE
path IS NULL
LIMIT 10 -- optional
OFFSET 0 -- optional
) (
SELECT
*
FROM
"Comments" replies
WHERE
replies.path ~ ANY (
SELECT
id
FROM
base_comments))
UNION ALL
SELECT
*
FROM
base_comments;

Using ltree

If you’re using Postgres, you might be interested in checking out a ltree data type. It represents labels of data stored in a hierarchical tree-like structure. Postgres provides many ltree operators to search through the hierarchical data, for example, searching for ancestors or descendants.

A ltree's label is a sequence of alphanumeric characters and underscores less than 256 bytes long. A label path is a sequence of labels separated by dots, e.g. 1.2.3. To use this data type, we need to add an extension to Postgres and create a new column:

CREATE EXTENSION IF NOT EXISTS ltree;
ALTER TABLE "Comments" ADD COLUMN path ltree;

We can then use the <@ operator to get all nested replies of a particular comment:

SELECT * FROM "Comments" WHERE path <@ 'comment_id';

ltree


This module implements a data type ltree for representing labels of data stored in a hierarchical tree-like structure. Extensive facilities for searching through label trees are provided.

A label is a sequence of alphanumeric characters and underscores. Labels must be less than 256 bytes long. A label path is a sequence of zero or more labels separated by dots, e.g. L1.L2.L3, representing a path from the root of a hierarchical tree to a particular node. The length of the label should be kept at under 2kB. The ltree module provides several data types:

  • ltree stores a label path
  • lquery represents a regular-expression-like-pattern for matching ltree values. A simple word matches that label within a path. A star symbol (*) matches zero or more levels.
foo         Match the exact label path foo
*.foo.* Match any label path containing the label foo
*.foo Match any label path whose last label is foo

Star symbols can also be quantified to restrict how many labels they can match:

*{n}        Match exactly n labels
*{n,} Match at least n labels
*{n,m} Match at least n but not more than m labels
*{,m} Match at most m labels — same as *{0,m}

There are several modifiers that can be put at the end of a non-star label in lquery to make it match more than just the exact match:

@           Match case-insensitively, for example a@ matches A
* Match any label with this prefix, for example foo* matches foobar
% Match initial underscore-separated words

Also, you can write several possibly-modified labels separated with | (OR) to match any of those labels, and you can put ! (NOT) at the start to match any label that doesn't match and of the alternatives.

An annotated example:

Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a. b. c. d. e.
      1. begins with the label Top
      2. and next has zero to two labels before
      3. a label beginning with the case-insensitive prefix sport
      4. then a label not matching football nor tennis
      5. and then ends with a label beginning with Russ or exactly matching Spain

Operator

Returns

Description

ltree @> ltree

boolean

is left argument an ancestor of right (or equal)?

ltree <@ ltree

boolean

is left argument a descendant of right (or equal)?

ltree ~ lquery

boolean

does ltree match lquery?

lquery ~ ltree

boolean

does ltree match lquery?

ltree ? lquery[]

boolean

does ltree match any lquery in array?

lquery[] ? ltree

boolean

does ltree match any lquery in array?

ltree @ ltxtquery

boolean

does ltree match ltxtquery?

ltxtquery @ ltree

boolean

does ltree match ltxtquery?

ltree || ltree

ltree

concatenate ltree paths

ltree || text

ltree

convert text to ltree and concatenate

text || ltree

ltree

convert text to ltree and concatenate

ltree[] @> ltree

boolean

does array contain an ancestor of ltree?

ltree <@ ltree[]

boolean

does array contain an ancestor of ltree?

ltree[] <@ ltree

boolean

does array contain a descendant of ltree?

ltree @> ltree[]

boolean

does array contain a descendant of ltree?

ltree[] ~ lquery

boolean

does array contain any path matching lquery?

lquery ~ ltree[]

boolean

does array contain any path matching lquery?

ltree[] ? lquery[]

boolean

does ltree array contain any path matching any lquery?

lquery[] ? ltree[]

boolean

does ltree array contain any path matching any lquery?

ltree[] @ ltxtquery

boolean

does array contain any path matching ltxtquery?

ltxtquery @ ltree[]

boolean

does array contain any path matching ltxtquery?

ltree[] ?@> ltree

ltree

first array entry that is an ancestor of ltree; NULL if none

ltree[] ?<@ ltree

ltree

first array entry that is a descendant of ltree; NULL if none

ltree[] ?~ lquery

ltree

first array entry that matches lquery; NULL if none

ltree[] ?@ ltxtquery

ltree

first array entry that matches ltxtquery; NULL if none

ltree supports several types of indexes that can speed up the indicated operators:

  • B-tree index over ltree: <, <=, =, >=, >
  • GiST index over ltree: <, <=, =, >=, >, @>, <@, @, ~, ?


You can read more about how comments are sorted in this blog post.

User Comments

Insert Math Markup

ESC
About Inserting Math Content
Display Style:

Embed News Content

ESC
About Embedding News Content

Embed Youtube Video

ESC
Embedding Youtube Videos

Embed TikTok Video

ESC
Embedding TikTok Videos

Embed X Post

ESC
Embedding X Posts

Embed Instagram Post

ESC
Embedding Instagram Posts

Insert Details Element

ESC

Example Output:

Summary Title
You will be able to insert content here after confirming the title of the <details> element.

Insert Table

ESC
Customization
Align:
Preview:

Insert Horizontal Rule

#000000

Preview:


View Content At Different Sizes

ESC

Edit Style of Block Nodes

ESC

Edit the background color, default text color, margin, padding, and border of block nodes. Editable block nodes include paragraphs, headers, and lists.

#ffffff
#000000

Edit Selected Cells

Change the background color, vertical align, and borders of the cells in the current selection.

#ffffff
Vertical Align:
Border
#000000
Border Style:

Edit Table

ESC
Customization:
Align:

Upload Files

ESC

Upload a .lexical file. If the file type matches the type of the current editor, then a preview will be shown below the file input.

Upload Jupyter Notebook

ESC

Upload a Jupyter notebook and embed the resulting HTML in the text editor.

Insert Custom HTML

ESC

Edit Image

ESC
#ffffff

Insert Columns Layout

ESC
Column Type:

Select Code Language

ESC
Select Coding Language

Upload Previous Version of Editor State

ESC