Database model for hierarchical content
Reading this as I am having trouble implementing nested comments.
References
Notes
You’re probably familiar with multi-level comments sections such as on Facebook, Reddit, or Hackernews — a user can reply to a post, and the system allows multiple levels of nested replies. I recently needed to implement it myself for a side project, and I found a few ways of modeling the database, with different complexities and tradeoffs. In this article, I’m going to cover a few of them.
Current Schema
CREATE TABLE comments (
id serial PRIMARY KEY,
author text,
content text,
created_at timestamp,
updated_at timestamp
);
1-Level Depth Replies
In the case of one-level depth comments, you need to extend the Comments
table by adding a parent_id
column to implement this scenario. The new column will have a foreign key relation to the id of the column. It will be a many-to-one relation, which means a comment can have multiple comments that point to it as a parent.
To fetch the comments, you could either:
- Fetch the top level comments then fetch their replies:
-- select base comments
SELECT * FROM "Comments" WHERE parent_id is NULL;
-- fetch the replies to the content
SELECT * FROM "Comments" WHERE parent_id = 'comment_id';
- Fetch all the comments at once with a bit more complex SQL query - which can be implemented in many different ways.
SELECT c.*, replies
FROM "Comments" c
LEFT JOIN (
SELECT
parent_id,
COALESCE(json_agg(row_to_json(replies)), '[]'::JSON) AS replies
FROM
"Comments" AS replies
GROUP BY
parent_id) AS replies ON c.id = replies.parent_id
WHERE c.parent_id IS NULL;
Multi-level Replies
Choosing the suitable data model and implementation depends not only on the ease of the backend implementation but also on the frontend. How the comments are shown can influence DB model decisions. DO you need all nested comments at once vs can you show more
?
Multiple approaches to nested comments:
- Fetch only the base comments and have a show replies button below each. After a user clicks the
show replies
button, we fetch the replies and then each reply will have itsshow replies
button. And then it can be as nested as you want. You can use the same queries as we mentioned earlier. - We can also bring a bit more data at first - the base comments and 1-level replies, and then we can have a button below each reply.
- We fetch the base comments and a maximum of one reply to each, with the
show more
button to get all the replies. It can also happen on a more nested level.
Using a Common Recursive Expression
Common table expressions allow you to create a helper query that you can use in the main query.
WITH cte_replies AS (
SELECT
parent_id,
COALESCE(json_agg(row_to_json(replies)),
'[]'::JSON) AS replies
FROM
"Comments" AS replies
GROUP BY
parent_id
)
SELECT
*
FROM
"Comments" c
LEFT JOIN cte_replies ON c.id = cte_replies.parent_id
WHERE
c.parent_id IS NULL;
It is the same query we used before getting comments and replies but written using a common table expression. A recursive common table expression can reference itself, which is helpful when querying hierarchical data.
If we ant to fetch the comments and all of their replies, then the tricky part is calculating the hierarchies. It sounds like a lot of mapping through the list to obtain the comments tree as an object. That's when a recursive table expression can help. We can construct a query to get results with a hierarchy collected as a path. The base comment will have an empty path as it doesn't have a parent, and the replies will have it constructed from the previous comments' ids. In the above diagram, comment 8 would have a path of value 5/6/7
.
WITH RECURSIVE comments_cte (
id,
path,
content,
author
) AS (
SELECT
id,
'',
content,
author
FROM
"Comments"
WHERE
parent_id IS NULL
UNION ALL
SELECT
r.id,
concat(path, '/', r.parent_id),
r.content,
r.author
FROM
"Comments" r
JOIN comments_cte ON comments_cte.id = r.parent_id
)
SELECT
*
FROM
comments_cte;
The above query recursively constructs the comment's path by repeated joins with self. It returns the comments' id, path, content, and author. The result from the diagram above would be:
id | path |
---|---|
1 | |
2 | |
3 | /1 |
4 | /1 |
5 | |
6 | /5 |
7 | /5/6 |
8 | /5/6/7 |
9 |
This approach can get out of hand when adding pagination on top of that.
Using an additional path
column
Another solution that doesn't require calculating them while fetching comments includes adding a new path
column.
ALTER TABLE comments ADD COLUMN path TEXT;
The new schema means changing the way comments are added to the database. Regarding fetching the comments, you can query all of the entries if you need all of them. However, if you need pagination, this solution can be a good fit.
Query to select ten entries with an empty path and all entries where the path starts with the selected base comments:
WITH base_comments AS (
SELECT
*
FROM
"Comments"
WHERE
path IS NULL
LIMIT 10 -- optional
OFFSET 0 -- optional
) (
SELECT
*
FROM
"Comments" replies
WHERE
replies.path ~ ANY (
SELECT
id
FROM
base_comments))
UNION ALL
SELECT
*
FROM
base_comments;
Using ltree
If you’re using Postgres, you might be interested in checking out altree
data type. It represents labels of data stored in a hierarchical tree-like structure. Postgres provides manyltree
operators to search through the hierarchical data, for example, searching for ancestors or descendants.
A ltree's label is a sequence of alphanumeric characters and underscores less than 256 bytes long. A label path is a sequence of labels separated by dots, e.g. 1.2.3
. To use this data type, we need to add an extension to Postgres and create a new column:
CREATE EXTENSION IF NOT EXISTS ltree;
ALTER TABLE "Comments" ADD COLUMN path ltree;
We can then use the <@
operator to get all nested replies of a particular comment:
SELECT * FROM "Comments" WHERE path <@ 'comment_id';
ltree
This module implements a data type ltree
for representing labels of data stored in a hierarchical tree-like structure. Extensive facilities for searching through label trees are provided.
A label is a sequence of alphanumeric characters and underscores. Labels must be less than 256 bytes long. A label path is a sequence of zero or more labels separated by dots, e.g. L1.L2.L3
, representing a path from the root of a hierarchical tree to a particular node. The length of the label should be kept at under 2kB. The ltree
module provides several data types:
ltree
stores a label pathlquery
represents a regular-expression-like-pattern for matchingltree
values. A simple word matches that label within a path. A star symbol (*
) matches zero or more levels.
foo Match the exact label path foo
*.foo.* Match any label path containing the label foo
*.foo Match any label path whose last label is foo
Star symbols can also be quantified to restrict how many labels they can match:
*{n} Match exactly n labels
*{n,} Match at least n labels
*{n,m} Match at least n but not more than m labels
*{,m} Match at most m labels — same as *{0,m}
There are several modifiers that can be put at the end of a non-star label in lquery
to make it match more than just the exact match:
@ Match case-insensitively, for example a@ matches A
* Match any label with this prefix, for example foo* matches foobar
% Match initial underscore-separated words
Also, you can write several possibly-modified labels separated with |
(OR) to match any of those labels, and you can put !
(NOT) at the start to match any label that doesn't match and of the alternatives.
An annotated example:
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
a. b. c. d. e.
- begins with the label
Top
- and next has zero to two labels before
- a label beginning with the case-insensitive prefix
sport
- then a label not matching
football
nortennis
- and then ends with a label beginning with
Russ
or exactly matchingSpain
- begins with the label
Operator | Returns | Description |
---|---|---|
|
| is left argument an ancestor of right (or equal)? |
|
| is left argument a descendant of right (or equal)? |
|
| does ltree match lquery? |
|
| does ltree match lquery? |
|
| does ltree match any lquery in array? |
|
| does ltree match any lquery in array? |
|
| does ltree match ltxtquery? |
|
| does ltree match ltxtquery? |
|
| concatenate ltree paths |
|
| convert text to ltree and concatenate |
|
| convert text to ltree and concatenate |
|
| does array contain an ancestor of ltree? |
|
| does array contain an ancestor of ltree? |
|
| does array contain a descendant of ltree? |
|
| does array contain a descendant of ltree? |
|
| does array contain any path matching lquery? |
|
| does array contain any path matching lquery? |
|
| does ltree array contain any path matching any lquery? |
|
| does ltree array contain any path matching any lquery? |
|
| does array contain any path matching ltxtquery? |
|
| does array contain any path matching ltxtquery? |
|
| first array entry that is an ancestor of ltree; NULL if none |
|
| first array entry that is a descendant of ltree; NULL if none |
|
| first array entry that matches lquery; NULL if none |
|
| first array entry that matches |
ltree
supports several types of indexes that can speed up the indicated operators:
- B-tree index over ltree:
<
,<=
,=
,>=
,>
- GiST index over
ltree
:<
,<=
,=
,>=
,>
,@>
,<@
,@
,~
,?
Comments
You can read more about how comments are sorted in this blog post.
User Comments
There are currently no comments for this article.