PostgreSQL Common Table Expressions
As you begin to implement more complex functionality, you start to want to implement common table expressions more and more to simplify things. I want to learn more about PostgreSQL common table expressions because I oftentimes get the syntax wrong on first try.
References
- Another name for the table that the
CTE name
in the image above references istemporary table
Notes
WITH
provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query. Each auxiliary statement in aWITH
clause can beSELECT
,INSERT
,UPDATE
,DELETE
orMERGE
; and theWITH
clause itself is attached to a primary statement that can also be aSELECT
,INSERT
,UPDATE
,DELETE
, orMERGE
.
Select in WITH
The basic value of SELECT
in WITH
is to break down complicated queries into simpler parts:
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
Recursive Queries
The optional RECURSIVE
modifier changes WITH
from a mere syntactic convenience into a feature that accomplished things not otherwise possible in standard SQL. With RECURSIVE
, a WITH
query can refer to its own output. A very simple example is this query to sum the integers from 1 through 100:
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;
The general form for a recursive WITH
query is always a non-recursive term, the UNION
(or UNION ALL
), then a recursive term, where only the recursive term can contain a reference to the query's own output.
Recursive Query Evaluation
- Evaluate the non-recursive term, For
UNION
(but notUNION ALL
), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table. - So long as the working table is not empty, repeat these steps:
- Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For
UNION
(but notUNION ALL
), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table. - Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
- Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For
In the example above, the working table has just a single row in each step, and it takes on the values from 1 through 100 in successive steps. In the 100th step, there is no output because of the WHERE
clause, and so the query terminates.
Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that show immediate inclusions:
WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
UNION ALL
SELECT p.sub_part, p.part, p.quantity * pr.quantity
FROM included_parts pr, parts p
WHERE p.part = pr.sub_part
)
SELECT sub_part, SUM(quantity) as total_quantity
FROM included_parts
GROUP BY sub_part
Search Order
When computing a tree traversal using a recursive query, you might want to order the results in either depth-first or breadth-first order. This can be done by computing an ordering column alongside the other data columns and using that to sort the results at the end.
Bread First Ordering
WITH RECURSIVE search_tree(id, link, data) AS (
SELECT t.id, t.link, t.data
FROM tree t
UNION ALL
SELECT t.id, t.link, t.data
FROM tree t, search_tree st
WHERE t.id = st.link
)
SELECT * FROM search_tree;
Depth First Ordering
WITH RECURSIVE search_tree(id, link, data, path) AS (
SELECT t.id, t.link, t.data, ARRAY[t.id]
FROM tree t
UNION ALL
SELECT t.id, t.link, t.data, path || t.id
FROM tree t, search_tree st
WHERE t.id = st.link
)
SELECT * FROM search_tree ORDER BY path;
Cycle Detection
When working with recursive queries, it is important to be sure that the recursive part of the query will eventually return no tuples, or else the query will loop indefinitely. Sometimes, using UNION
instead of UNION ALL
can accomplish this by discarding rows that duplicate previous output rows.
There us built-in syntax to simplify cycle detection:
WITH RECURSIVE search_graph(id, link, data, depth) AS (
SELECT g.id, g.link, g.data, 1
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1
FROM graph g, search_graph sg
WHERE g.id = sg.link
) CYCLE id SET is_cycle USING path
SELECT * FROM search_graph;
The CYCLE
clause specifies first the list of columns to track for cycle detection, then a column name that will show whether a cycle has been detected, and finally the name of another column that will track the path. The cycle and path columns will implicitly be added to the output rows of the CTE. A helpful trick for testing queries when you are not certain if they might loop is to place a LIMIT
in the parent query.
WITH RECURSIVE t(n) AS (
SELECT 1
UNION ALL
SELECT n+1 FROM t
)
SELECT n FROM t LIMIT 100;
PostgreSQL's implementation evaluates only as many rows of a WITH
query as are actually fetched by the parent query.
Common Table Expression Materialization
A useful property of WITH
queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH
queries. Thus, expensive calculates that are needed in multiple places can be placed within a WITH
query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. If a WITH
query is non-recursive and side-effect free (that is, it is a SELECT
containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels.
By default, this happens if the parent query references theWITH
query just once, but not if it references theWITH
query more than once. You can override that decision by specifyingMATERIALIZED
to force separate calculation of theWITH
query, or by specifyingNOT MATERIALIZED
to force it to be merged into the parent query.
You probably want to use NOT MATERIALIZED
when querying a CTE name returned from the CTE body where the CTE name table has an index that matters, and you want to use materialized when the CTE body contains an expensive function.
Data Modifying Statements in WITH
You can use data-modifying statements (INSERT
, UPDATE
, DELETE
, or MERGE
) in WITH
. This allows you to perform several different operations in the same query:
WITH moved_rows AS (
DELETE FROM products
WHERE
"date" >= '2010-10-01' AND
"date" < '2010-11-01'
RETURNING *
)
INSERT INTO products_log
SELECT * FROM moved_rows;
This query effectively moves rows from products
to products_log
. Data-modifying statements in WITH
are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output.
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables.
Comments
You have to be logged in to add a comment
User Comments
There are currently no comments for this article.