PostgreSQL Performance Tips

Finishing up notes on PostgreSQL "The SQL Language" documentation.

Date Created:
1 81

References



Notes


Query performance can be affected by many things. Some of these can be controlled by the user, while others are fundamental to the underlying design of the system.

Using EXPLAIN

PostgreSQL devises a query plan for each query it receives. Choosing the right plan to match the query structure and the properties of the data is absolutely critical for good performance, so the system includes a complex planner that tries to choose good plans. You can use EXPLAIN command to see what query plan the planner creates for any query.

The structure of a query plan is a tree of plan nodes. Nodes at the bottom level of the tree are scan nodes: they return raw rows from a table. There are different types of scan nodes for different table access methods: sequential scans, index scans, and bitmap index scans. If the query requires joining, aggregation, sorting, or other operations on the raw rows, then there will be additional nodes above the scan nodes to perform these operations. The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type plus the cost estimates that the planner made for the execution of that plan node.

EXPLAIN SELECT * FROM tenk1;

QUERY PLAN
-------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244)

Since this query has no WHERE clause, it must scan all the rows of the table, so the planner has chosen to use a simple sequential scan plan. The numbers that are quoted in parentheses are (left to right):

  • Estimated start-up cost: This is the time expended before the output phase can begin, e.g. the time to do sorting in a sort node.
  • Estimated total cost: This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved.
  • Estimated number of rows output by this plan node
  • Estimated average width of rows output by this plan node (in bytes)

The cost of an upper-level node includes the cost of all its child nodes. The rows value is the number of rows emitted by the node. This is often less than the number scanned, as a result of filtering by any WHERE-clause conditions that are being applied at the node. The rows estimate is only approximate.

The planner may implement an ORDER BY clause in several ways. Compared to regular sorts, sorting incrementally allows returning tuples before the entire result set has been sorted, which particularly enables optimizations with LIMIT queries.

It is possible to check the accuracy of the planner's estimates by using EXPLAIN's ANALYZE option. With this option, EXPLAIN actually executes the query, and then displays the true row counts and true run time accumulated within each plan node, along with the same estimates that a plain EXPLAIN shows.

Statistics Used by the Planner

The query planner needs to estimate the number of rows retrieved by a query in order to make good choices of query plans. One component of the statistics is the total number of entries in each table and index, as well as the number of sidk blocks occupied by each table and index. This information is kept in the table pg_class, in the columns reltuples and relpages.

SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname LIKE 'tenk1%';

relname | relkind | reltuples | relpages
----------------------+---------+-----------+----------
tenk1 | r | 10000 | 345
tenk1_hundred | i | 10000 | 11
tenk1_thous_tenthous | i | 10000 | 30
tenk1_unique1 | i | 10000 | 30
tenk1_unique2 | i | 10000 | 30
(5 rows)

You can read more about how comments are sorted in this blog post.

User Comments