PostgreSQL Performance Tips
Finishing up notes on PostgreSQL "The SQL Language" documentation.
References
Notes
Query performance can be affected by many things. Some of these can be controlled by the user, while others are fundamental to the underlying design of the system.
Using EXPLAIN
PostgreSQL devises a query plan for each query it receives. Choosing the right plan to match the query structure and the properties of the data is absolutely critical for good performance, so the system includes a complex planner that tries to choose good plans. You can use EXPLAIN
command to see what query plan the planner creates for any query.
The structure of a query plan is a tree of plan nodes. Nodes at the bottom level of the tree are scan nodes: they return raw rows from a table. There are different types of scan nodes for different table access methods: sequential scans, index scans, and bitmap index scans. If the query requires joining, aggregation, sorting, or other operations on the raw rows, then there will be additional nodes above the scan nodes to perform these operations. The output of EXPLAIN
has one line for each node in the plan tree, showing the basic node type plus the cost estimates that the planner made for the execution of that plan node.
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
-------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244)
Since this query has no WHERE
clause, it must scan all the rows of the table, so the planner has chosen to use a simple sequential scan plan. The numbers that are quoted in parentheses are (left to right):
- Estimated start-up cost: This is the time expended before the output phase can begin, e.g. the time to do sorting in a sort node.
- Estimated total cost: This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved.
- Estimated number of rows output by this plan node
- Estimated average width of rows output by this plan node (in bytes)
The cost of an upper-level node includes the cost of all its child nodes. The rows
value is the number of rows emitted by the node. This is often less than the number scanned, as a result of filtering by any WHERE
-clause conditions that are being applied at the node. The rows
estimate is only approximate.
The planner may implement an ORDER BY
clause in several ways. Compared to regular sorts, sorting incrementally allows returning tuples before the entire result set has been sorted, which particularly enables optimizations with LIMIT
queries.
It is possible to check the accuracy of the planner's estimates by using EXPLAIN
's ANALYZE
option. With this option, EXPLAIN
actually executes the query, and then displays the true row counts and true run time accumulated within each plan node, along with the same estimates that a plain EXPLAIN
shows.
Statistics Used by the Planner
The query planner needs to estimate the number of rows retrieved by a query in order to make good choices of query plans. One component of the statistics is the total number of entries in each table and index, as well as the number of sidk blocks occupied by each table and index. This information is kept in the table pg_class
, in the columns reltuples
and relpages
.
SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname LIKE 'tenk1%';
relname | relkind | reltuples | relpages
----------------------+---------+-----------+----------
tenk1 | r | 10000 | 345
tenk1_hundred | i | 10000 | 11
tenk1_thous_tenthous | i | 10000 | 30
tenk1_unique1 | i | 10000 | 30
tenk1_unique2 | i | 10000 | 30
(5 rows)
Comments
You have to be logged in to add a comment
User Comments
There are currently no comments for this article.