PostgreSQL Parallel Query
Finishing up notes on PostgreSQL "The SQL Language" documentation.
References
Notes
PostgreSQL can devise query plans that can leverage multiple CPUs in order to answer queries faster. This feature is known as parallel query. Many queries cannot benefit from parallel query, but some queries can run more than twice/four times as fast with parallel query. Queries that touch a large amount of data but return only a few rows to the user will typically benefit the most.
How Parallel Query Works
When the optimizer determines that parallel query is the fastest execution strategy for a particular query, it will create a query plan that includes a Gather or Gather Merge node. Here is a simple example:
EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
QUERY PLAN
-------------------------------------------------------------------------------------
Gather (cost=1000.00..217018.43 rows=1 width=97)
Workers Planned: 2
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..216018.33 rows=1 width=97)
Filter: (filler ~~ '%x%'::text)
(4 rows)
In all cases, the Gather
or Gather Merge
node will have exactly one child plan, which is the portion of the plan that will be executed in parallel. If the Gather
or Gather Merge
node is at the very top of the plan tree, then the entire query will execute in parallel. If the Gather
or Gather Merge
node is at the very top of the plan tree, then the entire query plan will execute in parallel. If it is somewhere else in the plan tree, then only the portion of the plan below it will run in parallel. Using EXPLAIN
, you can see the number of workers chosen by the planner. Every background worker process that is successfully started for a given parallel query will execute the parallel portion of the plan. The leader will also execute that portion of the plan, but it has an additional responsibility: it must read all of the tuples generated by the workers.
Where Can Parallel Query be Used?
In order for any parallel query plans whatsoever to be generated, the following settings must be configured as indicated:
max_parallel_workers_per_gather
: must be set to a value that is greater than 0.
In addition, the system must not be running in single-user mode. Since the entire database system is running as a single process in this situation, no background workers will be available. Even when it is possible for parallel query plans to be generated, the planner will not generate them for a given query if any of the following are trye:
- The query writes any data or locks any database rows.
- The query might be suspended during execution.
- The query uses any function marked
PARALLEL UNSAFE
- The query is running inside of another query that is already parallel.
Parallel Plans
Because each worker executes the parallel portion of the plan to completion, it is not possible to simply take an ordinary query plan and run it using multiple workers. Each worker would produce a full copy of the output result set, so the query would not run any faster than normal but would produce incorrect results. Instead, the parallel portion of the plan must be what is known internally to the query optimizer as a partial plan; that is, it must be constructed so that each process that executes the plan will generate only a subset of the output rows in such a way that each required output row is guaranteed to be generated by exactly one of the cooperating processes.
There are various types of parallel scans, parallel joins, and parallel aggregation.
Comments
You have to be logged in to add a comment
User Comments
There are currently no comments for this article.