PostgreSQL SQL Language Documentation

I want to read through the PostgreSQL SQL Language documentation because I realized my SQL skills weren't as good as I thought they were.

Date Created:

Last Edited:

1 143

References

PostgreSQL Documentation Part II: The SQL Language

Notes

This part describes the use of the SQL language in PostgreSQL. We start with describing the general syntax of SQL, then how to create tables, how to populate the database, and how to query it. The middle part lists the available data types and functions for use in SQL commands. Lastly, we address several aspects of importance for tuning a database.

The information is arranged so that a novice user can follow it from start to end and gain a full understanding of the topics without having to refer forward too many times. The chapters are intended to be self-contained, so that advanced users can read the chapters individually as they choose.

SQL Syntax

This chapter describes the syntax of SQL. It forms the foundation for understanding the following chapters which will go into detail about how SQL commands are applies to define and modify data.

Lexical Structure

SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon (;). The end of the input stream also terminates a command. Which tokens are valid depends on the syntax of the particular command.

A token can be a keyword, an identifier, a quoted identifier, a literal (or constant), or special character symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type). Comments can occur in SQL input. They are not tokens; they are effectively equivalent to whitespace.

Key words are words that have a fixed meaning in the SQL language. Table names, attribute/column names, and other object names are called identifiers. SQL identifiers should only use a-z and underscores. The delimited identifier or quoted identifier is a formally enclosed arbitrary sequence of characters in double quotes ("). Quoting an identifier makes it case -sensitive, whereas unquoted names are always folded to lower case.

There are three kinds of implicitly-typed constants in PostgreSQL: strings, bit strings, and numbers.

To allow more readable queries in such situations, PostgreSQL provides another way, called “dollar quoting”, to write string constants. A dollar-quoted string constant consists of a dollar sign ($), an optional “tag” of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign.

$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

Numeric constants are accepted in these general forms:

digits
digits.[digits][e[+-]digits]
[digits].digits[e[+-]digits]
digitse[+-]digits

where digits is one or more decimal digits (0 through 9).

A constant of an arbitrary type can be entered using one of the following notations:

type 'string'
'string'::type
CAST ( 'string' AS type )

The string constant's text is passed to the input conversion routine for the type called type. The result is a constant of the indicated type.

Special Characters

A dollar sign $ followed by digits is used to represent a positional parameter in the body of a function definition or a prepared statement.
Parentheses () have their usual meaning to group expressions and enforce precedence. In some cases, parentheses are required as part of the fixed syntax of a particular SQL command.
Brackets [] are used to select the elements of an array
Commas , are used in some syntactical constructs to separate the elements of a list
The semicolon ; terminates an SQL command. It cannot appear anywhere within a command, except within a string constant or quoted identifier.
The color : is used to select slices from arrays.
The asterisk * is used in some contexts to denote all the fields of a table row or composite value. In the context of aggregate function, it means that the aggregate does not require any explicit parameter.
The period . is used in numeric constants, and to separate schema, table, and column values.

Value Expressions

Value expressions are used in a variety of contexts, such as in the target list of the SELECT command, as new column values in INSERT or UPDATE, or in search conditions in a number of commands. The result of a value expression is sometimes called a scalar, to distinguish it from the result of a table expression (which is a table). Value expressions are therefore also called scalar expressions (or even simply expressions). The expression syntax allows the calculation of values from primitive parts using arithmetic, logical, set, and other operations.

A value expression is one of the following:

a constant or literal value
a column reference
- correlation.columnname
- - correlation is the name of a table or an alias for a table. The correlation name and separating dot can be omitted if the column name is unique across all the tables being used in the current query
a positional parameter reference
- $number
- Used to indicate a value that is supplied externally to an SQL statement. Parameters are used in SQL function definitions and in prepared queries.
a subscripted expression
- expression[subscript]
- expression[lower_subscript:upper_subscript]
- Each subscript is itself an expression, which will be rounded to the nearest integer value. The array expression must be parenthesized; multiple subscripts can be concatenated when the original array is multidimensional.
a field selection expression
- If an expression yields a value of a composite type (row type), then a specific field of the row can be extracted by writing expression.fieldname

an operator invocation

a function call
- function_name([ expression [, expression ... ]] )
an aggregate expression
- An aggregate expression represents the application of an aggregate function across the rows selected by a query. An aggregate function reduced multiple inputs to a single output value, such as the sum or average of the inputs.
a window function call
- A window function call represents the application of an aggregate-like function over some portion of the rows selected by a query.
a type cast
- A type case specifies a conversion from one data type to another.
a collation expression
a scalar subquery
- A scalar subquery is an ordinary SELECT query in parentheses that returns one row with one column. The SELECT query is executed and the single returned value is used in the surrounding value expression. It is an error to use a query that returns more than one row or more than one column as a scalar subquery.

an array constructor
- An array constructor is an expression that builds an array value using values for its member elements. A simple array constructor consists of the key word ARRAY , a left square bracket, a lost of expressions, and finally a right square bracket.

SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');

a row constructor
- A row constructor is an expression that builds a row value (also called a composite value) using values for its member fields.
another value expression

The order of evaluation of subexpressions is not defined, In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any fixed order.

Calling Functions

PostgreSQL allows functions that have named parameters to be called using either positional or named notation. Named notation is especially useful for functions that have a large number of parameters, since it makes the associations between parameters and actual arguments more explicit and reliable. In positional notation, a function call is written with its argument values in the same order as they are defined in the function declaration. In named notation, the arguments are matched to the function parameters by name and can be written in any order.

CREATE FUNCTION concat_lower_or_upper(a text, b text, uppercase boolean DEFAULT false)
RETURNS text AS $$ SELECT CASE WHEN $3 THEN UPPER($1 || ' ' || $2) ELSE LOWER($1 || ' ' || $2) END; $$ LANGUAGE SQL IMMUTABLE STRICT;
-- Positional 
SELECT concat_lower_or_upper('Hello', 'World', true);
-- Named 
SELECT concat_lower_or_upper(a => 'Hello', b => 'World');
-- Mixed 
SELECT concat_lower_or_upper('Hello', 'World', uppercase => true);

Data Definitions

Table Basics

A table in a relational database is much like a table on paper: it consists of rows and columns. The number and order of the columns is fixed, and each column has a name. The number of rows is variable - it reflects how much data is stored at a given moment. SQL does not make any guarantees about the order of the rows in a table. Each column has a data type. The data type constrains the set of possible values that can be assigned to a column and assigns semantics to the data stored in the column so that it can be used for computations. To create a table, you use the CREATE TABLE command. In this command, you specify at least a name for the new table, the names of the columns, and the data type for each column. If you no longer need a table, you can remove it using the DROP TABLE command.

Default Values

A column can be assigned a default value. When a new row is created and no values are specified for some of the columns, those columns will be filled with their respective default values.

Identity Columns

An identity column is a special column that is generated automatically from an implicit sequence. It can be used to generate key values. To create an identity column, use the GENERATED ... AS IDENTITY clause in CREATE TABLE:

CREATE TABLE people (
    id bigint GENERATED ALWAYS AS IDENTITY,
    ...,
);

Generated Columns

A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables. There are two kinds of generated columns: stored and virtual. A stored generated column is computed when it is written (inserted or updated) and occupies storage as if it were a normal column. A virtual generated column occupies no storage and is computed when it is read.

CREATE TABLE people (
    ...,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);

A generated column cannot be written to directly. In INSERT or UPDATE commands, a value cannot be specified for a generated column, but the keyword DEFAULT may be specified.

Constraints

See the daily reading article on PostgreSQL Constraints.

System Columns

Every table has several system columns that are implicitly defined by the system. Therefore, these names cannot be used as names of user-defined columns. You do not need to be concerned about these columns, just know they exist.

tableoid: The OID of the table containing this row
xmin: The identity (transaction ID) of the inserting transaction for this row version
cmin: The command identifier within the inserting transaction
xmax: The identity of the deleting transaction, or zero for an undeleted row version.
cmax: The command identifier within the deleting transaction
ctid: The physical location of the row version within its table

Modifying Tables

PostgreSQL provides a family of commands to make modifications to existing tables.

Add columns

Remove Columns
Add constraints

Remove Constraints
- The psql command \d tablename can be helpful to find names of constraints

Change Default values

Change column data types

Rename columns

Rename tables

-- ADD COLUMN
ALTER TABLE products ADD COLUMN description TEXT;
--  You can add constraints at the same time
ALTER TABLE products ADD COLUMN description TEXT CHECK (description <> '');
-- REMOVE COLUMN
ALTER TABLE products DROP COLUMN description;
--  Authorize dropping everything that depends on the column
ALTER TABLE products DROP COLUMN description CASCADE;
-- Add Constraint
ALTER TABLE products ADD CHECK (name <> '');
ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;
-- Use this syntax to set not null constraint
ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;
-- To remove a constraint you need to know its name. 
ALTER TABLE products DROP CONSTRAINT some_name;
-- Change Default Value
ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;
-- Remove Default
ALTER TABLE products ALTER COLUMN price DROP DEFAULT;
-- Rename column
ALTER TABLE products RENAME COLUMN product_no TO product_number;
-- Rename Table
ALTER TABLE products RENAME TO items;

Privileges

When an object is created, it is assigned an owner. The owner is normally the role that executed the creation statement. For most kinds of objects, the initial state is that only the owner (or a superuser) can do anything with the object. To allow other roles to use it, privileges must be granted. There are different kinds of privileges: SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, EXECUTE, USAGE, SET, ALTER SYSTEM, and MAINTAIN.

An object can be assigned to a new owner with an ALTER command of the appropriate kind for the object. To assign privileges, the GRANT command is used. Writing ALL in the place of a specific privilege grants all privileges that are relevant for the object type. To revoke a previously-granted privilege, use the REVOKE command.

ALTER TABLE table_name OWNER TO new_owner;
GRANT UPDATE on accounts to joe;
REMOVE ALL on accounts FROM PUBLIC;

Row Security Policies

In addition to the SQL-standard privilege system available through GRANT, tables can have row security policies that restrict, on a per-user basis, which rows can be returned by normal queries or inserted, updated, or deleted by data modification commands.

When row security is enabled on a table, all normal access to the table for selecting rows or modifying rows must be allowed by a row security policy. (The table's owner is typically not subject to row security policies.) Row security policies can be specific to commands, or to roles, or to both.

Schemas

A PostgreSQL database cluster contains one or more named databases. Roles and a few other object types are shared across the entire cluster. A client connection to the server can only access data in a single database, the one specified in the connection request. A database contains one or more named schemas, which in turn contain tables. Schemas contain other kinds of named objects, including data types, functions, and operators.

There are several reasons why one might want to use schemas:

To allow users to use one database without interfering with each other.
To organize database objects into logical groups to make them more manageable.
Third-party applications can be put into separate schemas so they do not collide with the names of other objects.

Inheritance

PostgreSQL implements table inheritance, which can be a useful tool for database designers.

CREATE TABLE cities (
    name            text,
    population      float,
    elevation       int     -- in feet
);

CREATE TABLE capitals (
    state           char(2)
) INHERITS (cities);

In this case, the capitals table inherits all the columns of its parent table, cities. State capitals also have an extra column, state, that shows their state.

Table Partitioning

Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:

Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. Partitioning effectively substitutes for the upper tree level of indexes, making it more likely that the heavily-used parts of the indexes fit in memory,
When queries or updates access a large percentage of a single partition, performance can be improved by using a sequential scan of that partition instead of using an index, which would require random-access reads scattered across the whole table.
Bulk loads and deletes can be accomplished by adding or removing partitions, if the usage pattern is accounted for in the partitioning design. Dropping an individual partition using DROP TABLE, or doing ALTER TABLE DETACH PARTITION, is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead by a bulk delete.
Seldom-used data can be migrated to cheaper and slower storage media.

Foreign Data

Foreign data is accessed with help from a foreign data wrapper. A foreign data wrapper is a library that can communicate with an external data source, hiding the details of connecting to the data source and obtaining from it.

Data Manipulation

Inserting Data

When a table i created, it contains no data. The first thing to do before a database can be of much use is to insert data. Data is inserted one row at a time. You can also insert more than one row in a single command, but it is not possible to insert something that is not a complete row. Even if you know only some column values, a complete row must be created.

It is good practice to always list the column names when inserting into the table.

Updating Data

The modification of data that is already in the database is referred to as updating. You can update individual rows, all the rows in a table, or a subset of all rows. Each column can be updated separately; the other columns are not affected. To update the existing rows, use the UPDATE command. This requires three pieces of information:

The name of the table and column to update
The new value of the column
Which row(s) to update

Returning Data from Modified Rows

Sometimes it is useful to obtain data from modified rows while they are being manipulated. The INSERT, UPDATE, DELETE , and MERGE commands all have an optional RETURNING clause that supports this. Use of RETURNING avoids performing an extra database query to collect the data, and is especially valuable when it would otherwise be difficult to identify the modified rows reliably. The allowed contents of a RETURNING clause are the same as a SELECT command's output list. It can contain column names of the command's target table, or value expressions using those columns. A common shorthand is RETURNING *, which selects all columns of the target table in order.

Queries

Overview

The process of retrieving or the command to retrieve data from a database is called a query. In SQL, the SELECT command is used to specify queries. The general syntax of the SELECT command is:

[WITH with_queries] SELECT select_list FROM table_expression [sort_specification];

Table Expressions

A table expression computes a table. The table expression contains a FROM clause that is optionally followed by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions simply refer to a table on disk, a so-called base table, but more complex expressions can be used to modify or combine base tables in various ways. The optional WHERE, GROUP BY, and HAVING clauses in the table expressions specify a pipeline of successive transformations performed on the table derived in the FROM clause. All these transformations produce a virtual table that provides the rows that are passed to the select list to compute the output rows of the query.

The FROM clause derives a table from one or more other tables given a comma-separated table reference list.

FROM table_reference [, table_reference [, ...]]

A table reference can be a table name, or a derived table such as a subquery, a JOIN construct, or complex combination of these. If more than one table reference is listed in the FROM clause, the tables are cross-joined (that ism the =Cartesian product of their rows is formed). The result of the FROM list is an intermediate virtual table that can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression.

Joined Tables

A joined table is a table derived from two other (real or derived) tables according to the rules of the particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is:

T1 join_type T2 [ join_condition ]

Join Types

Cross Join

CROSS JOIN

T1 CROSS JOIN T2

For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the joined table will have N * M rows.

Qualified Joins

INNER JOIN
- For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join condition with R1.
LEFT OUTER JOIN
- First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each in T1.
RIGHT OUTER JOIN
- First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition with any row in T1, a joined row is added with null values in columns of T1. This is the converse of a left join: the result table will always have a row for each row in T2.
FULL OUTER JOIN
- First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

The ON clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE clause. A pair of rows from T1 to T2 match if the ON expression evaluates to true. The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s).

Table and Column Aliases

A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias. To create a table alias, write:

FROM table_reference AS alias
-- or
FROM table_reference alias

A typical application of table aliases is to assign short identifiers to long table names to keep the join clauses readable.

Subqueries

Subqueries specifying a derived table must be enclosed in parentheses. They may be assigned a table alias name, and optionally column alias names.

FROM (SELECT * FROM table1) AS alias_name

The `WHERE` Clause

The syntax of the WHERE clause is:

WHERE search_condition

where search_condition is any value expression that returns a value of type boolean.

After the processing of the FROM clause is done, each row of the derived virtual table is checked against the search condition. If the result of the condition is true, the row is kept in the output table, otherwise it is discarded. The search condition typically references at least one column of the table generated in the FROM clause; this is not required, but otherwise the WHERE clause will be fairly useless.

The `GROUP BY` and `HAVING` Clauses

After passing the WHRE filter, the derived input table might be subject to grouping, using the GROUP BY clause, and elimination of group rows using the HAVING clause.

The GROUP BY clause is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. In general, if a table is grouped, columns that are not listed in GROUP BY cannot be referenced except in aggregate expressions. An example with aggregate expressions is:

SELECT x, sum(y) FROM test1 GROUP BY x;

Select Lists

The table expression in the SELECT command constructs an intermediate virtual table by possibly combining tables, views, eliminating rows, grouping, etc. This table is finally passed on to processing by the select list. The select list determines which columns of the intermediate table are actually output.

The simplest kind of select list is * which emits all columns that the table expression produces. Otherwise, a select list is a comma-separated list of value expressions.

Distinct

After the select list has been processed, the result table can optionally be subject to the elimination of duplicate rows. The DISTINCT key word is written directly after SELECT to specify this:

SELECT DISTINCT select_list ...

Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered waul in this comparison. Alternatively, an arbitrary expression can determine what rows are to be considered distinct:

SELECT DISTINCT ON (expression [, expression ...]) select_list ...

Here expression is an arbitrary value expression that is evaluated for all rows. A set of rows for which all the expressions are equal are considered duplicates, and only the first row of the set is kept in the output.

Combining Queries

The results of two queries can be combined using the set operations union, intersection, and difference.

query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2

where query1 and query2 are queries that can use any of the features discussed up to this point.

UNION effectively appends the result of query2 to the result of query1 - with no order guarantee. It eliminates duplicate rows from its result, in the same was as DISTINCT, unless UNION ALL is used.

INTERSECT returns all rows that are both in the result of query1 and in the result of query2. Duplicate rows are eliminated unless INTERSECT ALL is used.

EXCEPT returns all rows that are in the result of query1 but not in the result of query2 (This is sometimes called the difference between the two queries). Again, the duplicates are eliminated unless EXCEPT ALL is used.

In order to calculate the union, intersection, or difference of two queries, the two queries must be union compatible, which means that they return the same umber of columns and the corresponding columns have compatible data types.

Sorting Rows (ORDER BY)

After a query has produced an output table, it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.

SELECT select_list
    FROM table_expression
    ORDER BY sort_expression1 [ASC | DESC] [NULLS { FIRST | LAST }]
             [, sort_expression2 [ASC | DESC] [NULLS { FIRST | LAST }] ...]

The sort expression(s) can be any expression that would be valid the query's select list.

`LIMIT` and `OFFSET`

LIMIT and OFFSET allow you to retrieve just a portion of the rows that are generate by the rest of the query:

SELECT select_list
    FROM table_expression
    [ ORDER BY ... ]
    [ LIMIT { count | ALL } ]
    [ OFFSET start ]

If a limit count is given, no more than that many rows will be returned. LIMIT ALL is the same as omitting the LIMIT clause, as is LIMIT with a NULL argument. OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 is the same as omitting the OFFSET clause. If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT rows that are returned. When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a unique order.

`VALUES` Lists

VALUES provides a way to generate a constraint table that can be used in a query without having to actually create and populate a table on-disk. The syntax is:

VALUES ( expression [, ...] ) [, ...]
VALUES (1, 'one'), (2, 'two'), (3, 'three');

`WITH` Queries (Common Table Expressions)

See the daily reading article on PostgreSQL Common Table Expressions.

Data Types

PostgreSQL has a rich set of native data types available to users. Users can add new types to PostgreSQL using the CREATE TYPE command.


Name	Aliases	Description
`bigint`	`int8`	signed eight-byte integer
`bigserial`	`serial8`	autoincrementing eight-byte integer
`bit [ (``n``) ]`		fixed-length bit string
`bit varying [ (``n``) ]`	`varbit [ (``n``) ]`	variable-length bit string
`boolean`	`bool`	logical Boolean (true/false)
`box`		rectangular box on a plane
`bytea`		binary data (“byte array”)
`character [ (``n``) ]`	`char [ (``n``) ]`	fixed-length character string
`character varying [ (``n``) ]`	`varchar [ (``n``) ]`	variable-length character string
`cidr`		IPv4 or IPv6 network address
`circle`		circle on a plane
`date`		calendar date (year, month, day)
`double precision`	`float8`	double precision floating-point number (8 bytes)
`inet`		IPv4 or IPv6 host address
`integer`	`int`, `int4`	signed four-byte integer
`interval [` `fields` `] [ (``p``) ]`		time span
`json`		textual JSON data
`jsonb`		binary JSON data, decomposed
`line`		infinite line on a plane
`lseg`		line segment on a plane
`macaddr`		MAC (Media Access Control) address
`macaddr8`		MAC (Media Access Control) address (EUI-64 format)
`money`		currency amount
`numeric [ (``p``,` `s``) ]`	`decimal [ (``p``,` `s``) ]`	exact numeric of selectable precision
`path`		geometric path on a plane
`pg_lsn`		PostgreSQL Log Sequence Number
`pg_snapshot`		user-level transaction ID snapshot
`point`		geometric point on a plane
`polygon`		closed geometric path on a plane
`real`	`float4`	single precision floating-point number (4 bytes)
`smallint`	`int2`	signed two-byte integer
`smallserial`	`serial2`	autoincrementing two-byte integer
`serial`	`serial4`	autoincrementing four-byte integer
`text`		variable-length character string
`time [ (``p``) ] [ without time zone ]`		time of day (no time zone)
`time [ (``p``) ] with time zone`	`timetz`	time of day, including time zone
`timestamp [ (``p``) ] [ without time zone ]`		date and time (no time zone)
`timestamp [ (``p``) ] with time zone`	`timestamptz`	date and time, including time zone
`tsquery`		text search query
`tsvector`		text search document
`txid_snapshot`		user-level transaction ID snapshot (deprecated; see `pg_snapshot`)
`uuid`		universally unique identifier
`xml`		XML data

The data types smallserial, serial, and bigserial are not true types, but merely a notational convenience for creating unique identifier columns.

The money type stores a currency amount with a fixed fractional precision.

PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions.

PostgreSQL provides two data types that are designed to support full text search, which is the activity of searching through a collection of natural-language documents to locate those that best match a query. The texvector type represents a document in a form optimized for text search; the tsquery type similarly represents a text query.

The data type uuid stores Universally unique identifiers (UUID) as defined by RFC 4122. This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.

The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it checks the input values for well-formedness and there are support functions to perform type-safe operations on it. Use of this data type requires the installation to have been built with cconfigure --with-libxml.

JSON data types are for storing JSON data. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types.

The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

PostgreSQL allows columns of a table to be defined as variable-length, multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, composite type, range type, or domain can be created.

CREATE TABLE sal_emp (
    name            text,
    pay_by_quarter  integer[],
    schedule        text[][],
    squares   integer[3][3] -- You can specify the exact size of arrays
);

SELECT pay_by_quarter[1] FROM sal_emp; -- Arrays are 1-indexed
SELECT pay_by_quarter[1:3] FROM sal_emp; -- Slicing
SELECT pay_by_quarter[1:] FROM sal_emp; -- Slicing
SELECT pay_by_quarter[:3] FROM sal_emp; -- Slicing
-- Returns the array dimension as text
SELECT array_dims(schedule) FROM sal_emp;
-- Updating Arrays
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}' WHERE name = 'Carol';
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}' WHERE name = 'Carol';

A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used.

Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp might be used to represent the ranges of time that a meeting room is reserved. In this case, the data type is tsrange (short for timestamp range) and timestamp is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values. Range types are useful because they represent many element values in a single range value, and because concepts such as overlapping ranges can be expressed clearly.

A domain is a user-defined data type that is based on another underlying type. Optionally, it can have constraints that restrict its valid values to a subset of what the underlying type would allow. Otherwise, it behaves like the underlying type - for example, any operator or function that can be applied to the underlying type will work on the domain type. The underlying type can be any built-in or user-defined base type, enum type, composite type, range type, or another domain.

The PostgreSQL type system contains a number of special-purpose entries that are collectively called pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a function's argument or result type. Each of the available pseudo-types is useful in situations where a function's behavior does not correspond to simply taking or returning a value of a specific SQL data type.

Indexes

See the note on PostgreSQL to see information on indexes.

Full Text Search

See the note on PostgreSQL Search.

Concurrency Control

This chapter describes the behavior of PostgreSQL database system when two or more sessions try to access the same data at the same time. The goals in that situation are to allow efficient access for all sessions while maintaining strict data integrity.

Introduction

PostgreSQL provides a rich set of tools for developers to manage concurrent access to data. Internally, data consistency is maintained by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that each SQL statement sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session. MVCC, by eschewing the locking methodologies of traditional database systems, minimizes lock contention in order to allow for reasonable performance in multiuser environments.

The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.