PostgreSQL SQL Language Documentation
I want to read through the PostgreSQL SQL Language documentation because I realized my SQL skills weren't as good as I thought they were.
References
Notes
This part describes the use of the SQL language in PostgreSQL. We start with describing the general syntax of SQL, then how to create tables, how to populate the database, and how to query it. The middle part lists the available data types and functions for use in SQL commands. Lastly, we address several aspects of importance for tuning a database.
The information is arranged so that a novice user can follow it from start to end and gain a full understanding of the topics without having to refer forward too many times. The chapters are intended to be self-contained, so that advanced users can read the chapters individually as they choose.
SQL Syntax
This chapter describes the syntax of SQL. It forms the foundation for understanding the following chapters which will go into detail about how SQL commands are applies to define and modify data.
Lexical Structure
SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, terminated by a semicolon (;). The end of the input stream also terminates a command. Which tokens are valid depends on the syntax of the particular command.
A token can be a keyword, an identifier, a quoted identifier, a literal (or constant), or special character symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type). Comments can occur in SQL input. They are not tokens; they are effectively equivalent to whitespace.
Key words are words that have a fixed meaning in the SQL language. Table names, attribute/column names, and other object names are called identifiers. SQL identifiers should only use a-z and underscores. The delimited identifier or quoted identifier is a formally enclosed arbitrary sequence of characters in double quotes ("). Quoting an identifier makes it case -sensitive, whereas unquoted names are always folded to lower case.
There are three kinds of implicitly-typed constants in PostgreSQL: strings, bit strings, and numbers.
To allow more readable queries in such situations, PostgreSQL provides another way, called “dollar quoting”, to write string constants. A dollar-quoted string constant consists of a dollar sign ($), an optional “tag” of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign.
$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$
Numeric constants are accepted in these general forms:
digits
digits.[digits][e[+-]digits]
[digits].digits[e[+-]digits]
digitse[+-]digits
where digits is one or more decimal digits (0 through 9).
A constant of an arbitrary type can be entered using one of the following notations:
type 'string'
'string'::type
CAST ( 'string' AS type )
The string constant's text is passed to the input conversion routine for the type called type. The result is a constant of the indicated type.
Special Characters
- A dollar sign
$
followed by digits is used to represent a positional parameter in the body of a function definition or a prepared statement. - Parentheses
()
have their usual meaning to group expressions and enforce precedence. In some cases, parentheses are required as part of the fixed syntax of a particular SQL command. - Brackets
[]
are used to select the elements of an array - Commas
,
are used in some syntactical constructs to separate the elements of a list - The semicolon
;
terminates an SQL command. It cannot appear anywhere within a command, except within a string constant or quoted identifier. - The color
:
is used to selectslices
from arrays. - The asterisk
*
is used in some contexts to denote all the fields of a table row or composite value. In the context of aggregate function, it means that the aggregate does not require any explicit parameter. - The period
.
is used in numeric constants, and to separate schema, table, and column values.
Value Expressions
Value expressions are used in a variety of contexts, such as in the target list of the SELECT
command, as new column values in INSERT
or UPDATE
, or in search conditions in a number of commands. The result of a value expression is sometimes called a scalar, to distinguish it from the result of a table expression (which is a table). Value expressions are therefore also called scalar expressions (or even simply expressions). The expression syntax allows the calculation of values from primitive parts using arithmetic, logical, set, and other operations.
A value expression is one of the following:
- a constant or literal value
- a column reference
correlation.columnname
- correlation is the name of a table or an alias for a table. The correlation name and separating dot can be omitted if the column name is unique across all the tables being used in the current query
- a positional parameter reference
$number
- Used to indicate a value that is supplied externally to an SQL statement. Parameters are used in SQL function definitions and in prepared queries.
- a subscripted expression
expression[subscript]
expression[lower_subscript:upper_subscript]
- Each subscript is itself an expression, which will be rounded to the nearest integer value. The array
expression
must be parenthesized; multiple subscripts can be concatenated when the original array is multidimensional.
- a field selection expression
- If an expression yields a value of a composite type (row type), then a specific field of the row can be extracted by writing
expression.fieldname
- If an expression yields a value of a composite type (row type), then a specific field of the row can be extracted by writing
- an operator invocation
- a function call
function_name([ expression [, expression ... ]] )
- an aggregate expression
- An aggregate expression represents the application of an aggregate function across the rows selected by a query. An aggregate function reduced multiple inputs to a single output value, such as the sum or average of the inputs.
- a window function call
- A window function call represents the application of an aggregate-like function over some portion of the rows selected by a query.
- a type cast
- A type case specifies a conversion from one data type to another.
- a collation expression
- a scalar subquery
- A scalar subquery is an ordinary
SELECT
query in parentheses that returns one row with one column. TheSELECT
query is executed and the single returned value is used in the surrounding value expression. It is an error to use a query that returns more than one row or more than one column as a scalar subquery.
- A scalar subquery is an ordinary
- an array constructor
- An array constructor is an expression that builds an array value using values for its member elements. A simple array constructor consists of the key word
ARRAY
, a left square bracket, a lost of expressions, and finally a right square bracket.
- An array constructor is an expression that builds an array value using values for its member elements. A simple array constructor consists of the key word
SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
- a row constructor
- A row constructor is an expression that builds a row value (also called a composite value) using values for its member fields.
- another value expression
The order of evaluation of subexpressions is not defined, In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any fixed order.
Calling Functions
PostgreSQL allows functions that have named parameters to be called using either positional or named notation. Named notation is especially useful for functions that have a large number of parameters, since it makes the associations between parameters and actual arguments more explicit and reliable. In positional notation, a function call is written with its argument values in the same order as they are defined in the function declaration. In named notation, the arguments are matched to the function parameters by name and can be written in any order.
CREATE FUNCTION concat_lower_or_upper(a text, b text, uppercase boolean DEFAULT false)
RETURNS text AS $$ SELECT CASE WHEN $3 THEN UPPER($1 || ' ' || $2) ELSE LOWER($1 || ' ' || $2) END; $$ LANGUAGE SQL IMMUTABLE STRICT;
-- Positional
SELECT concat_lower_or_upper('Hello', 'World', true);
-- Named
SELECT concat_lower_or_upper(a => 'Hello', b => 'World');
-- Mixed
SELECT concat_lower_or_upper('Hello', 'World', uppercase => true);
Data Definitions
Table Basics
A table in a relational database is much like a table on paper: it consists of rows and columns. The number and order of the columns is fixed, and each column has a name. The number of rows is variable - it reflects how much data is stored at a given moment. SQL does not make any guarantees about the order of the rows in a table. Each column has a data type. The data type constrains the set of possible values that can be assigned to a column and assigns semantics to the data stored in the column so that it can be used for computations. To create a table, you use the CREATE TABLE
command. In this command, you specify at least a name for the new table, the names of the columns, and the data type for each column. If you no longer need a table, you can remove it using the DROP TABLE
command.
Default Values
A column can be assigned a default value. When a new row is created and no values are specified for some of the columns, those columns will be filled with their respective default values.
Identity Columns
An identity column is a special column that is generated automatically from an implicit sequence. It can be used to generate key values. To create an identity column, use the GENERATED ... AS IDENTITY
clause in CREATE TABLE
:
CREATE TABLE people (
id bigint GENERATED ALWAYS AS IDENTITY,
...,
);
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables. There are two kinds of generated columns: stored and virtual. A stored generated column is computed when it is written (inserted or updated) and occupies storage as if it were a normal column. A virtual generated column occupies no storage and is computed when it is read.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
A generated column cannot be written to directly. In INSERT
or UPDATE
commands, a value cannot be specified for a generated column, but the keyword DEFAULT
may be specified.
Constraints
See the daily reading article on PostgreSQL Constraints.
System Columns
Every table has several system columns that are implicitly defined by the system. Therefore, these names cannot be used as names of user-defined columns. You do not need to be concerned about these columns, just know they exist.
tableoid
: The OID of the table containing this rowxmin
: The identity (transaction ID) of the inserting transaction for this row versioncmin
: The command identifier within the inserting transactionxmax
: The identity of the deleting transaction, or zero for an undeleted row version.cmax
: The command identifier within the deleting transactionctid
: The physical location of the row version within its table
Modifying Tables
PostgreSQL provides a family of commands to make modifications to existing tables.
- Add columns
- Remove Columns
- Add constraints
- Remove Constraints
- The psql command
\d tablename
can be helpful to find names of constraints
- The psql command
- Change Default values
- Change column data types
- Rename columns
- Rename tables
-- ADD COLUMN
ALTER TABLE products ADD COLUMN description TEXT;
-- You can add constraints at the same time
ALTER TABLE products ADD COLUMN description TEXT CHECK (description <> '');
-- REMOVE COLUMN
ALTER TABLE products DROP COLUMN description;
-- Authorize dropping everything that depends on the column
ALTER TABLE products DROP COLUMN description CASCADE;
-- Add Constraint
ALTER TABLE products ADD CHECK (name <> '');
ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;
-- Use this syntax to set not null constraint
ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;
-- To remove a constraint you need to know its name.
ALTER TABLE products DROP CONSTRAINT some_name;
-- Change Default Value
ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;
-- Remove Default
ALTER TABLE products ALTER COLUMN price DROP DEFAULT;
-- Rename column
ALTER TABLE products RENAME COLUMN product_no TO product_number;
-- Rename Table
ALTER TABLE products RENAME TO items;
Privileges
When an object is created, it is assigned an owner. The owner is normally the role that executed the creation statement. For most kinds of objects, the initial state is that only the owner (or a superuser) can do anything with the object. To allow other roles to use it, privileges must be granted. There are different kinds of privileges: SELECT
, INSERT
, UPDATE
, DELETE
, TRUNCATE
, REFERENCES
, TRIGGER
, CREATE
, CONNECT
, TEMPORARY
, EXECUTE
, USAGE
, SET
, ALTER SYSTEM
, and MAINTAIN
.
An object can be assigned to a new owner with an ALTER
command of the appropriate kind for the object. To assign privileges, the GRANT
command is used. Writing ALL
in the place of a specific privilege grants all privileges that are relevant for the object type. To revoke a previously-granted privilege, use the REVOKE
command.
ALTER TABLE table_name OWNER TO new_owner;
GRANT UPDATE on accounts to joe;
REMOVE ALL on accounts FROM PUBLIC;
Row Security Policies
In addition to the SQL-standard privilege system available through GRANT
, tables can have row security policies that restrict, on a per-user basis, which rows can be returned by normal queries or inserted, updated, or deleted by data modification commands.
When row security is enabled on a table, all normal access to the table for selecting rows or modifying rows must be allowed by a row security policy. (The table's owner is typically not subject to row security policies.) Row security policies can be specific to commands, or to roles, or to both.
Schemas
A PostgreSQL database cluster contains one or more named databases. Roles and a few other object types are shared across the entire cluster. A client connection to the server can only access data in a single database, the one specified in the connection request. A database contains one or more named schemas, which in turn contain tables. Schemas contain other kinds of named objects, including data types, functions, and operators.
There are several reasons why one might want to use schemas:
- To allow users to use one database without interfering with each other.
- To organize database objects into logical groups to make them more manageable.
- Third-party applications can be put into separate schemas so they do not collide with the names of other objects.
Inheritance
PostgreSQL implements table inheritance, which can be a useful tool for database designers.
CREATE TABLE cities (
name text,
population float,
elevation int -- in feet
);
CREATE TABLE capitals (
state char(2)
) INHERITS (cities);
In this case, the capitals
table inherits all the columns of its parent table, cities
. State capitals also have an extra column, state
, that shows their state.
Table Partitioning
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
- Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. Partitioning effectively substitutes for the upper tree level of indexes, making it more likely that the heavily-used parts of the indexes fit in memory,
- When queries or updates access a large percentage of a single partition, performance can be improved by using a sequential scan of that partition instead of using an index, which would require random-access reads scattered across the whole table.
- Bulk loads and deletes can be accomplished by adding or removing partitions, if the usage pattern is accounted for in the partitioning design. Dropping an individual partition using
DROP TABLE
, or doingALTER TABLE DETACH PARTITION
, is far faster than a bulk operation. These commands also entirely avoid theVACUUM
overhead by a bulk delete. - Seldom-used data can be migrated to cheaper and slower storage media.
Foreign Data
Foreign data is accessed with help from a foreign data wrapper. A foreign data wrapper is a library that can communicate with an external data source, hiding the details of connecting to the data source and obtaining from it.
Data Manipulation
Inserting Data
When a table i created, it contains no data. The first thing to do before a database can be of much use is to insert data. Data is inserted one row at a time. You can also insert more than one row in a single command, but it is not possible to insert something that is not a complete row. Even if you know only some column values, a complete row must be created.
It is good practice to always list the column names when inserting into the table.
Updating Data
The modification of data that is already in the database is referred to as updating. You can update individual rows, all the rows in a table, or a subset of all rows. Each column can be updated separately; the other columns are not affected. To update the existing rows, use the UPDATE
command. This requires three pieces of information:
- The name of the table and column to update
- The new value of the column
- Which row(s) to update
Returning Data from Modified Rows
Sometimes it is useful to obtain data from modified rows while they are being manipulated. The INSERT
, UPDATE
, DELETE
, and MERGE
commands all have an optional RETURNING
clause that supports this. Use of RETURNING
avoids performing an extra database query to collect the data, and is especially valuable when it would otherwise be difficult to identify the modified rows reliably. The allowed contents of a RETURNING
clause are the same as a SELECT
command's output list. It can contain column names of the command's target table, or value expressions using those columns. A common shorthand is RETURNING *
, which selects all columns of the target table in order.
Queries
Overview
The process of retrieving or the command to retrieve data from a database is called a query. In SQL, the SELECT
command is used to specify queries. The general syntax of the SELECT
command is:
[WITH with_queries] SELECT select_list FROM table_expression [sort_specification];
Table Expressions
A table expression computes a table. The table expression contains a FROM
clause that is optionally followed by WHERE
, GROUP BY
, and HAVING
clauses. Trivial table expressions simply refer to a table on disk, a so-called base table, but more complex expressions can be used to modify or combine base tables in various ways. The optional WHERE
, GROUP BY
, and HAVING
clauses in the table expressions specify a pipeline of successive transformations performed on the table derived in the FROM
clause. All these transformations produce a virtual table that provides the rows that are passed to the select list to compute the output rows of the query.
The FROM
clause derives a table from one or more other tables given a comma-separated table reference list.
FROM table_reference [, table_reference [, ...]]
A table reference can be a table name, or a derived table such as a subquery, a JOIN
construct, or complex combination of these. If more than one table reference is listed in the FROM
clause, the tables are cross-joined (that ism the =Cartesian product of their rows is formed). The result of the FROM
list is an intermediate virtual table that can then be subject to transformations by the WHERE
, GROUP BY
, and HAVING
clauses and is finally the result of the overall table expression.
Joined Tables
A joined table is a table derived from two other (real or derived) tables according to the rules of the particular join type. Inner, outer, and cross-joins are available. The general syntax of a joined table is:
T1 join_type T2 [ join_condition ]
Join Types
Cross Join
CROSS JOIN
T1 CROSS JOIN T2
For every possible combination of rows from T1
and T2
(i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1
followed by all columns in T2
. If the tables have N and M rows respectively, the joined table will have N * M rows.
Qualified Joins
INNER JOIN
- For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join condition with R1.
LEFT OUTER JOIN
- First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each in T1.
RIGHT OUTER JOIN
- First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition with any row in T1, a joined row is added with null values in columns of T1. This is the converse of a left join: the result table will always have a row for each row in T2.
FULL OUTER JOIN
- First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.
The ON
clause is the most general kind of join condition: it takes a Boolean value expression of the same kind as is used in a WHERE
clause. A pair of rows from T1
to T2
match if the ON
expression evaluates to true
. The USING
clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s).
Table and Column Aliases
A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias. To create a table alias, write:
FROM table_reference AS alias
-- or
FROM table_reference alias
A typical application of table aliases is to assign short identifiers to long table names to keep the join clauses readable.
Subqueries
Subqueries specifying a derived table must be enclosed in parentheses. They may be assigned a table alias name, and optionally column alias names.
FROM (SELECT * FROM table1) AS alias_name
The WHERE
Clause
The syntax of the WHERE
clause is:
WHERE search_condition
where search_condition
is any value expression that returns a value of type boolean
.
After the processing of the FROM
clause is done, each row of the derived virtual table is checked against the search condition. If the result of the condition is true, the row is kept in the output table, otherwise it is discarded. The search condition typically references at least one column of the table generated in the FROM
clause; this is not required, but otherwise the WHERE
clause will be fairly useless.
The GROUP BY
and HAVING
Clauses
After passing the WHRE
filter, the derived input table might be subject to grouping, using the GROUP BY
clause, and elimination of group rows using the HAVING
clause.
The GROUP BY
clause is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set of rows having common values into one group row that represents all rows in the group. In general, if a table is grouped, columns that are not listed in GROUP BY
cannot be referenced except in aggregate expressions. An example with aggregate expressions is:
SELECT x, sum(y) FROM test1 GROUP BY x;
Select Lists
The table expression in the SELECT
command constructs an intermediate virtual table by possibly combining tables, views, eliminating rows, grouping, etc. This table is finally passed on to processing by the select list. The select list determines which columns of the intermediate table are actually output.
The simplest kind of select list is * which emits all columns that the table expression produces. Otherwise, a select list is a comma-separated list of value expressions.
Distinct
After the select list has been processed, the result table can optionally be subject to the elimination of duplicate rows. The DISTINCT
key word is written directly after SELECT
to specify this:
SELECT DISTINCT select_list ...
Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered waul in this comparison. Alternatively, an arbitrary expression can determine what rows are to be considered distinct:
SELECT DISTINCT ON (expression [, expression ...]) select_list ...
Here expression
is an arbitrary value expression that is evaluated for all rows. A set of rows for which all the expressions are equal are considered duplicates, and only the first row of the set is kept in the output.
Combining Queries
The results of two queries can be combined using the set operations union, intersection, and difference.
query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2
where query1
and query2
are queries that can use any of the features discussed up to this point.
UNION
effectively appends the result of query2
to the result of query1
- with no order guarantee. It eliminates duplicate rows from its result, in the same was as DISTINCT
, unless UNION ALL
is used.
INTERSECT
returns all rows that are both in the result of query1
and in the result of query2
. Duplicate rows are eliminated unless INTERSECT ALL
is used.
EXCEPT
returns all rows that are in the result of query1
but not in the result of query2
(This is sometimes called the difference between the two queries). Again, the duplicates are eliminated unless EXCEPT ALL
is used.
In order to calculate the union, intersection, or difference of two queries, the two queries must be union compatible
, which means that they return the same umber of columns and the corresponding columns have compatible data types.
Sorting Rows (ORDER BY)
After a query has produced an output table, it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.
SELECT select_list
FROM table_expression
ORDER BY sort_expression1 [ASC | DESC] [NULLS { FIRST | LAST }]
[, sort_expression2 [ASC | DESC] [NULLS { FIRST | LAST }] ...]
The sort expression(s) can be any expression that would be valid the query's select list.
LIMIT
and OFFSET
LIMIT
and OFFSET
allow you to retrieve just a portion of the rows that are generate by the rest of the query:
SELECT select_list
FROM table_expression
[ ORDER BY ... ]
[ LIMIT { count | ALL } ]
[ OFFSET start ]
If a limit count is given, no more than that many rows will be returned. LIMIT ALL
is the same as omitting the LIMIT
clause, as is LIMIT
with a NULL
argument. OFFSET
says to skip that many rows before beginning to return rows. OFFSET 0
is the same as omitting the OFFSET
clause. If both OFFSET
and LIMIT
appear, then OFFSET
rows are skipped before starting to count the LIMIT
rows that are returned. When using LIMIT
, it is important to use an ORDER BY
clause that constrains the result rows into a unique order.
VALUES
Lists
VALUES
provides a way to generate a constraint table
that can be used in a query without having to actually create and populate a table on-disk. The syntax is:
VALUES ( expression [, ...] ) [, ...]
VALUES (1, 'one'), (2, 'two'), (3, 'three');
WITH
Queries (Common Table Expressions)
See the daily reading article on PostgreSQL Common Table Expressions.
Data Types
PostgreSQL has a rich set of native data types available to users. Users can add new types to PostgreSQL using the CREATE TYPE
command.
Name | Aliases | Description |
---|---|---|
|
| signed eight-byte integer |
|
| autoincrementing eight-byte integer |
|
| fixed-length bit string |
|
| variable-length bit string |
|
| logical Boolean (true/false) |
|
| rectangular box on a plane |
|
| binary data (“byte array”) |
|
| fixed-length character string |
|
| variable-length character string |
|
| IPv4 or IPv6 network address |
|
| circle on a plane |
|
| calendar date (year, month, day) |
|
| double precision floating-point number (8 bytes) |
|
| IPv4 or IPv6 host address |
|
| signed four-byte integer |
|
| time span |
|
| textual JSON data |
|
| binary JSON data, decomposed |
|
| infinite line on a plane |
|
| line segment on a plane |
|
| MAC (Media Access Control) address |
|
| MAC (Media Access Control) address (EUI-64 format) |
|
| currency amount |
|
| exact numeric of selectable precision |
|
| geometric path on a plane |
|
| PostgreSQL Log Sequence Number |
|
| user-level transaction ID snapshot |
|
| geometric point on a plane |
|
| closed geometric path on a plane |
|
| single precision floating-point number (4 bytes) |
|
| signed two-byte integer |
|
| autoincrementing two-byte integer |
|
| autoincrementing four-byte integer |
|
| variable-length character string |
|
| time of day (no time zone) |
|
| time of day, including time zone |
|
| date and time (no time zone) |
|
| date and time, including time zone |
|
| text search query |
|
| text search document |
|
| user-level transaction ID snapshot (deprecated; see |
|
| universally unique identifier |
|
| XML data |
The data types smallserial
, serial
, and bigserial
are not true types, but merely a notational convenience for creating unique identifier columns.
The money
type stores a currency amount with a fixed fractional precision.
PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions.
PostgreSQL provides two data types that are designed to support full text search, which is the activity of searching through a collection of natural-language documents to locate those that best match a query. The texvector
type represents a document in a form optimized for text search; the tsquery
type similarly represents a text query.
The data type uuid
stores Universally unique identifiers (UUID) as defined by RFC 4122. This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness guarantee than sequence generators, which are only unique within a single database.
The xml
data type can be used to store XML data. Its advantage over storing XML data in a text
field is that it checks the input values for well-formedness and there are support functions to perform type-safe operations on it. Use of this data type requires the installation to have been built with cconfigure --with-libxml
.
JSON data types are for storing JSON data. Such data can also be stored as text, but the JSON data types have the advantage of enforcing that each stored value is valid according to the JSON rules. There are also assorted JSON-specific functions and operators available for data stored in these data types.
Thejson
andjsonb
data types accept almost identical sets of values as input. The major practical difference is one of efficiency. Thejson
data type stores an exact copy of the input text, which processing functions must reparse on each execution; whilejsonb
data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed.jsonb
also supports indexing, which can be a significant advantage.
PostgreSQL allows columns of a table to be defined as variable-length, multidimensional arrays. Arrays of any built-in or user-defined base type, enum
type, composite type, range type, or domain can be created.
CREATE TABLE sal_emp (
name text,
pay_by_quarter integer[],
schedule text[][],
squares integer[3][3] -- You can specify the exact size of arrays
);
SELECT pay_by_quarter[1] FROM sal_emp; -- Arrays are 1-indexed
SELECT pay_by_quarter[1:3] FROM sal_emp; -- Slicing
SELECT pay_by_quarter[1:] FROM sal_emp; -- Slicing
SELECT pay_by_quarter[:3] FROM sal_emp; -- Slicing
-- Returns the array dimension as text
SELECT array_dims(schedule) FROM sal_emp;
-- Updating Arrays
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}' WHERE name = 'Carol';
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}' WHERE name = 'Carol';
A composite type represents the structure of a row or record; it is essentially just a list of field names and their data types. PostgreSQL allows composite types to be used in many of the same ways that simple types can be used.
Range types are data types representing a range of values of some element type (called the range's subtype). For instance, ranges of timestamp
might be used to represent the ranges of time that a meeting room is reserved. In this case, the data type is tsrange
(short for timestamp range
) and timestamp
is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values. Range types are useful because they represent many element values in a single range value, and because concepts such as overlapping ranges can be expressed clearly.
A domain is a user-defined data type that is based on another underlying type. Optionally, it can have constraints that restrict its valid values to a subset of what the underlying type would allow. Otherwise, it behaves like the underlying type - for example, any operator or function that can be applied to the underlying type will work on the domain type. The underlying type can be any built-in or user-defined base type, enum type, composite type, range type, or another domain.
The PostgreSQL type system contains a number of special-purpose entries that are collectively called pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a function's argument or result type. Each of the available pseudo-types is useful in situations where a function's behavior does not correspond to simply taking or returning a value of a specific SQL data type.
Indexes
See the note on PostgreSQL to see information on indexes.
Full Text Search
See the note on PostgreSQL Search.
Concurrency Control
This chapter describes the behavior of PostgreSQL database system when two or more sessions try to access the same data at the same time. The goals in that situation are to allow efficient access for all sessions while maintaining strict data integrity.
Introduction
PostgreSQL provides a rich set of tools for developers to manage concurrent access to data. Internally, data consistency is maintained by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that each SQL statement sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session. MVCC, by eschewing the locking methodologies of traditional database systems, minimizes lock contention in order to allow for reasonable performance in multiuser environments.
The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.
Comments
You can read more about how comments are sorted in this blog post.
User Comments
There are currently no comments for this article.