PostgreSQL PL/pgSQL

I need to review the Postgres PL/pgSQL SQL procedural language because it is what I use to write Postgres functions.

Date Created:

1 464

Notes

Overview

PL/pgSQL is a loadable procedural language for the PostgreSQL database system.

Goals of the language:

can be used to create functions, procedures, and triggers,
adds control structures to the SQL language
can perform complex computations
inherits all user-defined types, functions, procedures, and operators
can be defined to be trusted by the server
easy to use

Advantages

With PL/pgSQL you can group a block of computation and a series of queries inside the database server, thus having the power of a procedural language and the ease of use of SQL, but with considerable savings of client/server communication overhead.

Round trips between client and server are eliminated
Intermediate results that the client does not need do not have to be marshaled or transferred between server and client
Multiple rounds of query parsing can be avoided

This can result in a considerable performance increase as compared to an application that does not use stored functions.

Supported Argument and Result Data Types

Functions written in PL/pgSQL can accept as arguments any scalar or array data type supported by the server, and they can return the result of any of these types. They can also accept or return any composite type (row type) specified by name. It is also possible to declare a PL/pgSQL function as accepting record, which means that any composite type will do as input, or as returning record, which means that the result is a row type whose columns are determined by the specification in the calling query.

PL/pgSQL function can be declared to return a "set" (or table) of any data type that can be returned as a single instance. Such a function generates its output by executing RETURN NEXT for each desired element of the result set, or by using RETURN QUERY to output the result of evaluating a query.

Structure of PL/pgSQL

Functions written in PL/pgSQL are defined to the server by executing CREATE FUNCTION commands. Such a command would normally look like:

CREATE FUNCTION somefunc(integer,text) RETURNS integer
AS 'function body trext'
LANGUAGE plpgsql;

The function body is simply a string literal s far as CREATE FUNCTION is concerned. It is often helpful to use dollar quoting to write the function body, rather than the normal single quote syntax.

PL/pgSQL is a block-structured language. The complete text of a function body must be a block. A block is defined as:

[ <<label>> ]
[ DECLARE
    declarations ]
BEGIN
    statements
END [ label ];

Each declaration and each statement within a block is terminated by a semicolon. A block that appears within another block must have a semicolon after END, as shown above, however the final END that concludes a function body does not require a semicolon.

A label is only needed if you want to identify the bock for use in an EXIT statement, or to qualify the names of the variables declared in the block. If a label is given after END, it must match the label at the block's beginning.

All keywords are case-sensitive. Identifiers are implicitly converted to lower case unless double-quoted, just as they are in ordinary SQL commands. Comments work the same way in PL/pgSQL code as in ordinary SQL.

Any statement in the statement section of a block can be a subblock. Subblocks can be used for logical grouping or to localize variables to a small group of statements. Variables declared in a subblock mask any similarly-named variables of outer blocks created for the duration of the subblock; but you can access the outer variables anyway if you qualify their names with their block's label.

CREATE FUNCTION somefunc() RETURNS integer AS $$
<< outerblock >>
DECLARE
    quantity integer := 30;
BEGIN
    RAISE NOTICE 'Quantity here is %', quantity;  -- Prints 30
    quantity := 50;
    --
    -- Create a subblock
    --
    DECLARE
        quantity integer := 80;
    BEGIN
        RAISE NOTICE 'Quantity here is %', quantity;  -- Prints 80
        RAISE NOTICE 'Outer quantity here is %', outerblock.quantity;  -- Prints 50
    END;

    RAISE NOTICE 'Quantity here is %', quantity;  -- Prints 50

    RETURN quantity;
END;
$$ LANGUAGE plpgsql;

It is important to not confuse the use of BEGIN/END for grouping statements in PL/pgSQL with the similarly named SQL commands for transaction control. PL/pgSQL's BEGIN/END are only for grouping; they no not start or end a transaction. A block containing an EXCEPTION clause effectively forms a sub-transaction that can be rolled back without affecting the outer transaction.

Declarations

All variables used in block must be declared in the declarations section of the block. The only exceptions are that the loop variable for a FOR loop iterating over a range of integer values is automatically declared as an integer variable.

PL/pgSQL variables can have any SQL data type, such as integer, varchar, and char.

user_id integer;
quantity numeric(5);
url varchar;
myrow tablename%ROWTYPE;
myfield tablename.columnname%TYPE;
arow RECORD;

quantity integer DEFAULT 32;
url varchar := 'http://mysite.com';
transaction_time CONSTANT timestamp with time zone := now();

Once declared, a variable's value can be used in later initialized expressions in the same block:

DECLARE
  x integer := 1;
  y integer := x + 1;

The general syntax of a variable declaration is:

name [ CONSTANT ] type [ COLLATE collation_name ] [ NOT NULL ] [ { DEFAULT | := | = } expression ];

The DEFAULT clause, if given, specifies the initial value assigned to the variable when the block is entered. If the DEFAULT clause if not given then the variable is initialized to the SQL null value. The CONSTANT option prevents the variable from being assigned to after initialized, so that its value will remain constant for the duration of the block. The COLLATE option specifies a collation to use for the variable. If NOT NULL is specified, an assignment of a null value results in a runtime error. A variable's default value is evaluated and assigned to the variable each time the block is entered (not just once per function call),

Declaring Function Parameters

Variables passed to functions are named with the identifiers $1, $2, etc. Optionally, aliases can be declared for $n parameter names for increased readability. Either the parameter of the numeric identifier can then be used to refer to the parameter value.

Two ways to create an alias. The preferred way is to give a name to the parameter in the CREATE FUNCTION comamnd:

CREATE FUNCTION sales_tax(subtotal real) RETURNS real AS $$
BEGIN
    RETURN subtotal * 0.06;
END;
$$ LANGUAGE plpgsql;

The other way is to explicitly declare an alias:

CREATE FUNCTION sales_tax(real) RETURNS real AS $$
DECLARE
    subtotal ALIAS FOR $1;
BEGIN
    RETURN subtotal * 0.06;
END;
$$ LANGUAGE plpgsql;

When a PL/pgSQL function is declared with output parameters, the output parameters are given $n names and optional aliases in just the same way as the normal input parameters. An output parameter is effectively a variable that starts out as NULL; it should be assigned to during the execution of the function. The final value is what is returned.

CREATE FUNCTION sales_tax(subtotal real, OUT tax real) AS $$
BEGIN
    tax := subtotal * 0.06;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION sum_n_product(x int, y int, OUT sum int, OUT prod int) AS $$
BEGIN
    sum := x + y;
    prod := x * y;
END;
$$ LANGUAGE plpgsql;

In a call to procedure, all the parameters must be specified. For output parameters, NULL may be specified when calling the procedure from plain SQL:

CALL sum_n_product(2, 4, NULL, NULL);
 sum | prod
-----+------
   6 |    8

Another way to declare a PL/pgSQL function is with RETURNS TABLE:

CREATE FUNCTION extended_sales(p_itemno int)
RETURNS TABLE(quantity int, total numeric) AS $$
BEGIN
    RETURN QUERY SELECT s.quantity, s.quantity * s.price FROM sales AS s
                 WHERE s.itemno = p_itemno;
END;
$$ LANGUAGE plpgsql;

Polymorphic Types:

CREATE FUNCTION add_three_values(v1 anyelement, v2 anyelement, v3 anyelement,
                                 OUT sum anyelement)
AS $$
BEGIN
    sum := v1 + v2 + v3;
END;
$$ LANGUAGE plpgsql;

Alias

newname ALIAS FOR oldname;

The ALIAS syntax is more general than is suggested in the previous section: you can declare an alias for any variable, not just function parameters. The main practical use for this is to assign a different name for variables with predetermined names.

Copying Types

name table.column%TYPE
name variable%TYPE
-- Example, declaring variable with the same data type as users.user_id
user_id users.user_id%TYPE
-- Creating an array declaration with the referenced type:
user_ids users.user_id%TYPE[];

%TYPE provides the data type of a table column or a previously-declared PL/pgSQL variable. You can use this to declare variables that will hold database values.

Just as when declaring table columns that are arrays, it doesn't matter whether you write multiple bracket pairs or specific array dimensions: PostgreSQL treats all arrays of a given element type as the same type, regardless of dimensionality.

Row Types

name table_name%ROWTYPE;
name composite_type_name;

A variable of a composite type is called a row variable (or row-type variable). Such a variable can hold a whole row of SELECT or FOR query result, as long as the query's column set matches the declared type of the variable. The individual fields of the row value are accessed using the usual dot notation, rowvar.field.

Parameters to a function can be composite types (complete table rows).

Record Types

name RECORD;

Record variables are similar to row-type variables, but they have no predefined structure. They take on the actual row structure of the row they are assigned during a SELECT or FOR command.

Expressions

All expressions used in PL/pgSQL statements are processed using the server's main SQL executor. If you write a PL/pgSQL statement like:

IF expression THEN ...

PL/pgSQL will evaluate the expression by feeding a query like:

SELECT expression

Basic Statements

Assignment

An assignment of a value to a PL/pgSQL variable is written as:

variable { := | = } expression;
-- Examples
tax := subtotal * 0.06;
my_record.user_id := 20;
my_array[j] := 20;
my_array[1:3] := array[1,2,3];
complex_array[n].realpart = 12.3;

The expression in such a statement is evaluated by means of an SQL SELECT command sent to the main database engine. The expression must yield a single value (possibly a row value, if the variable is a row or record variable). The target variable can be a single variable (optionally qualified with a block name), a field of a row or record target, or an element or slice of an array target.

If the expression's result data type doesn't match the variable's data type, the value will be coerced as through by an assignment cast.

Executing SQL Commands

Any SQL command that does not return rows can be executed within a PL/pgSQL function just by writing the command. If the command does return rows (for example SELECT, or INSERT/UPDATE/DELETE/MERGE with RETURNING), there are two ways to proceed.

When the command will return at most one row, or you only care about the first row of the output, write the command as usual but add an INTO clause to capture the output.
To process all of the output rows, write the command as the data source for a FOR loop.

Typically you'll want to use varying data values or even different tables at different times. There are two ways to proceed depending on the situation:

PL/pgSQL variable vales can be automatically inserted into optimizable SQL commands, which are SELECT, INSERT, UPDATE, DELETE, `MERGE, and certain utility commands that incorporate one of these, such as EXPLAIN and CREATE TABLE ... AS SELECT. In these commands, any PL/pgSQL variable name appearing in the command text is replaced by a query parameter, and then the current value of the variable is provided as the parameter at run time.
Non-optimizable SQL commands (also called utility commands) are not capable of accepting query parameters. So automatic substitution of PL/pgSQL variables does not work in such commands. To include non-constant text in a utility command executed from PL/pgSQL, you must build the utility command as a string and then EXECUTE it.

Sometimes it is useful to evaluate an expression or SELECT query but discard the result, for example when calling a function that has side-effects but no useful result value. To do this in PL/pgSQL, use the PERFORM statement.

PERFORM query;

This executes the query and discards the result. Write the query the same way you would write an SQL SELECT command, but replace the initial keyword SELECT with PERFORM. For WITH queries, use PERFORM and then place the query in parentheses.

Executing a Command with a Single-Row Result

The result of an SQL command yielding a single row (possibly of multiple columns) can be assigned to a record variable, row-type variable, or list of scalar variables. This is done by writing the base SQL command and adding a INTO clause.

SELECT select_expressions INTO [STRICT] target FROM ...;
INSERT ... RETURNING expressions INTO [STRICT] target;
UPDATE ... RETURNING expressions INTO [STRICT] target;
DELETE ... RETURNING expressions INTO [STRICT] target;
MERGE ... RETURNING expressions INTO [STRICT] target

where target can be a record variable, a row variable, or a comma-separated list of simple variables and record/row fields. PL/pgSQL variables will be substituted into the rest of the command (that is, everything but the INTO clause) just as described above, and the plan is cached in the same way. Except for the INTO clause, the SQL command is the same as it would be written outside of PG/pgSQL.

If STRICT is not specified in the INTO clause, then target will be set to the first row returned by the command, or to nulls if the command returned no rows. If the STRICT option is specified, the command must return exactly one row or a run-time error will be reported, either NO_DATA_FOUND (no rows) or TOO_MANY_ROWS (more than one row).

BEGIN
    SELECT * INTO STRICT myrec FROM emp WHERE empname = myname;
    EXCEPTION
        WHEN NO_DATA_FOUND THEN
            RAISE EXCEPTION 'employee % not found', myname;
        WHEN TOO_MANY_ROWS THEN
            RAISE EXCEPTION 'employee % not unique', myname;
END;

Successful execution of a command with STRICT always sets FOUND to true.

For INSERT/UPDATE/DELETE/MERGE with RETURNING, PL/pgSQL reports an error for more than one returned row, even when STRICT is not specified. This is because there is no option such as ORDER BY with which to determine which affected row should be returned.

Executing Dynamic Commands

To handle problems where you want to generate dynamic commands inside your PL/pgSQL functions, the EXECUTE statement is provided:

EXECUTE command-string [ INTO [STRICT] target ] [ USING expression [, ... ] ];

where command-string is an expression yielding a string (of type text) containing the command to be executed. The optional target is a record variable, a row variable, or a comma-separated list of simple variables and record/row fields, into which the results of the command will be stored.

Obtaining the Result Status

There are several ways to determine the effect of a command. The first method is to use the GET DIAGNOSTICS command, which has the form:

GET [ CURRENT ] DIAGNOSTICS variable { = | := } item [ , ... ];

This command allows retrieval of system status indicators. CURRENT is a noise word. Each item is a key word identifying a status value to be assigned to the specific variable. The current available status items are shown below. Colon-equal can be used instead of the SQL-standard = token.

GET DIAGNOSTICS integer_var = ROW_COUNT;


Name	Type	Description
`ROW_COUNT`	`bigint`	the number of rows processed by the most recent SQL command
`PG_CONTEXT`	`text`	line(s) of text describing the current call stack
`PG_ROUTINE_OID`	`oid`	OID of the current function

The second method to determine the effects of a commands to check the special variable named FOUND, which is the type of boolean. FOUND starts out false within each PL/pgSQL function call. It is set by each of the following types of statements:

SELECT INTO statement sets FOUND true if a row is assigned, false if no row is returned
A PERFORM statement sets FOUND true if it produces (and discards) one or more rows, false if no row is produced
UPDATE, INSERT, DELETE, and MERGE statements set FOUND true if at least one row is affected, false if no row is affected
A FETCH statements sets FOUND true if it returns a row, false if no row is returned
A MOVE statement sets FOUND true if it successfully repositions the cursor, false otherwise
A FOR or FOREACH statement sets FOUND true if it iterates one ore more times, else false. FOUND is set this way when the loop exits; inside the execution of the loop, FOUND is not modified by the loop statement, although it might be changed by the execution of other statements within the loop body.
RETURN QUERY and RETURN QUERY EXECUTE statements set FOUND true if the query returns at least one row, false if no row is returned.

Doing Nothing at All

Sometimes a placeholder statement that does nothing is useful. For example, it can indicate that one arm of an if/then/else chain is deliberately empty.

BEGIN
    y := x / 0;
EXCEPTION
    WHEN division_by_zero THEN
        NULL;  -- ignore the error
END;

I am going to continue taking notes on PL/pgSQL in the Notes page. See this article for the rest of the notes (I should be finished soon).