PL/Proxy Language Syntax
The language is similar to plpgsql - string quoting, comments, semicolon at the statements end.
It contains only 4 statements: CONNECT
, CLUSTER
, RUN
and SELECT
.
Each function needs to have either CONNECT
or pair of CLUSTER
+ RUN
statements
to specify where to run the function.
The SELECT
statement is optional, if it is missing, there will be default
query generated based on proxy function signature.
The RUN
statment is also optional, it defaults to RUN ON ANY
which means the query will be run random partition.
CONNECT
CONNECT 'libpq connstr';
Specifies exact location where to connect and execute the query. If several functions have same connstr, they will use same connection.
CONNECT connect_func(...);
CONNECT argname;
CONNECT $argnr;
Connect string is taken from function result or directly from argument. If several functions have same connstr, they will use same connection. (New in 2.0.9)
NB: giving untrusted users ability to specify full connect string creates
security hole. Eg it can used to read cleartext passwords from ~/.pgpass
or pg_service.conf
. If such function cannot be avoided, it’s access rights
need to be restricted.
CLUSTER
CLUSTER 'cluster_name';
Specifies exact cluster name to be run on. The cluster name will
be passed to plproxy.get_cluster_*
functions.
CLUSTER cluster_func(..);
Cluster name can be dynamically decided upon proxy function arguments.
cluster_func
should return text value of final cluster name.
RUN ON …
RUN ON ALL;
Query will be run on all partitions in cluster in parallel.
RUN ON ANY;
Query will be run on random partition.
RUN ON <NR>;
Run on partition number <NR>
.
RUN ON partition_func(..);
Run partition_func()
which should return one or more hash values. (int4)
Query will be run on tagged partitions. If more than one partition was
tagged, query will be sent in parallel to them.
RUN ON argname;
RUN ON $1;
Take hash value directly from function argument. (New in 2.0.8)
SPLIT
SPLIT array_arg_1 [ , array_arg_2 ... ] ;
SPLIT ALL ;
Split the input arrays based on RUN ON statement into per-partition arrays. This is done by evaluating RUN ON condition for each array element and building per-partition parameter arrays for each matching partition. During execution each tagged partition then gets its own subset of the array to process. (New in 2.1)
The semantics of RUN ON statement is slightly changed with SPLIT arrays:
RUN ON partition_func(..);
The array is split between the partitions matching partition_func()
. Any
SPLIT parameters passed to the function are actually replaced with the
individual array elements.
RUN ON argname;
RUN ON $1;
An array of partition numbers (or hashes) can be passed as argname
. The function
shall be run on the partitions specified in the array.
RUN ON ANY;
Each element is assigned to random partition.
RUN ON ALL;
RUN ON <NR>;
Unaffected, except for the added overhead of array copying.
Example:
CREATE FUNCTION set_profiles(i_users text[], i_profiles text[])
RETURNS SETOF text AS $$
CLUSTER 'userdb';
SPLIT i_users, i_profiles;
RUN ON hashtext(i_users);
$$ LANGUAGE plproxy;
Given query:
SELECT * FROM set_profiles(ARRAY['foo', 'bar'], ARRAY['a', 'b']);
The hash function is called 2 times:
SELECT * FROM hashtext('foo');
SELECT * FROM hashtext('bar');
And target partitions get queries:
SELECT * FROM set_profiles(ARRAY['foo'], ARRAY['a']);
SELECT * FROM set_profiles(ARRAY['bar'], ARRAY['b']);
TARGET
Specify function name on remote side to be called. By default PL/Proxy uses current function name. (New in 2.2)
Following function:
CREATE FUNCTION some_function(username text, num int4)
RETURNS SETOF text AS $$
CLUSTER 'userdb';
RUN ON hashtext(username);
TARGET other_function;
$$ LANGUAGE plproxy;
will run following query on remote side:
SELECT * FROM other_function(username, num);
SELECT
SELECT .... ;
By default, PL/Proxy generates query based on its own signature.
But this can be overrided by giving explicit SELECT
statement to run.
Everything after SELECT
until semicolon is taken as SQL to be passed on.
Only argument substitution is done on the contents, otherwise the text
is unparsed. To avoid a table column to be parsed as function argument,
table aliases should be used.
Query result should have same number of columns as function result and same names too.
Argument substitution
Proxy function arguments can be referenced using name or $n
syntax.
Everything that is not argument reference is just passed on.
Dynamic records
PL/Proxy supports function returning plain RECORD type. Such functions need the result type specified at call site. Main use-case is to run random queries on partitions. (New in 2.0.6)
Very simple example:
CREATE OR REPLACE FUNCTION dynamic_query(q text)
RETURNS SETOF RECORD AS $$
CLUSTER 'mycluster';
RUN ON ALL;
$$ LANGUAGE plproxy;
Corresponding function in partitions:
CREATE OR REPLACE FUNCTION dynamic_query(sql text)
RETURNS SETOF RECORD AS $$
DECLARE
rec RECORD;
BEGIN
FOR rec IN EXECUTE sql
LOOP
RETURN NEXT rec;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
Sample request:
SELECT * FROM dynamic_query('SELECT id, username FROM sometable')
AS (id integer, username text);
The types given in AS clause must match actual types from query.