Function-based sharding for PostgreSQL

PL/Proxy Cluster Configuration

PL/Proxy can be used in either CONNECT mode or CLUSTER mode.

In CONNECT mode PL/Proxy acts as a pass through proxy to another database. Each PL/Proxy function contains a libpq connect string for the connection to a database it will proxy the request to.

PL/Proxy can also be used in CLUSTER mode where it provides support for partitioning data across multiple databases based on a clustering function.

When using PL/Proxy in CONNECT mode no special configuration is required. However, using PL/Proxy in CLUSTER mode requires the cluster configuration to be defined, either by the cluster configuration API or SQL/MED.

Cluster configuration API

The following plproxy schema functions are used to define the clusters:


plproxy.get_cluster_version(cluster_name text)
returns integer

The get_cluster_version() function is called on each request, it should return the version number of the current configuration for a particular cluster.
If the version number returned by this function is higher than the one plproxy has cached, then the configuration and partition information will be reloaded by calling the get_cluster_config() and get_cluster_partitions() functions.

This is an example function that does not lookup the version number for an external source such as a configuration table.

CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text)
RETURNS int4 AS $$
IF cluster_name = 'a_cluster' THEN
        RETURN 1;
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
$$ LANGUAGE plpgsql;


plproxy.get_cluster_partitions(cluster_name text)
returns setof text

This is called when a new partition configuration needs to be loaded. It should return connect strings to the partitions in the cluster. The connstrings should be returned in the correct order. The total number of connstrings returned must be a power of 2 unless modular_mapping is used. If two or more connstrings are equal then they will use the same connection.

If the string user= does not appear in a connect string then user=CURRENT_USER will be appended to the connection string by PL/Proxy.
This will cause PL/Proxy to connect to the partition database using the same username as was used to connect to the proxy database. This also avoids the accidental user=postgres connections.

There are several approaches how to handle passwords:

An example function without the use of separate configuration tables:

CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
    IF cluster_name = 'a_cluster' THEN
        RETURN NEXT 'dbname=part00 host=';
        RETURN NEXT 'dbname=part01 host=';
        RETURN NEXT 'dbname=part02 host=';
        RETURN NEXT 'dbname=part03 host=';
    END IF;
    RAISE EXCEPTION 'Unknown cluster';
$$ LANGUAGE plpgsql;


        IN cluster_name text,
        OUT key text,
        OUT val text)

The get_cluster_config() function returns a set of key-value pairs that can consist of any of the following configuration parameters. All of them are optional. Timeouts/lifetime values are given in seconds. If the value is 0 or NULL then the parameter is disabled (a default value will be used).

Example function without the use of separate tables for storing parameters.

CREATE OR REPLACE FUNCTION plproxy.get_cluster_config(
    IN cluster_name text,
    OUT key text,
    OUT val text)
    -- lets use same config for all clusters
    key := 'connection_lifetime';
    val := 30*60; -- 30m
$$ LANGUAGE plpgsql;

SQL/MED cluster definitions

Pl/Proxy can take advantage of SQL/MED connection info management available in PostgreSQL 8.4 and above. The benefits of using SQL/MED are simplified cluster definition management and slightly improved performance.

Both SQL/MED defined clusters and configuration function based clusters can coexist in the same database. If a cluster is defined in both, SQL/MED takes precedence. If no SQL/MED cluster is found an attempt is made to fall back to configuration functions.

Configuring SQL/MED clusters

First we need to create a foreign data wrapper. Generally the FDW is a kind of driver that provides remote database access, data marshalling etc. In this case, it’s main role is to provide a validation function that sanity checks your cluster definitions.

Note: the validation function is known to be broken in PostgreSQL 8.4.2 and below.

        [ VALIDATOR plproxy_fdw_validator ]
        [ OPTIONS global options ] ;

Next we need to define a CLUSTER, this is done by creating a SERVER that uses the plproxy FDW. The options to the SERVER are PL/Proxy configuration settings and the list of cluster partitions.

Note: USAGE access to the SERVER must be explicitly granted. Without this, users are unable to use the cluster.

        OPTIONS (
                connection_lifetime '1800',
                disable_binary '1',
                p0 'dbname=part00 host=',
                p1 'dbname=part01 host=',
                p2 'dbname=part02 host=',
                p3 'dbname=part03 host='

Finally we need to create a user mapping for the Pl/Proxy users. One might create individual mappings for specific users:

CREATE USER MAPPING FOR bob SERVER a_cluster OPTIONS (user 'bob', password 'secret');

or create a PUBLIC mapping for all users of the system:

CREATE USER MAPPING FOR public SERVER a_cluster OPTIONS (user 'plproxy', password 'foo');

Also it is possible to create both individual and PUBLIC mapping, in this case the individual mapping takes precedence.