Function-based sharding for PostgreSQL

PL/Proxy FAQ


What is PL/Proxy?

PL/Proxy is compact language for remote calls between PostgreSQL databases. Syntax is similar to PL/pgSql and language contains only 4 statements.

With PL/Proxy user can create proxy functions that have same signature as remote functions to be called. The function body describes how the remote connection should be acquired.

When such proxy function is called, PL/Proxy:

  1. Automatically generates the SQL query to be executed from function signature
  2. Executes statements in function body to get the connection
  3. Uses function arguments as input parameters to query
  4. Passes the query result back as function result

Why functions?

Concentrating on just function-calls allows PL/Proxy to keep its code small and also to present user simple and compact API.

Using function-based database access has more general good points:

Why not develop it into Remote Parallel PL/SQL?

Huge benefit of PL/Proxy is it’s compactness and efficiency. As it does not need to parse queries going through it adds very little overhead.

Making it full-blown language for SQL execution would mean reimplementing PL/pgSQL, PL/Perl, parts of pgpool and more, which is waste of effort.

Also when plproxy functions mirror actual functions, the PL/Proxy becomes optional component of setup - the client apps can bypass PL/Proxy and work directly on actual database.

This is good for testing and also live deployment - we let clients work on smaller databases directly, they are put behind PL/Proxy only when load gets too high and we need to partition a database.

What can PL/Proxy be used for?

PL/Proxy version 1 had PL and pooler integrated. But such design caused a lot of unnecessary complexity. With PL/Proxy version 2, we wrote both pooler and PL part from scratch, both designed to be standalone components.

That allowed both components to be tested and used separately and resulted in compact and robust codebase.

So PgBouncer can be used with PL/Proxy to lessen connection count on partition server, but such usage is not mandatory.


What are the external dependencies?

It depends only on libpq and poll(2) + gettimeofday(2) system calls. So it should be quite portable.

How the remote calls are done?

First a SELECT query is generated based on PL/Proxy function signature.

A function signature of:

CREATE FUNCTION get_data(IN first_name text, IN last_name text,
                         OUT bdate date, OUT balance numeric(20,10))

Results in following query:

SELECT bdate::date, balance::numeric(20,10)
  FROM public.get_data($1::text, $2::text);

The casts and explicit OUT parameter names are used to survive minor type or result column order differences between local and remote databases.

Then the CLUSTER statement is processed, optionally executing function. This result in cluster name.

Then plproxy.get_cluster_version(<cluster_name>) is executed. This gives numeric version number for cluster. If resulting version number differs from version in cached cluster, the cache is dropped.

If cluster information in not cached, the plproxy.get_cluster_partitions() function is executed, resulting in list of connect strings for that cluster.

Then RUN statement is processed, optionally executing function if requested. This will tag one or more connections in cluster to be used for query execution.

Then the query is sent to remote server using libpq async API. If there are several remote connections tagged, the execution will happen in parallel. PL/Proxy then waits until it has acquired resultsets from all connections and then returns them to local backend.

How does PL/Proxy handle connections?

It opens them lazily - only when needed. Then keeps them open until it libpq reports error on it or connection lifetime is over - which is by default 2h.

There is a safety hack used - before sending query to already open connection a poll(2) call is run on connection. If poll() shows events the connection is dropped to avoid use of likely broken connection.

Can PL/Proxy survive different settings in local and remote database?

Why does PL/Proxy require the number of partition be power of 2?

There is no deep reason, mostly because of following points:

It would be easy to use mod N internally, but:

So it seems it’s preferable to keep the power-of-2 requirement.

This may seem to require that the number of servers be also power of 2, but this is not so - to make future administration easier it is always preferable to split database into more parts than you immediately need. Such splitting also overcomes the power-of-2 requirement.

For example, if user needs to spread the load over 3 servers, the database can be split to 16 partitions and then 2 servers get 5 partitions and last one 6.


How to partition data?

There are several usage patterns how PL/Proxy can be used to distribute load on several servers

In many of these scenarios good replication software like Londiste from SkyTools is handy.

How to spread single large query over several partitions?

If each partition holds only part of the total data this happens automatically - just use RUN ON ALL.

If the partitions are copies of each other or the query does not follow the split pattern for some other reason, you need to use SPLIT command to give each partition part of the data.

How to do aggregated queries?

Aggregation needs to happen in 3 steps:

  1. Function on partition that does per-partition aggregation.

  2. PL/Proxy function that collects the result of per-partition aggregation. It will return a row for each partition.

  3. Top-level aggregation that does the final aggregation on the resultset of PL/Proxy function. A regular PL/pgSQL function can be used or this can be done outside database by client application.

Note: some of the aggregations cannot be done naively - eg. avg(). Instead each partition must do sum() + count() and the top-level aggregator calculates actual average.

How to add partitions?

The simple way would be to collect data from all partitions together then split it again to new partitions. But that is a waste of resources.

Few things to keep in mind to make the addition easier:

Can I have foreign keys on my data?

Yes, unless the data you want to partition on references itself.

Another common scenario is that there are some big data tables that user wants to partition but they reference various smaller common tables that are not partitionable. In such situation the common tables should be managed from single external database and replicated to each partition. That gives single place to manipulate data and correct transactionality when spreading data out.

What happens if I do updates in remote database?

PL/Proxy is in autocommit mode so if remote function succeeds then changes are automatically committed at once. Special handling is needed if updates are done in both databases. If remote call fails both are rolled back but if remote call succeeds and local updates fail then only local updates are rolled back. Usually PgQ based solutions are used in these situations.

How to handle sequences?

Best way is to use separate ranges for each partition.

In our case, no code uses serials directly, instead they use wrapper function that combines unique ID each database has and plain sequence. That way we don’t need to manage sequences explicitly, instead only thing we need to do is to assign each database unique ID.