Postgres-R: current state of development

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Postgres-R: current state of development
Date: 2008-07-15 16:48:43
Message-ID: 487CD4EB.8070906@bluegap.ch
Lists: pgsql-hackers

Hi,

After having published the source code, I'd like to add some words about
the current state of the project.

Postgres-R is currently capable of replicating tuples via binary change
sets and does proper conflict detection and prevention. It offers three
different timing methods: sync, eager, and lazy. Of those, the eager
method is the most advanced one and the one I've been focusing on.
However, for fully synchronous replication, we mainly need to be able
to handle multiple changesets per transaction. This will be necessary
anyway to support long-running transactions, because it simply doesn't
make sense to hold a huge changeset back and send it just before the
commit.
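
To make the multiple-changesets idea more concrete, here is a minimal,
self-contained sketch in C. All names and the flush threshold are
invented for illustration; these are not Postgres-R's actual
structures. It only shows the principle: stream a transaction's
changes in chunks instead of holding everything back until commit.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FLUSH_THRESHOLD 8192    /* flush once this many bytes are buffered */

typedef struct ChangeSetBuffer
{
    uint32_t    xid;            /* originating transaction id */
    uint32_t    next_seqno;     /* per-transaction chunk counter */
    size_t      used;           /* bytes currently buffered */
    char        data[FLUSH_THRESHOLD];
} ChangeSetBuffer;

/* Stand-in for handing a chunk to the group communication system. */
static void
send_chunk(ChangeSetBuffer *buf, bool is_last)
{
    printf("xid %u: chunk %u, %zu bytes%s\n",
           (unsigned) buf->xid, (unsigned) buf->next_seqno++, buf->used,
           is_last ? " (commit)" : "");
    buf->used = 0;
}

/* Append one binary tuple change; flush early when the buffer fills,
 * so long-running transactions produce many small chunks. */
static void
add_change(ChangeSetBuffer *buf, const char *change, size_t len)
{
    if (buf->used + len > sizeof(buf->data))
        send_chunk(buf, false);
    memcpy(buf->data + buf->used, change, len);
    buf->used += len;
}

int
main(void)
{
    ChangeSetBuffer buf = { .xid = 42 };
    char        change[1024] = { 0 };

    for (int i = 0; i < 20; i++)        /* simulate 20 tuple changes */
        add_change(&buf, change, sizeof(change));
    send_chunk(&buf, true);             /* final chunk at commit */
    return 0;
}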

A fairly general framework for helper processes is provided. I think
this framework could be used for parallel querying or data loading as
well. The helper processes are ordinary backends which process a
single transaction at a time, but they have no client connection;
instead, they communicate with a manager via a messaging module based
on shared memory and signals. Within Postgres-R, those helper backends
are mostly called 'remote backends', which is a somewhat misleading
name; it's just short for a helper backend which processes a remote
transaction.
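
For illustration, here is a tiny, self-contained C sketch of the
underlying mechanism: one message slot in shared memory, with a signal
as the doorbell. This is a toy under those assumptions, not
Postgres-R's actual messaging module (which deals with queues and
multiple backends, among other things).

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct MessageSlot
{
    char        body[128];      /* one pending message from the manager */
} MessageSlot;

static volatile sig_atomic_t got_message = 0;

static void
handle_sigusr1(int signo)
{
    (void) signo;
    got_message = 1;            /* note it; the slot is read in the loop */
}

int
main(void)
{
    /* Anonymous shared mapping, inherited across fork(). */
    MessageSlot *slot = mmap(NULL, sizeof(*slot),
                             PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sigset_t    mask, oldmask;
    pid_t       helper;

    signal(SIGUSR1, handle_sigusr1);
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    sigprocmask(SIG_BLOCK, &mask, &oldmask);    /* don't miss the signal */

    helper = fork();
    if (helper == 0)
    {
        /* Helper: atomically unblock SIGUSR1 and wait for the doorbell. */
        while (!got_message)
            sigsuspend(&oldmask);
        printf("helper got: %s\n", slot->body);
        return 0;
    }

    /* Manager: deposit a message, then ring the helper's doorbell. */
    strcpy(slot->body, "apply remote transaction 42");
    kill(helper, SIGUSR1);
    waitpid(helper, NULL, 0);
    return 0;
}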

I've written interfaces to Ensemble, Spread, and an emulated GCS for
testing purposes. The Spread interface is still lacking functionality;
the other two should work fine. None of the interfaces depends on
external libraries, because I have written asynchronous clients, which
none of the existing libraries for Ensemble or Spread offered, but
which the replication manager requires.
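
To show what "asynchronous" means here, a hypothetical sketch of such
an interface (all names invented): callbacks for delivery and view
changes, and an 'emulated' GCS that simply loops a multicast back to
the local callbacks, so nothing blocks and no external library is
involved.

#include <stdio.h>
#include <string.h>

typedef struct GcsCallbacks
{
    /* Invoked when the GCS delivers a totally ordered message. */
    void        (*deliver) (const char *group, const char *msg, size_t len);
    /* Invoked when group membership changes. */
    void        (*viewchange) (const char *group, int nmembers);
} GcsCallbacks;

/* Emulated GCS for testing: no network, immediate self-delivery. */
typedef struct EmulatedGcs
{
    GcsCallbacks cb;
} EmulatedGcs;

static void
gcs_multicast(EmulatedGcs *gcs, const char *group, const char *msg)
{
    gcs->cb.deliver(group, msg, strlen(msg));
}

static void
on_deliver(const char *group, const char *msg, size_t len)
{
    printf("[%s] delivered %zu bytes: %s\n", group, len, msg);
}

static void
on_viewchange(const char *group, int nmembers)
{
    printf("[%s] view change: %d members\n", group, nmembers);
}

int
main(void)
{
    EmulatedGcs gcs = { { on_deliver, on_viewchange } };

    gcs.cb.viewchange("postgres-r", 1);     /* we joined the group */
    gcs_multicast(&gcs, "postgres-r", "changeset chunk 0 of xid 42");
    return 0;
}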

Sequence increments are replicated just fine and sequences feature an
additional per-node cache. The setval() functionality is still missing,
though.
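
The per-node cache follows the usual block-allocation principle. A
hypothetical, self-contained sketch (invented names, not the actual
implementation): each node obtains a whole block of values through
replication and then serves nextval() locally until the block runs
out.

#include <stdint.h>
#include <stdio.h>

#define SEQ_CACHE_SIZE 32       /* values fetched per replicated request */

typedef struct SeqCache
{
    int64_t     next;           /* next value to hand out locally */
    int64_t     last;           /* last value in the cached block */
} SeqCache;

/* Stand-in for the globally agreed sequence state; in reality this
 * increment would be replicated to all nodes. */
static int64_t global_counter = 1;

static void
refill(SeqCache *cache)
{
    cache->next = global_counter;
    cache->last = global_counter + SEQ_CACHE_SIZE - 1;
    global_counter += SEQ_CACHE_SIZE;
}

static int64_t
local_nextval(SeqCache *cache)
{
    if (cache->next > cache->last)
        refill(cache);          /* only this path involves replication */
    return cache->next++;
}

int
main(void)
{
    SeqCache    cache = { 1, 0 };   /* empty: first call triggers refill */

    for (int i = 0; i < 5; i++)
        printf("nextval = %lld\n", (long long) local_nextval(&cache));
    return 0;
}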

Recovery and initialization must still be done manually, although I've
already done much of the work needed to synchronize table data. A
daunting task will be the synchronization of the system catalogs;
Postgres-R currently cannot replicate any DDL commands.

Compared with the WAL shipping method mentioned in the core team
statement about built-in replication, this is certainly the longer way
to go. On the other hand, it isn't limited to single-master
replication and certainly offers more options for future extensions.

Regards

Markus


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres-R: current state of development
Date: 2008-07-16 19:55:40
Message-ID: 444E75B1-48C5-4EC0-A9FC-187034C2CA04@hi-media.com
Lists: pgsql-hackers

Hi,

First, thanks a lot for opening up Postgres-R; I hope -core will find
as many good ideas and as much reusable code in it as possible :)

On 15 Jul 2008, at 18:48, Markus Wanner wrote:
> A fairly general framework for helper processes is provided. I think
> this framework could be used for parallel querying or data loading as
> well. The helper processes are ordinary backends which process a
> single transaction at a time, but they have no client connection;
> instead, they communicate with a manager via a messaging module based
> on shared memory and signals. Within Postgres-R, those helper
> backends are mostly called 'remote backends', which is a somewhat
> misleading name; it's just short for a helper backend which processes
> a remote transaction.

Could this framework help with the current TODO item of making
pg_restore concurrent?

The ideas I remember on this topic were to add the capability for
pg_restore to create all indexes of any given table in parallel, so as
to benefit from the synchronized seqscan improvements in 8.3.
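
As a rough illustration of that idea, not an actual pg_restore patch:
one worker process per CREATE INDEX, each with its own connection.
The connection string and statements are placeholders, and error
handling is omitted. Plain CREATE INDEX takes only a ShareLock, so
several of them can scan the same table concurrently.

#include <libpq-fe.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static const char *stmts[] = {
    "CREATE INDEX t_a_idx ON t (a)",
    "CREATE INDEX t_b_idx ON t (b)",
    "CREATE INDEX t_c_idx ON t (c)",
};

int
main(void)
{
    int         n = sizeof(stmts) / sizeof(stmts[0]);

    for (int i = 0; i < n; i++)
    {
        if (fork() == 0)
        {
            /* Each worker builds one index over its own backend. */
            PGconn     *conn = PQconnectdb("dbname=restore_target");
            PGresult   *res = PQexec(conn, stmts[i]);

            fprintf(stderr, "%s: %s\n", stmts[i],
                    PQresStatus(PQresultStatus(res)));
            PQclear(res);
            PQfinish(conn);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)      /* wait for all index builds to finish */
        ;
    return 0;
}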

There was also the idea to have pg_restore run the ALTER TABLE
statements in parallel with the data copying, though that part may
require more dependency information than is currently available.

And there was some parallel pg_dump idea floating around too, in order
to give PostgreSQL the capability to saturate high-end hardware at
pg_dump time, as far as I understood that part of the thread.

Of course, now that an open-source framework for parallel processing
in PostgreSQL is available, we can hardly avoid asking whether having
the executor benefit from it for general-purpose queries would be
doable.

Regards,
--
dim
