Re: [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)

From: "md(at)rpzdesign(dot)com" <md(at)rpzdesign(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)
Date: 2012-09-22 17:37:09
Message-ID: 505DF745.6080408@rpzdesign.com
Lists: pgsql-hackers

Andres, nice job on the writeup.

I think one aspect you are missing is that there must be some way for
the multi-masters to re-stabilize their data sets and quantify any data
loss. You cannot do this without some replication intelligence in each
row of each table, so that no matter how disastrous the
hardware/internet failure in the cloud, the system can HEAL itself and
keep going, with no human beings involved.

I am laying down a standard design pattern of columns for each row:

MKEY - Primary key guaranteed unique across ALL nodes in the CLOUD, with
NODE information IN THE KEY (A876543 vs. B876543 or whatever), whether
the network link is UP or DOWN
CSTP - creation timestamp, as a Unix timestamp
USTP - last update timestamp, as a Unix timestamp
UNODE - node that last updated this record

Many applications already need the above information, so we might as
well standardize it so that external replication logic can self-heal.
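
As a rough illustration of the column pattern above, here is a minimal
Python sketch. The names RowMeta, make_mkey, new_row and touch are
hypothetical, and the key format (node prefix followed by a per-node
sequence number) is only one assumed way of putting node information
in the key:

```python
import time
from dataclasses import dataclass
from typing import Optional


def make_mkey(node_id: str, local_seq: int) -> str:
    """Prefix a per-node sequence value with the node identifier.

    Assumes node identifiers have a fixed width and contain no digits,
    so that "A" + 876543 cannot collide with another node's key.
    """
    return f"{node_id}{local_seq}"


@dataclass(frozen=True)
class RowMeta:
    mkey: str   # MKEY: primary key carrying the originating node
    cstp: int   # CSTP: creation time, Unix timestamp
    ustp: int   # USTP: last update time, Unix timestamp
    unode: str  # UNODE: node that last updated the row


def new_row(node_id: str, local_seq: int, now: Optional[int] = None) -> RowMeta:
    """Create the metadata for a row inserted on node_id."""
    ts = int(time.time()) if now is None else now
    return RowMeta(make_mkey(node_id, local_seq), ts, ts, node_id)


def touch(row: RowMeta, node_id: str, now: Optional[int] = None) -> RowMeta:
    """Return updated metadata: the key and creation time never change."""
    ts = int(time.time()) if now is None else now
    return RowMeta(row.mkey, row.cstp, ts, node_id)
```

Because each node only consumes its own sequence, keys can be generated
without contacting any other node, which is what keeps them unique while
a link is down.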

PostgreSQL tables have optional 32-bit integer OIDs; you may want to
consider having a replication version, the ROID (replication object
ID), and then externalizing the primary key generation into a loadable
UDF.

Of course, ALL the nodes must be in contact with each other and must not
allow significant drift on their clocks while operating. (NTP is a
starter.)

I just do not know of any other way to add self-healing without the
above information, regardless of whether you hold up transactions for
synchronous replication or let them pass through asynchronously, and
regardless of whether you are getting your replication data from the
WAL stream or through the client libraries.
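
To make the self-healing idea concrete, here is a hedged Python sketch
of how two nodes might reconcile after a partition using only the
per-row columns. The last-update-wins rule (USTP, with UNODE as a
tie-breaker) is an assumption chosen for illustration, not something
specified above; Version and reconcile are hypothetical names, and
deletes are not handled (a row missing on one side is treated as not
yet replicated):

```python
from typing import Dict, List, NamedTuple, Tuple


class Version(NamedTuple):
    ustp: int     # USTP: last update time, Unix timestamp
    unode: str    # UNODE: node that made the update
    payload: str  # stand-in for the row's actual column data


def reconcile(
    a: Dict[str, Version], b: Dict[str, Version]
) -> Tuple[Dict[str, Version], List[Tuple[str, Version]]]:
    """Merge two nodes' rows keyed by MKEY.

    Returns the merged data set plus the list of versions that were
    overwritten, which is one way to quantify the data loss.
    """
    merged: Dict[str, Version] = {}
    discarded: List[Tuple[str, Version]] = []
    for mkey in sorted(a.keys() | b.keys()):
        va, vb = a.get(mkey), b.get(mkey)
        if va is None or vb is None:
            # Row exists on one side only: copy it across.
            merged[mkey] = va if vb is None else vb
        elif va == vb:
            merged[mkey] = va
        else:
            # Conflict: newest USTP wins, UNODE breaks ties.
            if (va.ustp, va.unode) >= (vb.ustp, vb.unode):
                winner, loser = va, vb
            else:
                winner, loser = vb, va
            merged[mkey] = winner
            discarded.append((mkey, loser))
    return merged, discarded
```

Note that this rule is only as trustworthy as the clocks behind USTP,
which is why clock drift between nodes matters.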

Also, your replication model does not really discuss replication
operations over a busted link. Where is the intelligence for that in
the operation diagram?

Every time you package up replication into the core, someone has to tear
into that pile to add some extra functionality, so definitely think
about providing sensible hooks so that extra bit of customization can
override the base function.
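
One possible shape for such a hook, sketched in Python rather than as
anything resembling the actual patch: the Replicator class, the
Resolver type and default_resolver are all hypothetical names, and rows
are plain dicts with "mkey" and "ustp" entries purely for illustration.

```python
from typing import Callable, Dict

Row = Dict[str, object]
# A resolver receives the local and the incoming row and returns the
# row that should be kept.
Resolver = Callable[[Row, Row], Row]


def default_resolver(local: Row, remote: Row) -> Row:
    """Base behaviour (assumed): keep whichever row was updated last."""
    return remote if remote["ustp"] > local["ustp"] else local


class Replicator:
    def __init__(self, resolver: Resolver = default_resolver) -> None:
        self.resolver = resolver  # the overridable hook
        self.rows: Dict[str, Row] = {}

    def apply_remote(self, remote: Row) -> None:
        """Apply an incoming row, deferring conflicts to the hook."""
        key = str(remote["mkey"])
        local = self.rows.get(key)
        self.rows[key] = remote if local is None else self.resolver(local, remote)


def keep_local(local: Row, remote: Row) -> Row:
    """Example override: never let a remote change replace local data."""
    return local
```

A site with its own policy would pass it in, for example
Replicator(resolver=keep_local), without touching the base code.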

Cheers,

marco

On 9/22/2012 11:00 AM, Andres Freund wrote:
> This time I really attached both...
>
>
