Re: [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)
Date: 2012-10-15 18:54:33
Message-ID: 507C5BE9.3090000@2ndQuadrant.com
Lists: pgsql-hackers

On 10/15/2012 08:44 PM, Andres Freund wrote:
> On Monday, October 15, 2012 08:38:07 PM Hannu Krosing wrote:
>> On 10/11/2012 01:42 PM, Andres Freund wrote:
>>> On Thursday, October 11, 2012 09:15:47 AM Heikki Linnakangas wrote:
>>> ...
>>> If the only meaningful advantage is reducing the amount of WAL written,
>>> I can't help thinking that we should just try to address that in the
>>> existing solutions, even if it seems "easy to solve at a first glance,
>>> but a solution not using a normal transactional table for its log/queue
>>> has to solve a lot of problems", as the document says.
>>> You're welcome to make suggestions, but everything I could think of that
>>> didn't fall short of reality ended up basically duplicating the amount
>>> of writes & fsyncs, even if not going through the WAL.
>>>
>>> You need to be crash safe/restartable (=> writes, fsyncs) and you need to
>>> reduce the writes (in memory, => !writes). There is only one
>>> authoritative point where you can rely on a commit to have been
>>> successful and that's when the commit record has been written to the
>>> WAL. You can't send out the data to be committed before that's written
>>> because that could result in spuriously committed transactions on the
>>> remote side and you can't easily do it afterwards because you can crash
>>> after the commit.
>> Just curious here, but do you know how this part is solved in current sync
>> WAL replication? You can get "spurious" commits on the slave side if the master
>> dies while waiting for confirmation.
> Synchronous replication is only synchronous with respect to the COMMIT reply sent
> to the user. First the commit is written to WAL locally, so it persists across
> a crash (cf. RecordTransactionCommit). Only then do we wait for the standby
> (SyncRepWaitForLSN). After that has finished, the shared memory on the primary gets
> updated (cf. ProcArrayEndTransaction in CommitTransaction) and soon after that
> the user gets the response to the COMMIT back.
>
> I am not really sure what you were asking for; does the above explanation
> answer this?
I think I mostly got it: if the master crashes before the commit confirmation
comes back, then the transaction _will_ be committed after restart.

To the client it looks like it did not commit, but in this respect it is no
different from any other crash-before-confirmation, and thus the client cannot
rely on the commit not having happened and has to check it.
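
To make the crash case concrete, here is a toy model of the ordering described
above. It is only a sketch and not PostgreSQL code: the Primary class, the
crash_while_waiting flag and the client_commit helper are all made up for
illustration, and the only thing taken from the explanation is the order of the
four steps (local WAL write, wait for standby, shared memory update, reply).

```python
class Crash(Exception):
    """Simulated primary crash before the COMMIT reply reaches the client."""


class Primary:
    """Hypothetical stand-in for the primary; not a real PostgreSQL API."""

    def __init__(self):
        self.wal = []         # durable: survives a crash
        self.visible = set()  # volatile: shared-memory view of committed xids

    def commit(self, xid, crash_while_waiting=False):
        # 1. Write the commit record to the local WAL
        #    (the step done in RecordTransactionCommit).
        self.wal.append(("COMMIT", xid))
        # 2. Wait for the standby (the SyncRepWaitForLSN step);
        #    the simulated crash is injected here.
        if crash_while_waiting:
            raise Crash(xid)
        # 3. Update shared memory (the ProcArrayEndTransaction step).
        self.visible.add(xid)
        # 4. Reply to the client.
        return "COMMIT"

    def restart(self):
        # Crash recovery: volatile state is rebuilt from the WAL alone.
        self.visible = {xid for kind, xid in self.wal if kind == "COMMIT"}

    def is_committed(self, xid):
        return xid in self.visible


def client_commit(primary, xid, **kwargs):
    """Return the commit outcome as the client finally learns it."""
    try:
        primary.commit(xid, **kwargs)
        return True
    except Crash:
        # No confirmation arrived: the outcome is unknown, not "aborted",
        # so the client has to ask again after the restart.
        primary.restart()
        return primary.is_committed(xid)
```

In this model a crash during the wait leaves the commit record in the WAL, so
the transaction shows up as committed after restart even though the client
never got its COMMIT reply.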
>
> Greetings,
>
> Andres
