Re: Transactions over pathological TCP connections

From: Leon Smith <leon(dot)p(dot)smith(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Transactions over pathological TCP connections
Date: 2012-06-19 22:06:55
Message-ID: CAPwAf1kGXm6YVKgAumdPR+Hu98Lpbzf8fWxwf4xwS6bmk9DRHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I just realized this is essentially an instance of the Two General's
Problem; which is something I feel should have been more obvious to me.

On Tue, Jun 19, 2012 at 5:50 PM, Leon Smith <leon(dot)p(dot)smith(at)gmail(dot)com> wrote:

> On Tue, Jun 19, 2012 at 11:59 AM, Robert Haas <robertmhaas(at)gmail(dot)com>wrote:
>
>> On Tue, Jun 19, 2012 at 1:56 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > The transaction would be committed before a command success report is
>> > delivered to the client, so I don't think delivered-and-not-marked is
>> > possible.
>>
>> ...unless you have configured synchronous_commit=off, or fsync=off.
>>
>> Or unless your disk melts into a heap of slag and you have to restore
>> from backup. You can protect against that last case using synchronous
>> replication.
>>
>
>
> But hard disk failure isn't in the failure model I was concerned about.
> =) To be perfectly honest, I'm not too concerned with either hard drive
> failure or network failure, as we are deploying on Raid 1+0 database
> server talking to the client over a redundant LAN, and using asynchronous
> (Slony) replication to an identical database server just in case. No
> single point of failure is a key philosophy of this app from top to bottom.
> Like I said, this is mostly idle curiosity.
>
> But I'm also accustomed to trying to get work done on shockingly
> unreliable internet connections. As a result, network failure is
> something I think about quite a lot when writing networked applications.
> So this is not entirely idle curiosity either.
>
> And thinking about this a bit more, it's clear that the database has to
> commit before the result is sent, on the off chance that the transaction
> fails and needs to be retried. And that an explicit transaction block
> isn't really a solution either, because a "BEGIN; SELECT dequeue_row()"
> would get the row to the client without marking it as taken, but the
> pathological TCP disconnect could then attack the following "COMMIT;",
> leaving the client to think that the row has not been actually taken when
> it in fact has.
>
> It's not clear to me that this is even a solvable problem without
> modifying the schema to include both a "taken" and a "finished processing"
> state, and then letting elements be re-delievered after a period of time.
> But this would then allow a pathological demon with the power to cause
> TCP connects have a single element delivered and processed multiple times.
>
> In any case, thanks for the responses...
>
> Best,
> Leon
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2012-06-19 22:08:50 Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Previous Message Simon Riggs 2012-06-19 22:02:06 Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node