Re: SSI and Hot Standby

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Florian Pflug" <fgp(at)phlo(dot)org>, Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI and Hot Standby
Date: 2011-01-21 17:37:16
Message-ID: 4D396FEC02000025000399B6@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Anssi Kääriäinen<anssi(dot)kaariainen(at)thl(dot)fi> wrote:

> I am beginning to understand the problem. If you don't mind, here
> is a complete example if somebody else is having troubles
> understanding this.
>
> Let's say we have tables D1 and D2. Both contain a single column,
> id, and a single row. The data in the beginning is as follows:
>
> D1: id = 1
> D2: id = 1
>
> The constrains: D1.id can only be incremented. Whenever D2.id is
> updated, it must be updated to D1.id + 1.
>
> The transactions:
> T1: begin; update D1 set id = id + 1;
> T2: begin; update D2 set id = (select id+1 from D1);
> T1: commit;
> T3: begin; select id from D1; select id from D2; commit; Data
> seen: (2, 1) -- this is a possible state
> T2: commit;
> T4: begin; select id from D1; select id from D2; commit; Data
> seen: (2, 2)
> This is again a possible state. But if we compare this to the
> state seen
> by T3 this is not valid. From state (2, 1) we can not get to state
> (2, 2) without breaking one of the constraints. Thus, the state of
> T3 is not valid in the database.
>
> So, I finally got it! :-) I hope this example will help somebody
> else understand the problem.

Yeah, interesting example. Under SSI, once T3 selects from D2 you
have a dangerous structure, and either T2 or T3 must fail to prevent
the possibility of the sort of anomaly your example demonstrates.
We would prefer to see T2 fail, because if T3 fails it will continue
to fail on retry until T2 completes. We're trying to avoid that
kind of thrashing. If T2 fails and is retried, it will immediately
succeed and generate results consistent with what T3 saw.

When I test your example, though, I'm getting the serialization
failure on T3 rather than T2, so I'd call that a bug. Will
investigate. Thanks again for your tests! You seem to be able to
shake out issues better than anyone else! Once found, fixing them
is not usually very hard, it's coming up with that creative usage
pattern to *find* the problem which is the hard part.

OK if I add this one to our dcheck test suite, too?

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Browne 2011-01-21 17:43:10 Re: Review: compact fsync request queue on overflow
Previous Message Robert Haas 2011-01-21 17:31:43 Re: WIP: RangeTypes