Re: Remembering bug #6123

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remembering bug #6123
Date: 2012-01-14 18:39:34
Message-ID: 4F11778602000025000447DF@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

> Well, the bottom line that's concerning me here is whether throwing
> errors is going to push anyone's application into an unfixable
> corner. I'm somewhat encouraged that your Circuit Courts software
> can adapt to it, since that's certainly one of the larger and more
> complex applications out there. Or at least I would be if you had
> actually verified that the CC code was okay with the recently-
> proposed patch versions. Do you have any thorough tests you can run
> against whatever we end up with?

In spite of several attempts over the years to come up with automated
tests of our applications at a level that would show these issues,
we're still dependent on business analysts to run through a standard
list of tests for each release, plus tests designed to exercise code
modified for the release under test. For the release where we went
to PostgreSQL triggers, that included running lists against the
statistics tables to see which trigger functions had not yet been
exercised in testing, until we had everything covered.

To test the new version of this patch, we would need to pick an
application release, and use the patch through the development,
testing, and staging cycles, We would need to look for all triggers
needing adjustment, and make the necessary changes. We would need to
figure out which triggers were important to cover, and ensure that
testing covered all of them.

Given the discussions with my new manager this past week, I'm pretty
sure we can work this into a release that would complete testing and
hit pilot deployment in something like three months, give or take a
little. I can't actually make any promises on that until I talk to
her next week.

>From working through all the BEFORE triggers with UPDATE or DELETE
statements this summer, I'm pretty confident that the ones which
remain can be handled by reissuing the DELETE (we didn't keep any of
the UPDATEs with this pattern) and returning NULL. The most
complicated and troublesome trigger code has to do with purging old
data. I suspect that if we include testing of all purge processes in
that release cycle, we'll shake out just about any issues we have.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remembering bug #6123
Date: 2012-01-20 17:30:41
Message-ID: 4F1950610200002500044A34@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Tom Lane wrote:

>> Well, the bottom line that's concerning me here is whether
>> throwing errors is going to push anyone's application into an
>> unfixable corner. I'm somewhat encouraged that your Circuit
>> Courts software can adapt to it, since that's certainly one of
>> the larger and more complex applications out there. Or at least I
>> would be if you had actually verified that the CC code was okay
>> with the recently-proposed patch versions. Do you have any
>> thorough tests you can run against whatever we end up with?

> To test the new version of this patch, we would need to pick an
> application release, and use the patch through the development,
> testing, and staging cycles, We would need to look for all
> triggers needing adjustment, and make the necessary changes. We
> would need to figure out which triggers were important to cover,
> and ensure that testing covered all of them.
>
> Given the discussions with my new manager this past week, I'm
> pretty sure we can work this into a release that would complete
> testing and hit pilot deployment in something like three months,
> give or take a little. I can't actually make any promises on that
> until I talk to her next week.

After a couple meetings, I have approval to get this into an
application release currently in development. Assuming that your
patch from the 13th is good for doing the testing, I think I can
post test results in about three weeks. I'll also work on a
follow-on patch to add couple paragraphs and an example of the issue
to the docs by then.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remembering bug #6123
Date: 2012-01-20 20:34:52
Message-ID: 16646.1327091692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> After a couple meetings, I have approval to get this into an
> application release currently in development. Assuming that your
> patch from the 13th is good for doing the testing, I think I can
> post test results in about three weeks. I'll also work on a
> follow-on patch to add couple paragraphs and an example of the issue
> to the docs by then.

Well, the issues about wording of the error message are obviously just
cosmetic, but I think we'd better do whatever we intend to do to the
callers of heap_lock_tuple before putting the patch through testing.
I'm inclined to go ahead and make them throw errors, just to see if
that has any effects we don't like.

I'm up to my elbows in planner guts at the moment, but will try to
fix up the patch this weekend if you want.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remembering bug #6123
Date: 2012-01-20 21:30:33
Message-ID: 4F1988990200002500044A5A@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I'm up to my elbows in planner guts at the moment, but will try to
> fix up the patch this weekend if you want.

They have scheduled testers to check on this issue next week, so it
would be great to get as close as we can on the stuff that matters.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remembering bug #6123
Date: 2012-01-22 21:04:31
Message-ID: 9698.1327266271@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

OK, here's an updated version of the patch. I changed the error message
texts as per discussion (except I decided to use one message string for
all the cases rather than saddle translators with several variants).
Also, I put in an error in GetTupleForTrigger, which fixes the
self-reference case I illustrated before (now added to the regression
test). However, I found out that changing the other two callers of
heap_lock_tuple would open an entirely new can of worms, so for now
they still have the historical behavior of ignoring self-updated tuples.

The problem with changing ExecLockRows or EvalPlanQualFetch can be
illustrated by the regression test case it breaks, which basically is

BEGIN;
DECLARE c1 CURSOR FOR SELECT * FROM table FOR UPDATE;
UPDATE table SET ...;
FETCH ALL FROM c1;
COMMIT;

When the FETCH comes to a row that's been updated by the UPDATE, it sees
that row as HeapTupleSelfUpdated with a cmax greater than es_output_cid
(which is the CID assigned to the DECLARE). So if we make these callers
throw an error for the case, coding like the above will fail, which
seems to me to be pretty darn hard to justify. It is not a corner case
that could be caused only by questionable use of trigger side effects.
So that seems to leave us with two choices: (1) ignore the row, or
(2) attempt to lock the latest version instead of the visible version.
(1) is our historical behavior but seems arguably wrong. I tried to
make the patch do (2) but it crashed and burned because heap_lock_tuple
spits up if asked to lock an "invisible" row. We could possibly finesse
that by having EvalPlanQualFetch sometimes pass a CID later than
es_output_cid to heap_lock_tuple, but it seems ticklish. More, I think
it would also take some adjustments to the behavior of
HeapTupleSatisfiesDirty, else we'll not see such tuples in the first
place. So this looks messy, and also rather orthogonal to the current
goals of the patch.

Also, I'm not sure that your testing would exercise such cases at all,
as you have to be using SELECT FOR UPDATE and/or READ COMMITTED mode to
get to any of the relevant code. I gather your software mostly relies
on SERIALIZABLE mode to avoid such issues. So I stopped with this.

regards, tom lane

Attachment Content-Type Size
bug6123-v8.patch text/x-patch 47.1 KB

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remembering bug #6123
Date: 2012-06-05 17:14:07
Message-ID: 4FCDF80F020000250004809B@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

http://archives.postgresql.org/pgsql-hackers/2012-01/msg01241.php

> OK, here's an updated version of the patch.

I was on vacation after PGCon and just got back to the office
yesterday, so I have just now been able to check on the status of
our testing of this after being asked about it by Tom at PGCon.
He has this listed as an open bug, with testing of his fix by my
organization as the hold-up.

There was some testing of this in January while I was on another
vacation, some triggers were found which worked as intended with the
patch I had hacked together, but which got errors with the stricter
(and safer) patch created by Tom. I pointed out to the developers
some triggers which needed to be changed, and what should be
considered safe coding techniques, but I was then assigned to
several urgent issues and lost track of this one. I have arranged
for testing over the next few days. I will post again with results
when I have them.

-Kevin