Quick Links

slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )

Lists:	pgsql-hackerspgsql-patches

From:	"Jim Nasby" <jnasby(at)pervasive(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",
Date:	2005-10-28 22:10:19
Message-ID:	D1D2D51E3BE3FC4E98598248901F759402C88F12@ausmail2k4.aus.pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > On Fri, Oct 28, 2005 at 05:45:51PM -0400, Tom Lane wrote:
> >> Jim, are you interested
> >> in seeing if this patch makes the problem go away for you?
>
> > Well, this is a production system... what's the risk with
> that patch?
>
> Well, it's utterly untested, which means it might crash your system,
> which is where you are now, no?

Yes, but the crashes are somewhat sporadic and most importantly they don't appear to involve any data loss/corruption. I just don't want to make matters any worse.

In any case, my client's gone home for the weekend, so I doubt anything would happen until Monday.

> > BTW, is it typical to see a 10 difference between asserts
> on and off? My
> > client has a process that was doing 10-20 records/sec with
> asserts on
> > and 90-110 with asserts off.
>
> Not typical, but I can believe there are some code paths like that.

Yeah, they're doing some not-so-good things like row-by-row operations, so that's probably what the issue is. I seem to recall 20% being the penalty that's normally thrown around, so I was just surprised by such a huge difference.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jim Nasby" <jnasby(at)pervasive(dot)com>
Cc:	"Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-30 23:17:53
Message-ID:	15543.1130714273@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

OK, I think I see it. The problem is that the code in slru.c is careful
about not modifying state when it doesn't hold the proper lock, but not
so careful about not *inspecting* state without the proper lock. In
particular consider these lines in SimpleLruReadPage (line numbers are
as in CVS tip):

/* Mark the slot read-busy (no-op if it already was) */
277 shared->page_number[slotno] = pageno;
278 shared->page_status[slotno] = SLRU_PAGE_READ_IN_PROGRESS;

...

/* Release shared lock, grab per-buffer lock instead */
287 LWLockRelease(shared->ControlLock);
288 LWLockAcquire(shared->buffer_locks[slotno], LW_EXCLUSIVE);

/*
* Check to see if someone else already did the read, or took
* the buffer away from us. If so, restart from the top.
*/
294 if (shared->page_number[slotno] != pageno ||
295 shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
...

Suppose that we arrive at line 277 when the page is currently being
faulted in by another process (ie, its state is already
read-in-progress). The assignments at lines 277 & 278 are then no-ops,
and we'll block at line 288 waiting for the other guy to finish the I/O
and release the per-buffer lock.

Suppose that after the I/O finishes and before we get to run again, this
buffer sinks back to the bottom of the LRU list and gets chosen to be
replaced with another page. Some other process will then start
executing this code and will change the page number (line 277) and
change the state from clean to read-in-progress (line 278).

The race condition is that this could happen between the tests at lines
294 and 295 in our original process. In that case the original process
would think that the page still needed to be read in, and would set
about doing so. It would discover its error at the Assert at line
308, exactly where Jim reports a problem.

The association we thought we'd noted to recursion through
SlruSelectLRUPage is in part a red herring: the race can occur without
that. However, it's perhaps a bit more probable to occur in that path,
because when SlruSelectLRUPage recurses back to SimpleLruReadPage, we
*know* that there is another process currently doing read-in, and so
the first coincidence is already satisfied.

Still, the race condition window is extremely narrow, only a couple of
instructions. I looked at the assembly output for the compiler
Jim is using, and it looks like this:

cmpl %r13d, 104(%rbp,%r12,4)
je .L155
.L116:
... code for if() body here
...
.L155:
cmpl $1, 72(%rbp,%r12,4)
jne .L116
... code for subsequent lines here

However, if the processor predicts the forward branch not to be taken,
it could waste a few cycles recovering from its mistake, so the window
maybe is a little wider than it appears.

I'd like Jim to test this theory by seeing if it helps to reverse the
order of the if-test elements at lines 294/295, ie make it look like

if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS ||
shared->page_number[slotno] != pageno)

This won't do as a permanent patch, because it isn't guaranteed to fix
the problem on machines that don't strongly order writes, but it should
work on Opterons, at least well enough to confirm the diagnosis.

I'm still thinking about how to make a real fix without introducing
another lock/unlock cycle --- we can do that if we have to, but I think
maybe it's possible to fix it without.

SimpleLruWritePage has an identical problem BTW :-(

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-patches(at)postgreSQL(dot)org
Cc:	"Jim Nasby" <jnasby(at)pervasive(dot)com>, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 17:03:54
Message-ID:	8705.1130778234@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

I wrote:
> OK, I think I see it. The problem is that the code in slru.c is careful
> about not modifying state when it doesn't hold the proper lock, but not
> so careful about not *inspecting* state without the proper lock.
> ...
> I'm still thinking about how to make a real fix without introducing
> another lock/unlock cycle --- we can do that if we have to, but I think
> maybe it's possible to fix it without.

Attached is a proposed patch to fix up slru.c so that it's not playing
any games by either reading or writing shared state without holding
the ControlLock for the SLRU set.

The main problem with the existing code, as I now see it, was a poor
choice of representation of page state: we had states CLEAN, DIRTY, and
WRITE_IN_PROGRESS, which meant that re-dirtying a page while a write was
in progress required setting the state back to DIRTY, which hid the fact
that indeed a write was in progress. So the I/O code was designed to
cope with not knowing whether another write was already in progress. We
can make it a whole lot cleaner by changing the state representation so
that we can tell the difference --- then, we can know before releasing
the ControlLock whether we need to write the page or just wait for
someone else to finish writing it. And that means we can do all the
state-manipulation before releasing the lock.

I could have just added an additional state WRITE_IN_PROGRESS_REDIRTIED,
or some such, but it seemed notationally cleaner to create a separate
"page_dirty" boolean, and reduce the number of states to four (empty,
read-in-progress, valid, write-in-progress). This lets outside code
such as clog.c just set "page_dirty = true" rather than doing a complex
bit of logic to change the state value properly.

The patch is a bit bulky because of the representation change, but the
key changes are localized in SimpleLruReadPage and SimpleLruWritePage.

I think this code is a whole lot cleaner than it was before, but it's
a bit of a large change to be making so late in the 8.1 cycle (not to
mention that we really ought to back-patch similar changes all the way
back, because the bug exists as far back as 7.2). I am going to take
another look to see if there is a less invasive change that will fix
the problem at some performance cost; if so, we might want to do it that
way in 8.1 and back branches.

Any comments on the patch, or thoughts on how to proceed?

regards, tom lane

Attachment	Content-Type	Size
slru-race-1.patch	application/octet-stream	26.3 KB

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 17:42:38
Message-ID:	20051031174238.GF20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote:
> I'd like Jim to test this theory by seeing if it helps to reverse the
> order of the if-test elements at lines 294/295, ie make it look like
>
> if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS ||
> shared->page_number[slotno] != pageno)
>
> This won't do as a permanent patch, because it isn't guaranteed to fix
> the problem on machines that don't strongly order writes, but it should
> work on Opterons, at least well enough to confirm the diagnosis.

Given your proposed fix on -patches, do you still need me to test this?
Also, is there any heap corruption risk associated with this patch?

I'm also wondering what the effect of this is when assertions are turned
off. My client had to go back to running with assertions turned off
because of the performance impact. Are they now risking data corruption?
Is there a way to turn on the assertion just in this code segment?

This incident has made me wonder if it's worth creating two classes of
assertions. The (hopefully more common) set of assertions would be for
things that shouldn't happen, but if go un-caught won't result in heap
corruption. A new set (well, existing asserts, but just re-classified)
would be for things that if uncaught could result in heap corruption. My
hope is that the set of critical assertions could be turned on by
default, helping to identify race conditions and other bugs that
conventional testing is unlikely to find.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 17:46:19
Message-ID:	20051031174619.GG20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Sorry, two more things...

Will increasing shared_buffers make this less likely to occur? Or is
this just something that's likely to happen when there are things like
seqscans that are putting buffers near the front of the LRU? (The 8.0.3
buffer manager does something like that, right?)

Is this something that a test case can be created for? I know someone
submitted a framework for doing concurrent testing...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 17:56:45
Message-ID:	200510311756.j9VHuj510045@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Good analysis. I guess the question is what patch would we put into a
subrelease? If you go for a new state code, rather than a separate
boolean, does it reduce the size of the patch?

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > OK, I think I see it. The problem is that the code in slru.c is careful
> > about not modifying state when it doesn't hold the proper lock, but not
> > so careful about not *inspecting* state without the proper lock.
> > ...
> > I'm still thinking about how to make a real fix without introducing
> > another lock/unlock cycle --- we can do that if we have to, but I think
> > maybe it's possible to fix it without.
>
> Attached is a proposed patch to fix up slru.c so that it's not playing
> any games by either reading or writing shared state without holding
> the ControlLock for the SLRU set.
>
> The main problem with the existing code, as I now see it, was a poor
> choice of representation of page state: we had states CLEAN, DIRTY, and
> WRITE_IN_PROGRESS, which meant that re-dirtying a page while a write was
> in progress required setting the state back to DIRTY, which hid the fact
> that indeed a write was in progress. So the I/O code was designed to
> cope with not knowing whether another write was already in progress. We
> can make it a whole lot cleaner by changing the state representation so
> that we can tell the difference --- then, we can know before releasing
> the ControlLock whether we need to write the page or just wait for
> someone else to finish writing it. And that means we can do all the
> state-manipulation before releasing the lock.
>
> I could have just added an additional state WRITE_IN_PROGRESS_REDIRTIED,
> or some such, but it seemed notationally cleaner to create a separate
> "page_dirty" boolean, and reduce the number of states to four (empty,
> read-in-progress, valid, write-in-progress). This lets outside code
> such as clog.c just set "page_dirty = true" rather than doing a complex
> bit of logic to change the state value properly.
>
> The patch is a bit bulky because of the representation change, but the
> key changes are localized in SimpleLruReadPage and SimpleLruWritePage.
>
> I think this code is a whole lot cleaner than it was before, but it's
> a bit of a large change to be making so late in the 8.1 cycle (not to
> mention that we really ought to back-patch similar changes all the way
> back, because the bug exists as far back as 7.2). I am going to take
> another look to see if there is a less invasive change that will fix
> the problem at some performance cost; if so, we might want to do it that
> way in 8.1 and back branches.
>
> Any comments on the patch, or thoughts on how to proceed?
>
> regards, tom lane
>
>

Content-Description: slru-race-1.patch

[ Type application/octet-stream treated as attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 18:00:35
Message-ID:	12634.1130781635@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> If you go for a new state code, rather than a separate
> boolean, does it reduce the size of the patch?

No, and it certainly wouldn't improve my level of confidence in it ...

regards, tom lane

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 18:01:14
Message-ID:	200510311801.j9VI1EN10692@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Jim C. Nasby wrote:
> On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote:
> > I'd like Jim to test this theory by seeing if it helps to reverse the
> > order of the if-test elements at lines 294/295, ie make it look like
> >
> > if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS ||
> > shared->page_number[slotno] != pageno)
> >
> > This won't do as a permanent patch, because it isn't guaranteed to fix
> > the problem on machines that don't strongly order writes, but it should
> > work on Opterons, at least well enough to confirm the diagnosis.
>
> Given your proposed fix on -patches, do you still need me to test this?
> Also, is there any heap corruption risk associated with this patch?

Because it is a test, I am not sure there is any way to know what the
possible impact of a bug is. If we knew there were bug in the patch,
it would have been fixed already.

> I'm also wondering what the effect of this is when assertions are turned
> off. My client had to go back to running with assertions turned off
> because of the performance impact. Are they now risking data corruption?
> Is there a way to turn on the assertion just in this code segment?
>
> This incident has made me wonder if it's worth creating two classes of
> assertions. The (hopefully more common) set of assertions would be for
> things that shouldn't happen, but if go un-caught won't result in heap
> corruption. A new set (well, existing asserts, but just re-classified)
> would be for things that if uncaught could result in heap corruption. My
> hope is that the set of critical assertions could be turned on by
> default, helping to identify race conditions and other bugs that
> conventional testing is unlikely to find.

That is probably overkill. Running with test patches isn't something we
expect folks to do often.

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 18:01:58
Message-ID:	200510311801.j9VI1wP10746@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > If you go for a new state code, rather than a separate
> > boolean, does it reduce the size of the patch?
>
> No, and it certainly wouldn't improve my level of confidence in it ...

Well, then what real options do we have? It seems the patch is just
required for all branches.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 18:05:06
Message-ID:	12691.1130781906@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote:
>> This won't do as a permanent patch, because it isn't guaranteed to fix
>> the problem on machines that don't strongly order writes, but it should
>> work on Opterons, at least well enough to confirm the diagnosis.

> Given your proposed fix on -patches, do you still need me to test this?

Yes; we still need to verify that my theory actually explains your
problem. Given that I'm positing that you can repeatedly hit a
two-instruction window, this is by no means a sure thing. We need
it tested (and with asserts on, so that we can tell if it's fixed
the problem or not).

> Also, is there any heap corruption risk associated with this patch?

Look, Jim, I'm trying to help you fix this. Are you going to help or not?
If you want some kind of written guarantee, you're not going to get one.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 18:13:52
Message-ID:	14263.1130782432@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Well, then what real options do we have? It seems the patch is just
> required for all branches.

I think it would be possible to fix it in a less invasive way by taking
and releasing the ControlLock an extra time in SimpleLruReadPage and
SimpleLruWritePage. What's indeterminate about that is the performance
cost. In situations where there's not a lot of SLRU I/O traffic it's
presumably negligible, but in a case like Jim's where there's evidently
a *whole* lot of traffic, it might be a killer.

regards, tom lane

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 18:17:05
Message-ID:	200510311817.j9VIH5Y12628@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Tom Lane wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote:
> >> This won't do as a permanent patch, because it isn't guaranteed to fix
> >> the problem on machines that don't strongly order writes, but it should
> >> work on Opterons, at least well enough to confirm the diagnosis.
>
> > Given your proposed fix on -patches, do you still need me to test this?
>
> Yes; we still need to verify that my theory actually explains your
> problem. Given that I'm positing that you can repeatedly hit a
> two-instruction window, this is by no means a sure thing. We need
> it tested (and with asserts on, so that we can tell if it's fixed
> the problem or not).
>
> > Also, is there any heap corruption risk associated with this patch?
>
> Look, Jim, I'm trying to help you fix this. Are you going to help or not?
> If you want some kind of written guarantee, you're not going to get one.

I think we can say Jim gets his money back if he finds a bug. :-)

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 18:17:36
Message-ID:	20051031181736.GI20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Mon, Oct 31, 2005 at 01:05:06PM -0500, Tom Lane wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote:
> >> This won't do as a permanent patch, because it isn't guaranteed to fix
> >> the problem on machines that don't strongly order writes, but it should
> >> work on Opterons, at least well enough to confirm the diagnosis.
>
> > Given your proposed fix on -patches, do you still need me to test this?
>
> Yes; we still need to verify that my theory actually explains your
> problem. Given that I'm positing that you can repeatedly hit a
> two-instruction window, this is by no means a sure thing. We need
> it tested (and with asserts on, so that we can tell if it's fixed
> the problem or not).

Ok, I'll work on getting this tested. Just to clarify, if this fixes it
then the problem wouldn't occur, or would we just see a different
assert?

> > Also, is there any heap corruption risk associated with this patch?
>
> Look, Jim, I'm trying to help you fix this. Are you going to help or not?
> If you want some kind of written guarantee, you're not going to get one.

Of course not, and I'm not looking for one. On the otherhand, I don't
want to recommend something on a production system without understanding
what kind of risks are involved, and unfortunately much of this is still
over my head. I would really like to have a better idea of what the
impact of this bug is.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 18:19:53
Message-ID:	200510311819.j9VIJrD12991@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Well, then what real options do we have? It seems the patch is just
> > required for all branches.
>
> I think it would be possible to fix it in a less invasive way by taking
> and releasing the ControlLock an extra time in SimpleLruReadPage and
> SimpleLruWritePage. What's indeterminate about that is the performance
> cost. In situations where there's not a lot of SLRU I/O traffic it's
> presumably negligible, but in a case like Jim's where there's evidently
> a *whole* lot of traffic, it might be a killer.

To me a performance problem is much harder get reports on and to locate
than a real fix to the problem. I think if a few people eyeball the
patch, it is OK for application. Are backpatches significantly
different?

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 18:23:20
Message-ID:	20051031182320.GJ20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Mon, Oct 31, 2005 at 01:01:14PM -0500, Bruce Momjian wrote:
> > This incident has made me wonder if it's worth creating two classes of
> > assertions. The (hopefully more common) set of assertions would be for
> > things that shouldn't happen, but if go un-caught won't result in heap
> > corruption. A new set (well, existing asserts, but just re-classified)
> > would be for things that if uncaught could result in heap corruption. My
> > hope is that the set of critical assertions could be turned on by
> > default, helping to identify race conditions and other bugs that
> > conventional testing is unlikely to find.
>
> That is probably overkill. Running with test patches isn't something we
> expect folks to do often.

I wasn't thinking about test patches.

My assumption is that the asserts that are currently in place fall into
one of two categories: either they check for something that if false
could result in data corruption in the heap, or they check for something
that shouldn't happen, but if it does it can't corrupt the heap. If that
assumption is correct then seperating them might make it easier to run
with the set of critical asserts turned on. Currently, there can be a
substantial performance penalty with all asserts turned on, but I
suspect a lot of that penalty is from asserts in things like parsing and
planning code; code that pretty much couldn't corrupt data.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 18:27:54
Message-ID:	16200.1130783274@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> To me a performance problem is much harder get reports on and to locate
> than a real fix to the problem. I think if a few people eyeball the
> patch, it is OK for application. Are backpatches significantly
> different?

Well, the logic is the same all the way back, but the code has changed
textually quite a bit since 7.4 and even more since 7.3. I think the
patch would apply reasonably cleanly to 8.0, but adjusting it for 7.x
will take a bit of work, which would mean those versions would probably
need to be reviewed separately.

One possible compromise is to use this patch in 8.x and a simpler patch
in 7.x --- people who are very concerned about performance ought to be
running 8.x anyway ;-)

regards, tom lane

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 18:34:17
Message-ID:	200510311834.j9VIYHd15035@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Jim C. Nasby wrote:
> On Mon, Oct 31, 2005 at 01:01:14PM -0500, Bruce Momjian wrote:
> > > This incident has made me wonder if it's worth creating two classes of
> > > assertions. The (hopefully more common) set of assertions would be for
> > > things that shouldn't happen, but if go un-caught won't result in heap
> > > corruption. A new set (well, existing asserts, but just re-classified)
> > > would be for things that if uncaught could result in heap corruption. My
> > > hope is that the set of critical assertions could be turned on by
> > > default, helping to identify race conditions and other bugs that
> > > conventional testing is unlikely to find.
> >
> > That is probably overkill. Running with test patches isn't something we
> > expect folks to do often.
>
> I wasn't thinking about test patches.
>
> My assumption is that the asserts that are currently in place fall into
> one of two categories: either they check for something that if false
> could result in data corruption in the heap, or they check for something
> that shouldn't happen, but if it does it can't corrupt the heap. If that
> assumption is correct then seperating them might make it easier to run
> with the set of critical asserts turned on. Currently, there can be a
> substantial performance penalty with all asserts turned on, but I
> suspect a lot of that penalty is from asserts in things like parsing and
> planning code; code that pretty much couldn't corrupt data.

There is no way if the system has some incorrect value whether that
would later corrupt the data or not. Anything the system does that it
shouldn't do is a potential corruption problem.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 18:46:35
Message-ID:	17961.1130784395@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

I wrote:
> I think it would be possible to fix it in a less invasive way by taking
> and releasing the ControlLock an extra time in SimpleLruReadPage and
> SimpleLruWritePage. What's indeterminate about that is the performance
> cost.

Attached is an alternative patch that does it this way. I realized that
we could use LWLockConditionalAcquire to usually avoid any extra lock
traffic, so the performance cost may be negligible except under the very
heaviest of loads. I still like the bigger patch for 8.2 and forward,
because it's a lot cleaner, but this seems like a credible alternative
for 8.1 and back branches.

Comments?

regards, tom lane

Attachment	Content-Type	Size
slru-race-2.patch	application/octet-stream	8.2 KB

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 18:50:40
Message-ID:	200510311850.j9VIoe016872@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

OK, this is the way to fix for 8.0 and earlier. It is up to you about
8.1. I think we can handle the larger patch if we do another RC.

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > I think it would be possible to fix it in a less invasive way by taking
> > and releasing the ControlLock an extra time in SimpleLruReadPage and
> > SimpleLruWritePage. What's indeterminate about that is the performance
> > cost.
>
> Attached is an alternative patch that does it this way. I realized that
> we could use LWLockConditionalAcquire to usually avoid any extra lock
> traffic, so the performance cost may be negligible except under the very
> heaviest of loads. I still like the bigger patch for 8.2 and forward,
> because it's a lot cleaner, but this seems like a credible alternative
> for 8.1 and back branches.
>
> Comments?
>
> regards, tom lane
>

Content-Description: slru-race-2.patch

[ Type application/octet-stream treated as attachment, skipping... ]

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	pgsql-patches(at)postgresql(dot)org, Jim Nasby <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 19:19:46
Message-ID:	18150.1130786386@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> OK, this is the way to fix for 8.0 and earlier. It is up to you about
> 8.1. I think we can handle the larger patch if we do another RC.

Well, I'd like not to do another RC, so I'll hold the larger patch for
8.2.

We still need a test to confirm it fixes Jim's problem though.
Jim, if you like you can test the second proposed patch instead of
that off-the-cuff line swapping. However, either one will need to
be run with Asserts on in order to have any confidence that the problem
is fixed. If performance is an issue, most of the performance hit is
probably coming from memory context checking, so what I'd suggest you
do is comment out these two #defines in src/include/pg_config_manual.h:
#define CLOBBER_FREED_MEMORY
#define MEMORY_CONTEXT_CHECKING
That should let you build with --enable-cassert and not take quite
so much speed hit.

regards, tom lane

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 19:30:32
Message-ID:	20051031193032.GL20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Mon, Oct 31, 2005 at 01:34:17PM -0500, Bruce Momjian wrote:
> There is no way if the system has some incorrect value whether that
> would later corrupt the data or not. Anything the system does that it
> shouldn't do is a potential corruption problem.

But is it safe to say that there are areas where a failed assert is far
more likely to result in data corruption? And that there's also areas
where there's likely to be difficult/impossible to find bugs, such as
race conditions? ISTM that it would be valuable to do some additional
checking in these critical areas.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Gregory Maxwell <gmaxwell(at)gmail(dot)com>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-10-31 20:46:15
Message-ID:	e692861c0510311246y4f9409feh692e1de15ba1c5d3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On 10/31/05, Jim C. Nasby <jnasby(at)pervasive(dot)com> wrote:
> On Mon, Oct 31, 2005 at 01:34:17PM -0500, Bruce Momjian wrote:
> > There is no way if the system has some incorrect value whether that
> > would later corrupt the data or not. Anything the system does that it
> > shouldn't do is a potential corruption problem.
> But is it safe to say that there are areas where a failed assert is far
> more likely to result in data corruption? And that there's also areas
> where there's likely to be difficult/impossible to find bugs, such as
> race conditions? ISTM that it would be valuable to do some additional
> checking in these critical areas.

There are, no doubt, also places where an assert has minimal to no
performance impact. I'd wager a guess that the intersection of low
impact asserts, and asserts which measure high risk activities, is
small enough to be uninteresting.

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-10-31 23:47:31
Message-ID:	20051031234731.GQ20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Now that I've got a little better idea of what this code does, I've
noticed something interesting... this issue is happening on an 8-way
machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this
greatly increase the odds of buffer conflicts? Bug aside, would it be
better to set NUM_SLRU_BUFFERS higher for a larger number of CPUs?

Also, something else to note is that this database can see a pretty high
transaction rate... I just checked and it was doing 200TPS, but iirc it
can hit 1000+ TPS during the day.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-11-01 00:02:59
Message-ID:	20051101000259.GE12906@surnet.cl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Jim C. Nasby wrote:
> Now that I've got a little better idea of what this code does, I've
> noticed something interesting... this issue is happening on an 8-way
> machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this
> greatly increase the odds of buffer conflicts? Bug aside, would it be
> better to set NUM_SLRU_BUFFERS higher for a larger number of CPUs?

We had talked about increasing NUM_SLRU_BUFFERS depending on
shared_buffers, but it didn't get done. Something to consider for 8.2
though. I think you could have better performance by increasing that
setting, while at the same time dimishing the possibility that the race
condition appears.

I think you should also consider increasing PGPROC_MAX_CACHED_SUBXIDS
(src/include/storage/proc.h), because that should decrease the chance
that the subtrans area needs to be scanned. By how much, however, I
wouldn't know -- it depends on the number of subtransactions you
typically have; I guess you could activate the measuring code in
procarray.c to get a figure.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
www.google.com: interfaz de línea de comando para la web.

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-11-01 00:56:04
Message-ID:	20051101005604.GU20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Mon, Oct 31, 2005 at 09:02:59PM -0300, Alvaro Herrera wrote:
> Jim C. Nasby wrote:
> > Now that I've got a little better idea of what this code does, I've
> > noticed something interesting... this issue is happening on an 8-way
> > machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this
> > greatly increase the odds of buffer conflicts? Bug aside, would it be
> > better to set NUM_SLRU_BUFFERS higher for a larger number of CPUs?
>
> We had talked about increasing NUM_SLRU_BUFFERS depending on
> shared_buffers, but it didn't get done. Something to consider for 8.2
> though. I think you could have better performance by increasing that
> setting, while at the same time dimishing the possibility that the race
> condition appears.

Ok, I'll look into that. This database is definately having issues due
to the sheer transaction volume, so maybe that will help.

If NUM_SLRU_BUFFERS were to be tied to something, wouldn't it make more
sense to tie it to wal_buffers though? One example is a data warehouse
might have a very high shared_buffers, but most likely won't have a high
transaction rate. ISTM that most databases with a high transaction rate
are likely to have increased wal_buffers.

> I think you should also consider increasing PGPROC_MAX_CACHED_SUBXIDS
> (src/include/storage/proc.h), because that should decrease the chance
> that the subtrans area needs to be scanned. By how much, however, I
> wouldn't know -- it depends on the number of subtransactions you
> typically have; I guess you could activate the measuring code in
> procarray.c to get a figure.

AFAIK they're not using subtransactions at all, but I'll check.

Is there anywhere this stuff is documented other than in code? It sounds
like an advanced tuning guide would be very valuable for environments
like this one...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-11-01 03:12:59
Message-ID:	21073.1130814779@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> AFAIK they're not using subtransactions at all, but I'll check.

Well, yeah, they are ... else you'd never have seen this failure.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-11-01 14:23:55
Message-ID:	20051101142355.GB17904@surnet.cl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Tom Lane wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > AFAIK they're not using subtransactions at all, but I'll check.
>
> Well, yeah, they are ... else you'd never have seen this failure.

Maybe it's in plpgsql EXCEPTION clauses.

--
Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 17.7", W 73º 14' 26.8"
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu. Five minutes later I realize that it's also talking
about food" (Donald Knuth)

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-11-01 18:49:08
Message-ID:	20051101184908.GC20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Tue, Nov 01, 2005 at 11:23:55AM -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
> > "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > > AFAIK they're not using subtransactions at all, but I'll check.
> >
> > Well, yeah, they are ... else you'd never have seen this failure.
>
> Maybe it's in plpgsql EXCEPTION clauses.

They say they're not using either.

Doesn't clog use the same code?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-11-01 18:49:44
Message-ID:	20051101184944.GD20349@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Err, I forgot they're using Slony, which is probably using savepoints
and/or exceptions.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-11-01 19:21:29
Message-ID:	27298.1130872889@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> Doesn't clog use the same code?

Yeah, but all three of your examples were referencing pg_subtrans,
as proven by the stack traces and the contents of the shared control
block.

Even though the bug seems completely clog.c's fault, this is not a
coincidence: if subtransactions are being used heavily, then pg_subtrans
would have far greater I/O volume than any of the other clog-managed
logs, and hence have more exposure to the race condition.

We really ought to fix that code so that pg_subtrans can have more
buffers than pg_clog...

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", )
Date:	2005-11-01 19:22:38
Message-ID:	27317.1130872958@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

I wrote:
> Even though the bug seems completely clog.c's fault,

s/clog.c/slru.c/ of course :-(

regards, tom lane

From:	Chris Browne <cbbrowne(at)acm(dot)org>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition
Date:	2005-11-01 19:41:44
Message-ID:	603bmg9mvb.fsf@dba2.int.libertyrms.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

jnasby(at)pervasive(dot)com ("Jim C. Nasby") writes:

> On Tue, Nov 01, 2005 at 11:23:55AM -0300, Alvaro Herrera wrote:
>> Tom Lane wrote:
>> > "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
>> > > AFAIK they're not using subtransactions at all, but I'll check.
>> >
>> > Well, yeah, they are ... else you'd never have seen this failure.
>>
>> Maybe it's in plpgsql EXCEPTION clauses.
>
> Err, I forgot they're using Slony, which is probably using savepoints
> and/or exceptions.

Slony-I does use exceptions in pretty conventional ways; it does *not*
make any use of subtransactions, because it needs to run on PG 7.3 and
7.4 that do not support subtransactions.
--
(format nil "~S(at)~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/linuxxian.html
"I can't believe my room doesn't have Ethernet! Why wasn't it wired
when the house was built?"
"The house was built in 1576."
-- Alex Kamilewicz on the Oxford breed of `conference American.'

From:	Chris Browne <cbbrowne(at)acm(dot)org>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition
Date:	2005-11-01 19:50:57
Message-ID:	60slug87vi.fsf@dba2.int.libertyrms.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

jnasby(at)pervasive(dot)com ("Jim C. Nasby") writes:
> AFAIK they're not using subtransactions at all, but I'll check.

Are they perchance using pl/PerlNG?

We discovered a problem with Slony-I's handling of subtransactions
which was exposed by pl/PerlNG, which evidently wraps its SPI calls
inside subtransactions.

For more details...
<http://gborg.postgresql.org/project/slony1/bugs/bugupdate.php?1293>

That is the only subtransaction issue I am aware of...
--
select 'cbbrowne' || '@' || 'acm.org';
http://www.ntlug.org/~cbbrowne/nonrdbms.html
:FATAL ERROR -- ERROR IN ERROR HANDLER

From:	Chris Browne <cbbrowne(at)acm(dot)org>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition
Date:	2005-11-01 22:42:37
Message-ID:	60oe547zxe.fsf@dba2.int.libertyrms.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

alvherre(at)alvh(dot)no-ip(dot)org (Alvaro Herrera) writes:

> Chris Browne wrote:
>> jnasby(at)pervasive(dot)com ("Jim C. Nasby") writes:
>>
>> > On Tue, Nov 01, 2005 at 11:23:55AM -0300, Alvaro Herrera wrote:
>> >> Tom Lane wrote:
>> >> > "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
>> >> > > AFAIK they're not using subtransactions at all, but I'll check.
>> >> >
>> >> > Well, yeah, they are ... else you'd never have seen this failure.
>> >>
>> >> Maybe it's in plpgsql EXCEPTION clauses.
>> >
>> > Err, I forgot they're using Slony, which is probably using savepoints
>> > and/or exceptions.
>>
>> Slony-I does use exceptions in pretty conventional ways; it does *not*
>> make any use of subtransactions, because it needs to run on PG 7.3 and
>> 7.4 that do not support subtransactions.
>
> Hmm, does it use the BEGIN/EXCEPTION/END construct at all? Because if
> it does, it won't work on 7.4; and if it doesn't, then it isn't using
> savepoints in 8.0 either.

Ah, then I was misreading that.

There are instances of RAISE EXCEPTION, which was what I had in mind,
but not of BEGIN/EXCEPTION/END.

There is some logic in 8.x to *detect* the nesting of transactions,
but that's quite another matter.
--
output = ("cbbrowne" "@" "acm.org")
http://cbbrowne.com/info/multiplexor.html
"If I could find a way to get [Saddam Hussein] out of there, even
putting a contract out on him, if the CIA still did that sort of a
thing, assuming it ever did, I would be for it." -- Richard M. Nixon

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-02 04:11:06
Message-ID:	87br13k7tx.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> Jim C. Nasby wrote:
>
> > My assumption is that the asserts that are currently in place fall into
> > one of two categories: either they check for something that if false
> > could result in data corruption in the heap, or they check for something
> > that shouldn't happen, but if it does it can't corrupt the heap. If that
> > assumption is correct then seperating them might make it easier to run
> > with the set of critical asserts turned on. Currently, there can be a
> > substantial performance penalty with all asserts turned on, but I
> > suspect a lot of that penalty is from asserts in things like parsing and
> > planning code; code that pretty much couldn't corrupt data.
>
> There is no way if the system has some incorrect value whether that
> would later corrupt the data or not. Anything the system does that it
> shouldn't do is a potential corruption problem.

That's true but the reason why is subtler than that. If something has happened
that "can't happen" then there's no way to know how you arrived in that
situation. You might already have major problems that can lead to data
corruption -- or for that matter have already caused data corruption..

I happen to think that except for the rare assertion that has major
performance impact all the assertions should be on in production builds. The
goal of assertions is to catch corruption quickly and that's something that's
just as important in production as it is in debugging.

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-02 04:45:12
Message-ID:	1433.1130906712@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Greg Stark <gsstark(at)mit(dot)edu> writes:
> I happen to think that except for the rare assertion that has major
> performance impact all the assertions should be on in production builds. The
> goal of assertions is to catch corruption quickly and that's something that's
> just as important in production as it is in debugging.

You seem not to have read the documentation:

<term><option>--enable-cassert</option></term>

Enables <firstterm>assertion</> checks in the server, which test for
many <quote>can't happen</> conditions. This is invaluable for
code development purposes, but the tests slow things down a little.
Also, having the tests turned on won't necessarily enhance the
stability of your server! The assertion checks are not categorized
for severity, and so what might be a relatively harmless bug will
still lead to server restarts if it triggers an assertion
failure. Currently, this option is not recommended for
production use, but you should have it on for development work
or when running a beta version.

The great thing about Assert() is that you can throw one in for any
condition that your code is assuming-without-proof, without having to
think too hard about consequences. If we were to recommend having
enable-cassert on in production databases, we would need a MUCH higher
standard of care about when to use Assert. I would bet that ninety
percent of the Asserts in the existing code are on conditions that could
represent, at worst, corruption of backend-local or even
transaction-local data structures. Taking down the entire database
cluster for that is not something that sounds like a stability-enhancing
tradeoff to me.

In other words, the "you don't know how bad it might be" theory has a
flip side: you can't cry wolf when there's no wolf, either, at least not
if you want to continue to be listened to.

regards, tom lane

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-02 12:03:57
Message-ID:	8764rbjlxu.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > I happen to think that except for the rare assertion that has major
> > performance impact all the assertions should be on in production builds. The
> > goal of assertions is to catch corruption quickly and that's something that's
> > just as important in production as it is in debugging.
>
> You seem not to have read the documentation:

Sure I have, I just disagree.

> I would bet that ninety percent of the Asserts in the existing code are on
> conditions that could represent, at worst, corruption of backend-local or
> even transaction-local data structures. Taking down the entire database
> cluster for that is not something that sounds like a stability-enhancing
> tradeoff to me.

It may be minor corruption or it may be that the reason for the minor
corruption comes from some larger bug. It may also be backend-local or
transaction-local corruption at the time the assert catches it but cause major
damage by the time it actually crashes a non-assert-enabled database.

--
greg

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-02 18:45:44
Message-ID:	20051102184544.GI55520@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, Nov 02, 2005 at 07:03:57AM -0500, Greg Stark wrote:
> > I would bet that ninety percent of the Asserts in the existing code are on
> > conditions that could represent, at worst, corruption of backend-local or
> > even transaction-local data structures. Taking down the entire database
> > cluster for that is not something that sounds like a stability-enhancing
> > tradeoff to me.
>
> It may be minor corruption or it may be that the reason for the minor
> corruption comes from some larger bug. It may also be backend-local or
> transaction-local corruption at the time the assert catches it but cause major
> damage by the time it actually crashes a non-assert-enabled database.

Agreed. Personally I'd want to know about anything that corrupts my
data, no matter what the locality. I would also argue that if people are
seeing 'minor' asserts firing in production that there's a bug that
needs to be tracked down.

IF it comes out that there's some asserts that can be fired even though
there's not anything really bad happening, they could always be
relegated to a second class of assert that's not normally turned on.

BTW, that's a reversal from what I was originally arguing for, which was
due to the performance penalty associated with --enable-cassert. My
client is now running with Tom's suggestion of commenting out
CLOBBER_FREED_MEMORY and MEMORY_CONTEXT_CHECKING and performance is
good. It appears to be as good as it was with asserts disabled. So I
think it would definately be good to break those options out from
--enable-cassert. That makes it viable to run with asserts in
production, at least from a performance standpoint.

BTW, they're also running with patch2 now. Previously, with asserts
turned on and without the patch, they were seeing assert failures on
average of 2/day. So hopefully tomorrow we'll have an idea if the patch
fixed this or not.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-02 19:04:09
Message-ID:	7534.1130958249@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> BTW, that's a reversal from what I was originally arguing for, which was
> due to the performance penalty associated with --enable-cassert. My
> client is now running with Tom's suggestion of commenting out
> CLOBBER_FREED_MEMORY and MEMORY_CONTEXT_CHECKING and performance is
> good. It appears to be as good as it was with asserts disabled.

Interesting. I've always wondered whether the "debug_assertions" GUC
variable is worth the electrons it's printed on. If you are running
with asserts active, that variable actually slows things down, by
requiring an additional bool test for every Assert. I suppose the
motivation was to allow the same compiled executable to be used for both
assert-enabled and assert-disabled runs, but how many people really need
that capability?

regards, tom lane

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-02 22:51:54
Message-ID:	20051102225154.GR55520@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Wed, Nov 02, 2005 at 02:04:09PM -0500, Tom Lane wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > BTW, that's a reversal from what I was originally arguing for, which was
> > due to the performance penalty associated with --enable-cassert. My
> > client is now running with Tom's suggestion of commenting out
> > CLOBBER_FREED_MEMORY and MEMORY_CONTEXT_CHECKING and performance is
> > good. It appears to be as good as it was with asserts disabled.
>
> Interesting. I've always wondered whether the "debug_assertions" GUC
> variable is worth the electrons it's printed on. If you are running
> with asserts active, that variable actually slows things down, by
> requiring an additional bool test for every Assert. I suppose the
> motivation was to allow the same compiled executable to be used for both
> assert-enabled and assert-disabled runs, but how many people really need
> that capability?

Not sure how that relates to CLOBBER_FREED_MEMORY and
MEMORY_CONTEXT_CHECKING :P, but I agree that it doesn't make sense to
have a GUC, at least not if asserts default to being compiled out.

Hrm... does debug_assertions end up changing assert_enabled?

BTW, is MEMORY_CONTEXT_CHECKING that expensive? It seems like it
shouldn't be, but I'm only guessing at what exactly it does...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-03 16:01:24
Message-ID:	20051103160124.GB7138@surnet.cl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Jim C. Nasby wrote:

> BTW, is MEMORY_CONTEXT_CHECKING that expensive? It seems like it
> shouldn't be, but I'm only guessing at what exactly it does...

Yes, because not only it checks marker bytes at the end of palloc chunks
and similar stuff, but it also overwrites whole contexts with 0x7f when
they are reset.

May I propose to make Assert() yield only WARNINGs, and take out the
most expensive parts of MEMORY_CONTEXT_CHECKING, when --enable-cassert
is disabled? Enabling it would trigger the current FailedAssertion and
all of MEMORY_CONTEXT_CHECKING. That way we get all bug reports without
the expensive runtime costs for in-production systems.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-03 16:11:42
Message-ID:	22539.1131034302@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> May I propose to make Assert() yield only WARNINGs,

That is a horrid idea --- for one thing, it means that Asserts inside
the elog machinery itself would be instant infinite recursion, and even
elsewhere you'd have to think a bit about whether it's ok to call the
elog machinery. Plus, once you *have* detected an assertion failure,
allowing the code to keep running is just silly.

Either they dump core or they're disabled, there is no third option.

I do think it would be reasonable to fix things so that
MEMORY_CONTEXT_CHECKING could be turned on and off at runtime.

Perhaps rather than an all-or-nothing debug_assertions GUC variable,
what we want is something that turns on or off "expensive" assertion
checks at runtime. This could include MEMORY_CONTEXT_CHECKING and
anything else where the actual checking of the condition is nontrivial.
(For instance, there is code in list.c that grovels over the whole
list for a consistency check --- that is "expensive". There is some
code in the bufmgr that scans through all the buffers --- ditto.)

regards, tom lane

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-03 21:34:24
Message-ID:	20051103213424.GJ55520@pervasive.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Thu, Nov 03, 2005 at 11:11:42AM -0500, Tom Lane wrote:
> Perhaps rather than an all-or-nothing debug_assertions GUC variable,
> what we want is something that turns on or off "expensive" assertion
> checks at runtime. This could include MEMORY_CONTEXT_CHECKING and
> anything else where the actual checking of the condition is nontrivial.
> (For instance, there is code in list.c that grovels over the whole
> list for a consistency check --- that is "expensive". There is some
> code in the bufmgr that scans through all the buffers --- ditto.)

Two levels of assertions sounds like a great idea... wish I'd thought of
it! ;P

Seriously, I am wondering about the performance hit of always checking
debug_assertions.
http://archives.postgresql.org/pgsql-hackers/2005-08/msg00389.php
indicates that even with debug_assertions=false, --enable-cassert has a
big performance impact.

I don't really see any reason for debug_assertions unless we start
defaulting to assertions being compiled in. If we're going to split
things up maybe the expensive assertions you mention should get a
different macro so that there's no performance hit unless you
specifically compile them in. Michael's email tends to indicate that
going the other way around (one macro, two GUC's) wouldn't do any good.

BTW, I can do some testing of assert performance impact with dbt2 on a
Solaris box if anyone's interested in the data...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2005-11-03 23:02:27
Message-ID:	4489.1131058947@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> Seriously, I am wondering about the performance hit of always checking
> debug_assertions.
> http://archives.postgresql.org/pgsql-hackers/2005-08/msg00389.php
> indicates that even with debug_assertions=false, --enable-cassert has a
> big performance impact.

That's because MEMORY_CONTEXT_CHECKING happens anyway --- it's not
currently switched off by the GUC variable. I don't think we have any
recent data point about the cost of Asserts per se, except your own
report that it seems minimal. My thought is that it would be even
more minimal if the tests of debug_assertion were removed ;-) ...
except in association with Asserts that are actually testing an
expensive-to-check condition, of which there are very few.

regards, tom lane

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: slru.c race condition (was Re: TRAP: FailedAssertion("!((itemid)->lp_flags
Date:	2006-06-14 22:31:42
Message-ID:	200606142231.k5EMVgZ07750@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Added to TODO:

* Consider increasing internal areas when shared buffers is increased

http://archives.postgresql.org/pgsql-hackers/2005-10/msg01419.php

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> Jim C. Nasby wrote:
> > Now that I've got a little better idea of what this code does, I've
> > noticed something interesting... this issue is happening on an 8-way
> > machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this
> > greatly increase the odds of buffer conflicts? Bug aside, would it be
> > better to set NUM_SLRU_BUFFERS higher for a larger number of CPUs?
>
> We had talked about increasing NUM_SLRU_BUFFERS depending on
> shared_buffers, but it didn't get done. Something to consider for 8.2
> though. I think you could have better performance by increasing that
> setting, while at the same time dimishing the possibility that the race
> condition appears.
>
> I think you should also consider increasing PGPROC_MAX_CACHED_SUBXIDS
> (src/include/storage/proc.h), because that should decrease the chance
> that the subtrans area needs to be scanned. By how much, however, I
> wouldn't know -- it depends on the number of subtransactions you
> typically have; I guess you could activate the measuring code in
> procarray.c to get a figure.
>
> --
> Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
> www.google.com: interfaz de l?nea de comando para la web.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +