Re: [BUGS] Out of memory error causes Abort, Abort tries to allocate memory

Lists: pgsql-bugspgsql-hackers
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Out of memory error causes Abort, Abort tries to allocate memory
Date: 2006-10-25 18:54:38
Message-ID: 1161802478.31124.43.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I found the root cause of the bug I reported at:

http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php

What happens is this:
* Out of memory condition causes an ERROR
* ERROR triggers an AbortTransaction()
* AbortTransaction() calls RecordTransactionAbort()
* RecordTransactionAbort calls smgrGetPendingDeletes()
* smgrGetPendingDeletes() calls palloc()
* palloc() fails, resulting in ERROR, causing infinite recursion
* elog.c detects infinite recursion, and elevates it to PANIC

I'm not sure how easy this is to fix, but I asked on IRC and got some
agreement that this is a bug.

It seems to me, in order to fix it, we would have to avoid allocating
memory on the AbortTransaction path. All smgrGetPendingDeletes() needs
to allocate is a few dozen bytes (depending on the number of relations
to be deleted). Perhaps it could allocate those bytes as list of pending
deletes fills up. Or maybe we can somehow avoid needing to record the
relnodes to be deleted in order for the abort to succeed.

I'm still not sure why foreign keys on large insert statements don't eat
memory on 7.4, but do on 8.0+.

Regards,
Jeff Davis


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Out of memory error causes Abort, Abort tries to allocate memory
Date: 2006-10-25 19:20:49
Message-ID: 20061025192049.GA9340@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Jeff Davis wrote:
> I found the root cause of the bug I reported at:
>
> http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php
>
> What happens is this:
> * Out of memory condition causes an ERROR
> * ERROR triggers an AbortTransaction()
> * AbortTransaction() calls RecordTransactionAbort()
> * RecordTransactionAbort calls smgrGetPendingDeletes()
> * smgrGetPendingDeletes() calls palloc()
> * palloc() fails, resulting in ERROR, causing infinite recursion
> * elog.c detects infinite recursion, and elevates it to PANIC
>
> I'm not sure how easy this is to fix, but I asked on IRC and got some
> agreement that this is a bug.

Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
has some preallocated space, before calling RecordTransactionAbort (or
maybe have RecordTransactionAbort itself do it).

Problem is, what happens if ErrorContext is filled up by doing this? At
that point we will be severely fscked up, and you probably won't get the
PANIC either. (Maybe it doesn't happen in this particular case, but
seems a real risk.)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Out of memory error causes Abort, Abort tries to
Date: 2006-10-25 21:03:42
Message-ID: 1161810222.31124.70.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, 2006-10-25 at 16:20 -0300, Alvaro Herrera wrote:
> Jeff Davis wrote:
> > I found the root cause of the bug I reported at:
> >
> > http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php
> >
> > What happens is this:
> > * Out of memory condition causes an ERROR
> > * ERROR triggers an AbortTransaction()
> > * AbortTransaction() calls RecordTransactionAbort()
> > * RecordTransactionAbort calls smgrGetPendingDeletes()
> > * smgrGetPendingDeletes() calls palloc()
> > * palloc() fails, resulting in ERROR, causing infinite recursion
> > * elog.c detects infinite recursion, and elevates it to PANIC
> >
> > I'm not sure how easy this is to fix, but I asked on IRC and got some
> > agreement that this is a bug.
>
> Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
> has some preallocated space, before calling RecordTransactionAbort (or
> maybe have RecordTransactionAbort itself do it).
>
> Problem is, what happens if ErrorContext is filled up by doing this? At
> that point we will be severely fscked up, and you probably won't get the
> PANIC either. (Maybe it doesn't happen in this particular case, but
> seems a real risk.)
>

If we have a way to allocate memory and recover if it fails, perhaps
RecordTransactionAbort() could set the "rels to delete" part of the log
record to some special value that means "There might be relations to
delete, but I don't know which ones". Then, if necessary, it could
determine the relations that should be deleted at recovery time.

This idea assumes that we can figure out which relations are abandoned,
and also assumes that smgrGetPendingDeletes() is the only routine that
allocates memory on the path to abort a transaction due to an out of
memory error.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Out of memory error causes Abort, Abort tries to allocate memory
Date: 2006-10-25 22:15:25
Message-ID: 5606.1161814525@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Jeff Davis wrote:
>> * smgrGetPendingDeletes() calls palloc()
>> * palloc() fails, resulting in ERROR, causing infinite recursion

> Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
> has some preallocated space, before calling RecordTransactionAbort (or
> maybe have RecordTransactionAbort itself do it).

Seems like it'd be smarter to try to free some memory before we push
forward with transaction abort. ErrorContext has only a limited amount
of space ...

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Out of memory error causes Abort, Abort tries to
Date: 2006-10-26 16:35:11
Message-ID: 1161880511.31124.100.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, 2006-10-25 at 18:15 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Jeff Davis wrote:
> >> * smgrGetPendingDeletes() calls palloc()
> >> * palloc() fails, resulting in ERROR, causing infinite recursion
>
> > Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
> > has some preallocated space, before calling RecordTransactionAbort (or
> > maybe have RecordTransactionAbort itself do it).
>
> Seems like it'd be smarter to try to free some memory before we push
> forward with transaction abort. ErrorContext has only a limited amount
> of space ...
>

In the particular case I'm referring to, it's the referential integrity
constraints using all the memory. Is that memory allocated in a
convenient context to free before the abort?

Glancing at the code, I think that it would work to MemoryContextReset()
the query's memory context, because the pending deletes (of the
relnodes) are allocated in TopMemoryContext. After the query's memory
context is reset, there should be plenty of space to finish the abort
within that context.

Is there any data in the query's memory context that needs to be saved
after we know we're aborting?

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [BUGS] Out of memory error causes Abort, Abort tries to allocate memory
Date: 2006-11-22 18:06:56
Message-ID: 2511.1164218816@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Jeff Davis wrote:
> * smgrGetPendingDeletes() calls palloc()
> * palloc() fails, resulting in ERROR, causing infinite recursion

>> Hmm, maybe we could have AbortTransaction switch to ErrorContext, which
>> has some preallocated space, before calling RecordTransactionAbort (or
>> maybe have RecordTransactionAbort itself do it).

> Seems like it'd be smarter to try to free some memory before we push
> forward with transaction abort. ErrorContext has only a limited amount
> of space ...

I've been thinking more about this problem. There are two basic
strategies we could follow to ensure that AbortTransaction has some
room to work in:
A: Try to free space before we start the actual abort.
B: Keep some reserved space for AbortTransaction to use.
(It seems untenable to try not to ever alloc any memory at all during
AbortTransaction.) I'm not sure that either of these can be a 100%
bulletproof solution. As long as there is state data we daren't throw
away until after AbortTransaction, plan A doesn't help if we've filled
memory with that type of data. And plan B doesn't help if
AbortTransaction needs more memory than we reserved; which seems
possible for any acceptable level of reserved space (eg consider cases
with many thousands of subtransactions or deleted files ... we have to
build a very large XLOG Abort record then).

I think our best answer is probably to do some of each. For (A) it
seems that we should try to flush the pending-trigger-event list and
any executor state data that may be hanging around; those are the things
that seem both easy to delete and likely to be pretty large. For (B)
there's basically a choice of whether to try to re-use ErrorContext,
or create a separate context used only for the purposes of running
AbortTransaction. The separate context would avoid any possibility of
entanglement between what are really different subsystems, but OTOH it
seems a bit wasteful.

Comments?

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [BUGS] Out of memory error causes Abort, Abort tries
Date: 2006-11-22 19:29:41
Message-ID: 1164223781.10359.67.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, 2006-11-22 at 13:06 -0500, Tom Lane wrote:
> > Seems like it'd be smarter to try to free some memory before we push
> > forward with transaction abort. ErrorContext has only a limited amount
> > of space ...
>
> I've been thinking more about this problem. There are two basic
> strategies we could follow to ensure that AbortTransaction has some
> room to work in:
> A: Try to free space before we start the actual abort.
> B: Keep some reserved space for AbortTransaction to use.
> (It seems untenable to try not to ever alloc any memory at all during
> AbortTransaction.) I'm not sure that either of these can be a 100%
> bulletproof solution. As long as there is state data we daren't throw

When I was trying to devise a "bulletproof" solution, it seemed the only
way would be to reserve space, but to increase the reserved space
whenever the state changed such that an AbortTransaction would need
extra memory. This didn't seem worth the accounting effort. I made an
attempt, but gave up (I spent all my time in gdb keeping track of which
memory context I was in). If there are a limited number of areas that
increase potential AbortTransaction memory usage, it's a possibility I
suppose. Otherwise, I don't know how we'd expect code authors to know
whether their code increases memory requirements for AbortTransaction.

> away until after AbortTransaction, plan A doesn't help if we've filled
> memory with that type of data. And plan B doesn't help if
> AbortTransaction needs more memory than we reserved; which seems
> possible for any acceptable level of reserved space (eg consider cases
> with many thousands of subtransactions or deleted files ... we have to
> build a very large XLOG Abort record then).
>
> I think our best answer is probably to do some of each. For (A) it
> seems that we should try to flush the pending-trigger-event list and
> any executor state data that may be hanging around; those are the things

That seems like a relatively easy way to eliminate most of the problem.
It might not be 100% bulletproof, but it will drastically reduce the
chances of causing problems in "normal" situations. I'd certainly be
happy with this fix.

> that seem both easy to delete and likely to be pretty large. For (B)
> there's basically a choice of whether to try to re-use ErrorContext,
> or create a separate context used only for the purposes of running
> AbortTransaction. The separate context would avoid any possibility of
> entanglement between what are really different subsystems, but OTOH it
> seems a bit wasteful.
>

Wasteful how? Do you mean that it would clutter the code, or that it
would cause unnecessary overhead?

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] Out of memory error causes Abort, Abort tries to allocate memory
Date: 2006-11-22 20:10:33
Message-ID: 3865.1164226233@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> When I was trying to devise a "bulletproof" solution, it seemed the only
> way would be to reserve space, but to increase the reserved space
> whenever the state changed such that an AbortTransaction would need
> extra memory. This didn't seem worth the accounting effort.

Yeah. The problem I saw with it is that it's only "bulletproof" to the
extent that we get that accounting exactly right, and keep it so over
time. I think that assumption is sufficiently fragile that the extra
safety would be illusory. There's a lot of stuff that happens during
AbortTransaction :-(

>> there's basically a choice of whether to try to re-use ErrorContext,
>> or create a separate context used only for the purposes of running
>> AbortTransaction. The separate context would avoid any possibility of
>> entanglement between what are really different subsystems, but OTOH it
>> seems a bit wasteful.

> Wasteful how? Do you mean that it would clutter the code, or that it
> would cause unnecessary overhead?

Well, it'd be an extra however-many-KB of memory for each backend that
would mostly go unused. The rest of a backend's working memory pretty
much pulls its weight, but a TransactionAbortContext wouldn't. OTOH,
maybe these days a few dozen KB isn't worth worrying about.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] Out of memory error causes Abort, Abort tries to allocate memory
Date: 2006-11-23 02:41:05
Message-ID: 7385.1164249665@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
>> Wasteful how? Do you mean that it would clutter the code, or that it
>> would cause unnecessary overhead?

> Well, it'd be an extra however-many-KB of memory for each backend that
> would mostly go unused. The rest of a backend's working memory pretty
> much pulls its weight, but a TransactionAbortContext wouldn't. OTOH,
> maybe these days a few dozen KB isn't worth worrying about.

After further thought I concluded that overloading ErrorContext for this
purpose is way too risky, so I've committed changes that create a
separate context for AbortTransaction to use:
http://archives.postgresql.org/pgsql-committers/2006-11/msg00199.php

I think the patch would apply cleanly to 8.1 but have not got time to
check it now. Less sure about older branches. Are we excited about
trying to back-patch this? I think it'd really need rather more testing
before I'd want to stick it into the stable branches ...

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] Out of memory error causes Abort, Abort tries
Date: 2006-11-28 02:22:41
Message-ID: 1164680561.7773.86.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, 2006-11-22 at 21:41 -0500, Tom Lane wrote:
> I wrote:
> > Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> >> Wasteful how? Do you mean that it would clutter the code, or that it
> >> would cause unnecessary overhead?
>
> > Well, it'd be an extra however-many-KB of memory for each backend that
> > would mostly go unused. The rest of a backend's working memory pretty
> > much pulls its weight, but a TransactionAbortContext wouldn't. OTOH,
> > maybe these days a few dozen KB isn't worth worrying about.
>
> After further thought I concluded that overloading ErrorContext for this
> purpose is way too risky, so I've committed changes that create a
> separate context for AbortTransaction to use:
> http://archives.postgresql.org/pgsql-committers/2006-11/msg00199.php
>
> I think the patch would apply cleanly to 8.1 but have not got time to
> check it now. Less sure about older branches. Are we excited about
> trying to back-patch this? I think it'd really need rather more testing
> before I'd want to stick it into the stable branches ...
>

Everything in the patch makes sense to me, and it certainly solved my
test case.

I'm just fine with the fix only in 8.2. I can work around it until I can
upgrade.

Thanks!
Jeff Davis