Re: BUG #3245: PANIC: failed to re-find shared lock object

Lists: pgsql-bugs pgsql-hackers pgsql-patches
From: "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 17:28:22
Message-ID: C12B71B21705A04C9B9C3E9300430CAFA97107@jupiter.softcon-its.de

-----Original Message-----
> Von: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> > "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de> writes:
> > Here are two panic runs with Heikki's LOCK_DEBUG-patched postgres:
> > www.dorochevsky.de/infos/PostgresPanicProblem-2007-04-23.zip
>
> Ok, this does provide a new clue: the problem table is being locked
> twice (under two different lock types) in the transaction, and for
> some reason the lock object is discarded after the first of these is
> released. Furthermore, it looks like both of the references arise
> indirectly from foreign-key operations.
>
> Since realizing that your test case doesn't seem to be doing anything
> unusual, I've been trying to reproduce it here by tweaking the pgbench
> script to use COMMIT PREPARED instead of just COMMIT. No luck so far,
> but the pgbench test hasn't got any foreign keys, and maybe that's
> important. Can you show us the schema of your database? (pg_dump -s
> output would be great.)
>
Tom,

I am not used to the command line tools. So I made a backup
using the pgadmin GUI. I selected options 'PLAIN format', 'with OIDs' and
'schema only'. See
www.dorochevsky.de/infos/TestSchema.txt

I hope that is what you needed. If not, could you tell me which
options to use in the GUI tool?

> Also, if you haven't rebuilt the schema since the second of these runs,
> which table has OID 433757 now? I think it must be
> requiredqualities_num5b1c0a0a but would like to confirm that.
>
Table requiredqualities_num5b1c0a0a has OID 43755.

OID 433757 identifies requiredqualities_num5b1c0a0a_pkey, which is a primary
key constraint on this table covering both of its columns.

(The table represents an n:m association between the tables
'action' and 'quality'.)

Best Regards
-- Michel


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 18:51:51
Message-ID: 11864.1177354311@sss.pgh.pa.us

"Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de> writes:
> I am not used to the command line tools. So I made a backup
> using the pgadmin GUI. I selected options 'PLAIN format', 'with OIDs' and
> 'schema only'. See
> www.dorochevsky.de/infos/TestSchema.txt
> I hope that is what you needed.

Yeah, this is great, particularly since it includes the OIDs. However,
the OIDs don't seem to entirely match up with the LOCK_DEBUG output.
I'm wondering if somehow we're locking the wrong OIDs? Hard to believe
a bug like that could've escaped detection though. Still trying to
trace through it to see where things first go wrong. (If anyone else is
looking at this, note that a constraint's index will generally have an
OID one less than the constraint, so you can infer the OIDs of indexes
that aren't explicitly given in the dump.)

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 19:30:44
Message-ID: 462D0964.5030609@enterprisedb.com

Tom Lane wrote:
> "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de> writes:
>> I am not used to the command line tools. So I made a backup
>> using the pgadmin GUI. I selected options 'PLAIN format', 'with OIDs' and
>> 'schema only'. See
>> www.dorochevsky.de/infos/TestSchema.txt
>> I hope that is what you needed.
>
> Yeah, this is great, particularly since it includes the OIDs. However,
> the OIDs don't seem to entirely match up with the LOCK_DEBUG output.
> I'm wondering if somehow we're locking the wrong OIDs? Hard to believe
> a bug like that could've escaped detection though. Still trying to
> trace through it to see where things first go wrong. (If anyone else is
> looking at this, note that a constraint's index will generally have an
> OID one less than the constraint, so you can infer the OIDs of indexes
> that aren't explicitly given in the dump.)

It looks like table 433757 is locked three times in the transaction:
first in RowExclusive mode, and then twice in AccessShare mode. At
prepare time, the proclock is correctly transferred to the dummy PGPROC,
with the lock held in both modes. But when it comes time to commit, the
lock is released twice in AccessShare mode.

Locking the same lock twice is usually handled correctly; I don't
understand why it fails in this case. I'm thinking that the locallock
structs somehow get out of sync with the lock structs in shared memory.
The twophase records are created from the locallock structs alone, so if
there are extra entries in the locallock table for some reason, we'd get
the symptoms we have.

Unless you have a better idea, I'd like to add some more debug-prints to
AtPrepare_Locks to see what gets written to the state file and why.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 19:47:20
Message-ID: 12597.1177357640@sss.pgh.pa.us

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Locking the same lock twice is usually handled correctly; I don't
> understand why it fails in this case. I'm thinking that the locallock
> structs somehow get out of sync with the lock structs in shared memory.
> The twophase records are created from the locallock structs alone, so if
> there are extra entries in the locallock table for some reason, we'd get
> the symptoms we have.

Hmm. I was just noticing this comment in PostPrepare_Locks:

* We do this separately because we may have multiple locallock entries
* pointing to the same proclock, and we daren't end up with any dangling
* pointers.

I'm not clear at the moment on why such a state would exist, but could
it be related?

> Unless you have a better idea, I'd like to add some more debug-prints to
> AtPrepare_Locks to see what gets written to the state file and why.

Seems like a reasonable thing to pursue.

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 21:22:04
Message-ID: 462D237C.2000101@enterprisedb.com

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> Locking the same lock twice is usually handled correctly; I don't
>> understand why it fails in this case. I'm thinking that the locallock
>> structs somehow get out of sync with the lock structs in shared memory.
>> The twophase records are created from the locallock structs alone, so if
>> there are extra entries in the locallock table for some reason, we'd get
>> the symptoms we have.
>
> Hmm. I was just noticing this comment in PostPrepare_Locks:
>
> * We do this separately because we may have multiple locallock entries
> * pointing to the same proclock, and we daren't end up with any dangling
> * pointers.
>
> I'm not clear at the moment on why such a state would exist, but could
> it be related?

That caught my eye as well. I'm not sure what the alternative would be
that might leave dangling pointers. The comment seems to be
copy-pasted from LockReleaseAll.

>> Unless you have a better idea, I'd like to add some more debug-prints to
>> AtPrepare_Locks to see what gets written to the state file and why.
>
> Seems like a reasonable thing to pursue.

Dave, would you please create a new binary with the attached patch? And
LOCK_DEBUG and assertions and debug enabled.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
atprepare-debug.patch text/x-diff 1.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 21:26:48
Message-ID: 13679.1177363608@sss.pgh.pa.us

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Dave, would you please create a new binary with the attached patch? And
> LOCK_DEBUG and assertions and debug enabled.

Also, it would be worth adding "lockmode" to the set of things printed
by the panic message in the patch I sent earlier. Heikki is assuming
that the thing is trying to release AccessShareLock twice, but I'm not
100% convinced.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky,Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 21:34:34
Message-ID: 13773.1177364074@sss.pgh.pa.us

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> Hmm. I was just noticing this comment in PostPrepare_Locks:
>>
>> * We do this separately because we may have multiple locallock entries
>> * pointing to the same proclock, and we daren't end up with any dangling
>> * pointers.
>>
>> I'm not clear at the moment on why such a state would exist, but could
>> it be related?

> That caught my eye as well. I'm not sure what the other alternative
> would be, that might leave dangling pointers. The comment seems to be
> copy-pasted from LockReleaseAll.

I remembered what's going on there: we have a locallock entry for each
locktag + lockmode, so if you lock the same object with more than one
mode, there will be multiple locallock entries pointing to the same lock
(and proclock). The comment is worrying about what'd happen if we
deleted shared and local entries in parallel and then errored out
partway through that.

It doesn't appear possible for there to be multiple locallock entries
for the same locktag + lockmode, unless the hashtable code is completely
broken, and I imagine we'd have noticed that before now. I also spent
some time wondering if the 2PC code could mistakenly execute the same
2PC record twice, but that didn't look plausible either. Tis confusing.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 21:44:34
Message-ID: 13895.1177364674@sss.pgh.pa.us

I wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> Dave, would you please create a new binary with the attached patch? And
>> LOCK_DEBUG and assertions and debug enabled.

> Also, it would be worth adding "lockmode" to the set of things printed
> by the panic message in the patch I sent earlier.

Also: as long as we are building a custom-hacked executable to probe
into this, let's hack it to not remove the 2PC state file, so we can
double check what's really in there. I believe what you'd need to
remove is the RemoveTwoPhaseFile calls at twophase.c line 1583 (where
it thinks it's "stale") and xact.c line 4223 (where it's replaying a
XLOG_XACT_COMMIT_PREPARED WAL record).

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-23 22:15:24
Message-ID: 462D2FFC.2080501@enterprisedb.com

Tom Lane wrote:
> I wrote:
>> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>>> Dave, would you please create a new binary with the attached patch? And
>>> LOCK_DEBUG and assertions and debug enabled.
>
>> Also, it would be worth adding "lockmode" to the set of things printed
>> by the panic message in the patch I sent earlier.
>
> Also: as long as we are building a custom-hacked executable to probe
> into this, let's hack it to not remove the 2PC state file, so we can
> double check what's really in there. I believe what you'd need to
> remove is the RemoveTwoPhaseFile calls at twophase.c line 1583 (where
> it thinks it's "stale") and xact.c line 4223 (where it's replaying a
> XLOG_XACT_COMMIT_PREPARED WAL record).

Yeah, sounds like a good idea.

Patch attached that incorporates all the ideas this far:

1. More verbose PANIC message, including lockmode
2. More debug info in AtPrepare_Locks. I even put a DumpLocks call in
it, which should give us a good picture of what's in the lock structures
at the time of commit
3. Instead of removing the twophase file in recovery, rename it to
*.removed. (It will be ignored by PostgreSQL after that, because it
doesn't follow the normal naming rules of 2PC state files.)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
atprepare-debug-2.patch text/x-diff 2.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-24 02:42:38
Message-ID: 19442.1177382558@sss.pgh.pa.us

Well hot dang, I managed to reproduce this:

$ ./followlog "dbname = tschema" <log.sql
command failed at log line 30522:
COMMIT PREPARED '257_dm0td2luMjAwMHByby8yNTM2AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==_MQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=='
PANIC: failed to re-find shared lock object
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
$

So at least now we know it's not a Windows-specific issue, which is
something I was getting more and more worried about after two days of
not being able to duplicate the crash. I went so far as to make a
program that reads the log excerpt Michel sent, and execute the commands
in sequence on as many different connections as he had active ... and
darn if it doesn't fail. It seems fully reproducible on both 8.2 and
HEAD.

Now to start debugging. We can stop pestering Michel to run various
strange binaries, at least.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-24 04:37:20
Message-ID: 23704.1177389440@sss.pgh.pa.us

I wrote:
> Now to start debugging.

It looks to me like the problem is that AtPrepare_Locks invokes
LockTagIsTemp, and that goes and reads various system catalogs, which
can result in new entries in the LOCALLOCK hash table, which might
result in a bucket split in same, which would result in some entries
possibly being scanned twice by the hash_seq_search scan.

Not sure about a good fix, except that AtPrepare is probably a really
bad time to be doing fresh catalog searches.

Also, we have a generic issue that making fresh entries in a hashtable
might result in a concurrent hash_seq_search scan visiting existing
entries more than once; that's definitely not something any of the
existing callers are thinking about.

I'm too tired to think about fixes right now, but we've definitely
found a hotbed of actual and potential bugs.

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-24 11:51:34
Message-ID: 462DEF46.9000302@enterprisedb.com

Tom Lane wrote:
> It looks to me like the problem is that AtPrepare_Locks invokes
> LockTagIsTemp, and that goes and reads various system catalogs, which
> can result in new entries in the LOCALLOCK hash table, which might
> result in a bucket split in same, which would result in some entries
> possibly being scanned twice by the hash_seq_search scan.
>
> Not sure about good fix, except that AtPrepare is probably a really
> bad time to be doing fresh catalog searches.

Yep.

A simple fix would be to do the scan in two phases, searching for temp
tables in the first phase, and writing the 2PC records in the second.
We'd still be doing the catalog searches in the 1st phase, and might
therefore split buckets and see some entries twice, but it wouldn't lead
to duplicate 2PC records.
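That idea can be sketched as follows (illustrative Python, not the actual C; `locallocks`, `is_temp_table`, and `write_record` are hypothetical stand-ins). The key property is that phase 2 reads only a snapshot taken in phase 1, so catalog lookups that happen to grow the hash table can no longer inject duplicate records:

```python
def at_prepare_locks(locallocks, is_temp_table, write_record):
    """Two-phase sketch of AtPrepare_Locks.

    locallocks     -- the live hash table (a dict in this model)
    is_temp_table  -- may do catalog lookups that insert new entries
    write_record   -- emits one 2PC record to the state file
    """
    # Phase 1: snapshot the entries that exist now, checking for temp
    # tables. Growth caused by is_temp_table() is harmless because we
    # iterate over a materialized key list, not the live table.
    snapshot = []
    for key in list(locallocks):
        if is_temp_table(key):
            raise RuntimeError("cannot PREPARE while holding locks on temp tables")
        snapshot.append(key)

    # Phase 2: write records from the snapshot only -- no live scan,
    # so no entry can be emitted twice (or skipped) by a bucket split.
    for key in snapshot:
        write_record(key)
```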

> Also, we have a generic issue that making fresh entries in a hashtable
> might result in a concurrent hash_seq_search scan visiting existing
> entries more than once; that's definitely not something any of the
> existing callers are thinking about.

Ouch. Note that we can also miss some entries altogether, which is
probably even worse.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-24 12:50:17
Message-ID: 462DFD09.8000508@enterprisedb.com

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> Also, we have a generic issue that making fresh entries in a hashtable
>> might result in a concurrent hash_seq_search scan visiting existing
>> entries more than once; that's definitely not something any of the
>> existing callers are thinking about.
>
> Ouch. Note that we can also miss some entries altogether, which is
> probably even worse.

In case someone is wondering how that can happen, here's an example.
We're scanning a bucket that contains four entries, and we split it
after returning 1:

1 -> 2* -> 3 -> 4

* denotes the next entry the seq scan has stored.

If the split goes, for example, like this:

1 -> 3
2* -> 4

The seq scan will continue scanning from 2, then 4, and miss 3 altogether.
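The diagram can be checked mechanically with a little model (Python, purely illustrative; dynahash itself stores entries in linked bucket chains much like this):

```python
class Node:
    """One entry in a bucket's linked chain."""
    def __init__(self, key):
        self.key = key
        self.next = None

# Bucket chain 1 -> 2 -> 3 -> 4; the seq scan has returned entry 1
# and saved entry 2 as its resume position.
nodes = {k: Node(k) for k in (1, 2, 3, 4)}
nodes[1].next, nodes[2].next, nodes[3].next = nodes[2], nodes[3], nodes[4]
scan_resume = nodes[2]

# The bucket splits: 1 and 3 stay, 2 and 4 move to the new bucket.
nodes[1].next, nodes[3].next = nodes[3], None   # old bucket: 1 -> 3
nodes[2].next, nodes[4].next = nodes[4], None   # new bucket: 2 -> 4

# Resuming from the saved pointer walks the relocated chain only.
visited = [1]                # returned before the split
node = scan_resume
while node is not None:
    visited.append(node.key)
    node = node.next

print(visited)               # [1, 2, 4] -- entry 3 is never visited
```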

I briefly went through all the callers of hash_seq_init. The only place
where we explicitly rely on being able to add entries to a hash table
while scanning it is in tbm_lossify. There are more complex loops in
portalmem.c and relcache.c, which I think are safe, but I would need to
look closer. There's also the pg_prepared_statement
set-returning function, which keeps a scan open across calls and seems
error-prone.

Should we document the fact that it's not safe to insert new entries to
a hash table while scanning it, and fix the few call sites that do that,
or does anyone see a better solution? One alternative would be to
inhibit bucket splits while a scan is in progress, but then we'd need to
take care to clean up after each scan.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-24 15:06:52
Message-ID: 3393.1177427212@sss.pgh.pa.us

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> I briefly went through all the callers of hash_seq_init. The only place
> where we explicitly rely on being able to add entries to a hash table
> while scanning it is in tbm_lossify. There are more complex loops in
> portalmem.c and relcache.c, which I think are safe, but I would need to
> look closer. There's also the pg_prepared_statement
> set-returning function, which keeps a scan open across calls and seems
> error-prone.

The pending-fsync stuff in md.c is also expecting to be able to add
entries during a scan.

I don't think we can go in the direction of forbidding insertions during
a scan --- as the case at hand shows, it's just not always obvious that
that could happen, and finding/fixing such a problem is nigh impossible.
(We were darn fortunate to be able to reproduce this one.) Plus we have
a couple of places where it's really necessary to be able to do it,
anyway.

The only answer I can see that seems reasonably robust is to change
dynahash.c so that it tracks whether any seq_search scans are open on a
hashtable, and doesn't carry out any splits while one is. This wouldn't
cost anything noticeable in performance, assuming that not very many
splits are postponed. The PITA aspect of it is that we'd need to add
bookkeeping mechanisms to ensure that the count of active scans gets
cleaned up on error exit. It's not like we've not got lots of those,
though.

Possibly we could simplify matters a bit by not worrying about cleaning
up leaked counts at subtransaction abort, ie, the list of open scans
would only get forced to empty at top transaction end. This carries a
slightly higher risk of meaningful performance degradation, but in
practice I doubt it's a big problem. If we agreed that then we'd not
need ResourceOwner support --- it could be handled like LWLock counts.
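A sketch of that scheme (toy Python; real dynahash is C and splits one bucket at a time rather than rebuilding the table): insertions check an active-scan counter and defer the split, and a deferred split runs when the last scan ends. The `finally` clause is exactly the error-exit bookkeeping being discussed:

```python
class SplitSafeHashTable:
    """Toy linear-hash table that postpones bucket splits while any
    seq scan is open, so open scans never see entries move."""

    def __init__(self, nbuckets=4, max_load=1.0):
        self.buckets = [[] for _ in range(nbuckets)]
        self.nentries = 0
        self.max_load = max_load
        self.active_scans = 0          # the proposed scan counter
        self.split_pending = False

    def insert(self, key, value):
        self.buckets[hash(key) % len(self.buckets)].append((key, value))
        self.nentries += 1
        if self.nentries / len(self.buckets) > self.max_load:
            if self.active_scans:      # a scan is open: defer the split
                self.split_pending = True
            else:
                self._expand()

    def _expand(self):
        items = [kv for b in self.buckets for kv in b]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k, v in items:
            self.buckets[hash(k) % len(self.buckets)].append((k, v))
        self.split_pending = False

    def seq_scan(self):
        self.active_scans += 1
        try:
            for bucket in self.buckets:
                yield from bucket
        finally:                       # must run even on error exit --
            self.active_scans -= 1     # this is the cleanup worry above
            if self.active_scans == 0 and self.split_pending:
                self._expand()
```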

pg_prepared_statement is simply broken --- what if the next-to-scan
statement is deleted between calls? It'll have to be changed.

Comments?

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-24 20:04:03
Message-ID: 462E62B3.6080808@enterprisedb.com

Tom Lane wrote:
> The pending-fsync stuff in md.c is also expecting to be able to add
> entries during a scan.

No, mdsync starts the scan from scratch after calling AbsorbFsyncRequests.

> I don't think we can go in the direction of forbidding insertions during
> a scan --- as the case at hand shows, it's just not always obvious that
> that could happen, and finding/fixing such a problem is nigh impossible.
> (We were darn fortunate to be able to reproduce this one.) Plus we have
> a couple of places where it's really necessary to be able to do it,
> anyway.
>
> The only answer I can see that seems reasonably robust is to change
> dynahash.c so that it tracks whether any seq_search scans are open on a
> hashtable, and doesn't carry out any splits while one is. This wouldn't
> cost anything noticeable in performance, assuming that not very many
> splits are postponed. The PITA aspect of it is that we'd need to add
> bookkeeping mechanisms to ensure that the count of active scans gets
> cleaned up on error exit. It's not like we've not got lots of those,
> though.

We could have two kinds of seq scans, with and without support for
concurrent inserts. If you open a scan without that support, it acts
just like today, and no extra bookkeeping or clean up by the caller is
required. If you need concurrent inserts, we inhibit bucket splits, but
it's up to the caller to explicitly close the scan, possibly with
PG_TRY/CATCH. I'm not sure if that's simpler in the end, but we could
get away without adding a generic bookkeeping mechanism.

> Possibly we could simplify matters a bit by not worrying about cleaning
> up leaked counts at subtransaction abort, ie, the list of open scans
> would only get forced to empty at top transaction end. This carries a
> slightly higher risk of meaningful performance degradation, but in
> practice I doubt it's a big problem. If we agreed that then we'd not
> need ResourceOwner support --- it could be handled like LWLock counts.

Hmm. Unlike lwlocks, hash tables can live in different memory contexts,
so we can't just have a list of open scans similar to the held_lwlocks array.

Do we need to support multiple simultaneous seq scans of a hash table? I
suppose we do.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-24 20:25:40
Message-ID: 8407.1177446340@sss.pgh.pa.us

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> The pending-fsync stuff in md.c is also expecting to be able to add
>> entries during a scan.

> No, mdsync starts the scan from scratch after calling AbsorbFsyncRequests.

That was last month ;-). It doesn't restart any more.

> We could have two kinds of seq scans, with and without support for
> concurrent inserts.

Yeah, I considered that too, but it just seems too error-prone. We
could maybe make it trustworthy by having hash_seq_search complain if
it noticed there had been any concurrent insertions --- but then you're
putting new overhead into hash_seq_search, which kind of defeats the
argument for it (and hash_seq_search is a bit of a bottleneck, so extra
cycles there matter).

> Hmm. Unlike lwlocks, hash tables can live in different memory contexts,
> so we can't just have a list of open scans similar to the held_lwlocks array.

I had first thought about adding a scan counter to the hashtable control
struct, but the prospect of hash tables being deallocated while the
central list still has references to them seems pretty scary --- we
could find ourselves clobbering some other data structure entirely when
we go to decrement the count. What seems better now is to have an array
or list of HTAB pointers, one for each active scan (so the same
hashtable might appear in the list multiple times). When we are
considering whether to split, we have to look through the list to see if
our table is listed. The list is unlikely to be long so this shouldn't
affect performance. If a hash table is deallocated while we still
think it has an active scan, nothing very bad happens. The absolute
worst possible consequence is if some new hash table gets allocated at
exactly the same spot; we'd inhibit splits on it, which still doesn't
break correctness though it might kill performance. In any case we can
have checking code that complains about leaked scan pointers at
transaction end, so any such bug shouldn't survive long.

For shared hash tables, this design only works for scans being done by
the same backend doing insertion; but locking considerations would
probably require that no other backend inserts while we scan anyway
(you'd need something much more complicated than shared/exclusive locks
to manage it otherwise).

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-25 19:07:21
Message-ID: 3918.1177528041@sss.pgh.pa.us

I wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> We could have two kinds of seq scans, with and without support for
>> concurrent inserts.

> Yeah, I considered that too, but it just seems too error-prone. We
> could maybe make it trustworthy by having hash_seq_search complain if
> it noticed there had been any concurrent insertions --- but then you're
> putting new overhead into hash_seq_search, which kind of defeats the
> argument for it (and hash_seq_search is a bit of a bottleneck, so extra
> cycles there matter).

I just finished looking through the uses of hash_seq_search, and
realized that there is one place where it would be a bit painful to
convert to the insertion-safe approach I'm proposing; namely nodeAgg.c.
The places where the hashtable iteration is started and used are
scattered, and we don't really track whether the iteration is done or
not, so it's hard to be sure where to cancel the iteration. It could
probably be made to work but it seems like it'd be fragile.

I still don't want to introduce more checking overhead into
hash_seq_search, though, so what I'm now thinking about is a new
dynahash primitive named something like "hash_freeze", which'd mark a
hashtable as disallowing insertions. If the hashtable is frozen before
hash_seq_init then we don't add it to the central list of scans, and
therefore there is no cleanup to do at the end. nodeAgg can use this
mode since it doesn't modify its hashtable anymore after beginning its
readout scan.

BTW, we didn't really get into details, but for the insertion-safe case
I'm envisioning adding a routine "hash_seq_term", which you would need
to call if and only if you abandon a hash_seq_search scan without
running it to completion (if you do the complete scan, hash_seq_search
will automatically call hash_seq_term before returning NULL). All but
a very small number of places run their searches to completion and
therefore won't require any source code changes with this API.

Thoughts?

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-25 21:14:17
Message-ID: 462FC4A9.9020009@enterprisedb.com
Lists: pgsql-bugs pgsql-hackers pgsql-patches

Tom Lane wrote:
> I wrote:
>> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>>> We could have two kinds of seq scans, with and without support for
>>> concurrent inserts.
>
>> Yeah, I considered that too, but it just seems too error-prone. We
>> could maybe make it trustworthy by having hash_seq_search complain if
>> it noticed there had been any concurrent insertions --- but then you're
>> putting new overhead into hash_seq_search, which kind of defeats the
>> argument for it (and hash_seq_search is a bit of a bottleneck, so extra
>> cycles there matter).
>
> I just finished looking through the uses of hash_seq_search, and
> realized that there is one place where it would be a bit painful to
> convert to the insertion-safe approach I'm proposing; namely nodeAgg.c.
> The places where the hashtable iteration is started and used are
> scattered, and we don't really track whether the iteration is done or
> not, so it's hard to be sure where to cancel the iteration. It could
> probably be made to work but it seems like it'd be fragile.
>
> I still don't want to introduce more checking overhead into
> hash_seq_search, though, so what I'm now thinking about is a new
> dynahash primitive named something like "hash_freeze", which'd mark a
> hashtable as disallowing insertions. If the hashtable is frozen before
> hash_seq_init then we don't add it to the central list of scans, and
> therefore there is no cleanup to do at the end. nodeAgg can use this
> mode since it doesn't modify its hashtable anymore after beginning its
> readout scan.

This plan includes having the list of hash tables that mustn't be
expanded? And the list would be cleaned up at the end of transaction, to
avoid leaks.

> BTW, we didn't really get into details, but for the insertion-safe case
> I'm envisioning adding a routine "hash_seq_term", which you would need
> to call if and only if you abandon a hash_seq_search scan without
> running it to completion (if you do the complete scan, hash_seq_search
> will automatically call hash_seq_term before returning NULL). All but
> a very small number of places run their searches to completion and
> therefore won't require any source code changes with this API.

Sounds good to me.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-25 21:46:47
Message-ID: 6205.1177537607@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers pgsql-patches

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> I still don't want to introduce more checking overhead into
>> hash_seq_search, though, so what I'm now thinking about is a new
>> dynahash primitive named something like "hash_freeze", which'd mark a
>> hashtable as disallowing insertions. If the hashtable is frozen before
>> hash_seq_init then we don't add it to the central list of scans, and
>> therefore there is no cleanup to do at the end. nodeAgg can use this
>> mode since it doesn't modify its hashtable anymore after beginning its
>> readout scan.

> This plan includes having the list of hash tables that mustn't be
> expanded? And the list would be cleaned up at the end of transaction, to
> avoid leaks.

Right, all that's still the same. This is just a way to exempt certain
scans from that machinery, by allowing the caller to declare he doesn't
need to modify the hashtable anymore. AFAICS that covers our needs,
at least for the present.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>
Cc: pgsql-patches(at)postgreSQL(dot)org
Subject: Re: [BUGS] BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-26 19:42:58
Message-ID: 2042.1177616578@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers pgsql-patches

Attached is the complete patch against HEAD to prevent hashtable bucket
splits during hash_seq_search. Any comments before I start
back-porting? I suppose we had better patch this all the way back,
even though AtPrepare_Locks() is the only known trouble spot.

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 28.6 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-bugs(at)postgresql(dot)org, Dave Page <dpage(at)postgresql(dot)org>
Subject: Re: BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-26 23:31:21
Message-ID: 28403.1177630281@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers pgsql-patches

I wrote:
> Also, we have a generic issue that making fresh entries in a hashtable
> might result in a concurrent hash_seq_search scan visiting existing
> entries more than once; that's definitely not something any of the
> existing callers are thinking about.

> I'm too tired to think about fixes right now, but we've definitely
> found a hotbed of actual and potential bugs.

I've committed a fix for this; it'll be in the next set of releases.

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dorochevsky, Michel" <michel(dot)dorochevsky(at)softcon(dot)de>, pgsql-patches(at)postgreSQL(dot)org
Subject: Re: [BUGS] BUG #3245: PANIC: failed to re-find shared lock object
Date: 2007-04-27 08:44:16
Message-ID: 4631B7E0.4030407@enterprisedb.com
Lists: pgsql-bugs pgsql-hackers pgsql-patches

Tom Lane wrote:
> Attached is the complete patch against HEAD to prevent hashtable bucket
> splits during hash_seq_search. Any comments before I start
> back-porting? I suppose we had better patch this all the way back,
> even though AtPrepare_Locks() is the only known trouble spot.

Looks good to me.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com