Re: beta1 & beta2 & Windows & heavy load

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Schuchardt <daniel_schuchardt(at)web(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: beta1 & beta2 & Windows & heavy load
Date: 2004-09-14 05:48:38
Message-ID: 2880.1095140918@sss.pgh.pa.us
Lists: pgsql-hackers

Daniel Schuchardt <daniel_schuchardt(at)web(dot)de> writes:
> Tom Lane wrote:
>>> Can I see a stack trace from that? Or at least the verbose form of the
>>> error message?

> WARNING: 53200: out of shared memory
> LOCATION: ShmemAlloc, shmem.c:185
> STATEMENT: SELECT count(*) FROM art;

> ERROR: 53200: out of shared memory
> LOCATION: BufTableInsert, buf_table.c:93
> STATEMENT: SELECT count(*) FROM art;

Hmm. Okay, I think I see what is going on here. dynahash's HASH_REMOVE
operation sticks the freed space into a freelist associated with the
particular hashtable, but it never releases it for "general" use (and
since we do not have any general-purpose alloc/free code for shared
memory, there's really no way it could do that).
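Here is a minimal C sketch of that mechanism under a deliberately simplified model. It is not PostgreSQL's dynahash.c or shmem.c code. `shmem_alloc`, `HashTable`, `table_get_elem`, `table_put_elem` and the 1 KB pool are hypothetical stand-ins. A bump allocator with no free routine carves up the pool. Each table keeps its released elements on its own private freelist, so memory freed by one table is never handed to another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not PostgreSQL source: a fixed "shared memory" pool
 * handed out by a bump allocator that, like ShmemAlloc, has no free(). */
#define POOL_BYTES 1024

static _Alignas(max_align_t) unsigned char pool[POOL_BYTES];
static size_t pool_used = 0;

/* Bump-allocate n bytes (rounded up for alignment); NULL once exhausted. */
static void *shmem_alloc(size_t n)
{
    const size_t align = _Alignof(max_align_t);

    n = (n + align - 1) / align * align;
    if (n > POOL_BYTES - pool_used)
        return NULL;
    void *p = pool + pool_used;
    pool_used += n;
    return p;
}

/* A released element is reused as a link in its owning table's freelist. */
typedef struct FreeElem { struct FreeElem *next; } FreeElem;

typedef struct {
    size_t    elem_size;   /* must be >= sizeof(FreeElem) */
    FreeElem *freelist;    /* elements released by THIS table only */
} HashTable;

/* Get an element: first from this table's own freelist, else from the pool. */
static void *table_get_elem(HashTable *t)
{
    assert(t->elem_size >= sizeof(FreeElem));
    if (t->freelist != NULL) {
        FreeElem *e = t->freelist;
        t->freelist = e->next;
        return e;
    }
    return shmem_alloc(t->elem_size);
}

/* Analogue of HASH_REMOVE: the element goes onto this table's freelist,
 * never back to the pool, so no other table can ever reuse it. */
static void table_put_elem(HashTable *t, void *elem)
{
    FreeElem *e = elem;
    e->next = t->freelist;
    t->freelist = e;
}
```

Suppose a lock table grows until `shmem_alloc` returns NULL and then releases every entry. The pool stays empty. The released elements sit on the lock table's freelist, and a second table calling `table_get_elem` gets NULL.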

So what is happening is that the subxact-open loop creates new locktable
hash entries until it's run the free space in shared memory down to nil,
and then it errors out. The lock hash entries are then released ... but
only to the freelist associated with the lock table. If the shared hash
table for the buffer pool needs to grow afterwards, it's out of luck.

Had you been running the server for very long before forcing the error,
I don't think this would have happened, because the buffer hashtable
would have already expanded to its full working size.
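An even smaller model shows why timing matters. It only counts element-sized slots. Again, this is a hypothetical sketch, not PostgreSQL code. `Shmem`, `Table`, `table_insert`, `table_remove`, `run_scenario` and the slot counts are all illustrative assumptions.

In the model, a burst of lock-table inserts drains the pool, and the lock entries are then released to the lock table's own freelist. After that, the buffer table reaches its target size only if it had already claimed those slots beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting model, not PostgreSQL code. Shared memory is a
 * fixed number of element-sized slots that can be carved out once and
 * are never returned to the pool. */
typedef struct {
    size_t pool_left;      /* unallocated slots left in "shared memory" */
} Shmem;

typedef struct {
    size_t in_use;         /* live entries */
    size_t free_slots;     /* slots on this table's private freelist */
} Table;

/* Insert one entry: reuse a private free slot, else carve one from the pool. */
static bool table_insert(Shmem *shm, Table *t)
{
    assert(shm != NULL && t != NULL);
    if (t->free_slots > 0)
        t->free_slots--;
    else if (shm->pool_left > 0)
        shm->pool_left--;
    else
        return false;      /* "out of shared memory" */
    t->in_use++;
    return true;
}

/* Remove one entry: its slot goes to this table's freelist, not the pool. */
static void table_remove(Table *t)
{
    assert(t->in_use > 0);
    t->in_use--;
    t->free_slots++;
}

/* buf_warm: buffer entries created (and later replaced) before the lock
 * burst, mimicking a server whose buffer table already reached its working
 * size. Returns true if the buffer table can then reach buf_target. */
static bool run_scenario(size_t pool_slots, size_t buf_warm, size_t buf_target)
{
    Shmem shm = { pool_slots };
    Table locks = { 0, 0 }, bufs = { 0, 0 };

    for (size_t i = 0; i < buf_warm; i++)
        if (!table_insert(&shm, &bufs))
            return false;
    for (size_t i = 0; i < buf_warm; i++)
        table_remove(&bufs);                 /* slots stay with bufs */

    while (table_insert(&shm, &locks)) {
        /* subxact loop: add lock entries until the pool is empty */
    }
    while (locks.in_use > 0)
        table_remove(&locks);                /* released to locks only */

    for (size_t i = 0; i < buf_target; i++)
        if (!table_insert(&shm, &bufs))
            return false;
    return true;
}
```

The two cases compare as follows:

* `run_scenario(100, 0, 10)` returns false. This models a freshly started server, where the buffer table has nothing left to grow into.
* `run_scenario(100, 10, 10)` returns true. Here the buffer table claimed its ten slots before the lock burst.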

I can think of various tweaks we could make to the hash management code
to avoid this scenario, but I suspect every one of those cures is worse
than the disease: they all seem to reduce the flexibility of
shared memory allocation instead of increasing it. And I don't want
to create a full-fledged alloc/free package for shared memory --- the
bang-for-buck ratio for that is way too low. So I'm inclined to leave
this alone. Once we fix subxacts to not hold their XID locks after
subcommit, the probability of a problem should go back down to the same
low value that has let us ignore this risk for many years.

regards, tom lane
