Re: SSI patch version 14

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <simon(at)2ndQuadrant(dot)com>,<markus(at)bluegap(dot)ch>, <drkp(at)csail(dot)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-02-08 13:35:48
Message-ID: 4D50F254020000250003A5D9@gw.wicourts.gov
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> On 08.02.2011 10:43, Kevin Grittner wrote:
>
>> I see that at least three BuildFarm critters don't have UINT64_MAX
>> defined.
>
> I guess we'll have to just #define it ourselves. Or could we just
> pick another magic value? Do we actually rely on
> InvalidSerCommitSeqno being higher than all other values anywhere?

It seemed more robust than a low-end number, based on how it's used.

>> Not sure why coypu is running out of connections.
>
> Hmm, it seems to choose a smaller max_connections value now, 20
> instead of 30. Looks like our shared memory usage went up by just
> enough to pass that threshold.
>
> Looks like our shared memory footprint grew by about 2MB with
> default configuration, from 37MB to 39MB. That's quite significant.
> Should we dial down the default of
> max_predicate_locks_per_transaction? Or tweak the sizing of the
> hash tables somehow?

Dialing down max_predicate_locks_per_transaction could cause the user
to see "out of shared memory" errors sooner, so I'd prefer to stay
away from that. Personally, I feel that max_connections is higher
than it should be for machines which would have trouble with the RAM
space, but I suspect I'd have trouble selling the notion that the
default for that should be reduced.

The multiplier of 10 PredXactList structures per connection is kind
of arbitrary. It affects the point at which information is pushed to
the lossy summary, so any number from 2 up will work correctly; it's
a matter of performance and false positive rate. We might want to
put that on a GUC and default it to something lower.
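
For illustration only (the GUC name, the function, and its placement
are hypothetical, not from the patch), such a restart-required knob
might feed the shared memory estimate roughly like this:

    /* Hypothetical restart-required GUC: SERIALIZABLEXACTs per connection.
     * Any value >= 2 behaves correctly; it only moves the point at which
     * information spills to the lossy summary. */
    int predicate_xact_multiplier = 10;

    static Size
    PredXactListSize(void)
    {
        /* mul_size() is the overflow-checked multiply used in shmem sizing */
        return mul_size(predicate_xact_multiplier,
                        mul_size(MaxBackends, sizeof(SERIALIZABLEXACT)));
    }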

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <simon(at)2ndQuadrant(dot)com>,<markus(at)bluegap(dot)ch>, <drkp(at)csail(dot)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-02-08 16:14:44
Message-ID: 4D511794020000250003A623@gw.wicourts.gov
Lists: pgsql-hackers

I wrote:

> The multiplier of 10 PredXactList structures per connection is
> kind of arbitrary. It affects the point at which information is
> pushed to the lossy summary, so any number from 2 up will work
> correctly; it's a matter of performance and false positive rate.
> We might want to put that on a GUC and default it to something
> lower.

If the consensus is that we want to add this knob, I can code it up
today. If we default it to something low, we can knock off a large
part of the 2MB increase in shared memory used by SSI in the default
configuration. For those not using SERIALIZABLE transactions the
only impact is that less shared memory will be reserved for
something they're not using. For those who try SERIALIZABLE
transactions, the smaller the number, the sooner performance will
start to drop off under load -- especially in the face of a
long-running READ WRITE transaction. Since it determines shared
memory allocation, it would have to be a restart-required GUC.

I do have some concern that if this defaults to too low a number,
those who try SSI without bumping it and restarting the postmaster
will not like the performance under load very much. SSI performance
would not be affected by a low setting under light load when there
isn't a long-running READ WRITE transaction.

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: heikki(dot)linnakangas(at)enterprisedb(dot)com, simon(at)2ndQuadrant(dot)com, markus(at)bluegap(dot)ch, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-08 18:34:46
Message-ID: 20110208183446.GX9421@csail.mit.edu
Lists: pgsql-hackers

On Tue, Feb 08, 2011 at 10:14:44AM -0600, Kevin Grittner wrote:
> I do have some concern that if this defaults to too low a number,
> those who try SSI without bumping it and restarting the postmaster
> will not like the performance under load very much. SSI performance
> would not be affected by a low setting under light load when there
> isn't a long-running READ WRITE transaction.

If we're worried about this, we could add a log message the first time
SummarizeOldestCommittedXact is called, to suggest increasing the GUC
for the number of SerializableXacts. This also has the potential benefit of
alerting the user that there's a long-running transaction, in case that's
unexpected (say, if it were caused by a wedged client).
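
A minimal sketch of that one-time message (wording mine, and note the
flag is per-backend rather than cluster-wide):

    /* At the top of SummarizeOldestCommittedXact(): log on first use. */
    static bool summarize_logged = false;

    if (!summarize_logged)
    {
        ereport(LOG,
                (errmsg("serializable transaction information spilled to lossy summary"),
                 errhint("A long-running transaction may be at fault, "
                         "or the SerializableXact limit may be too low.")));
        summarize_logged = true;
    }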

I don't have any particular opinion on what the default value of the
GUC should be.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: simon(at)2ndQuadrant(dot)com, markus(at)bluegap(dot)ch, drkp(at)csail(dot)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-08 19:01:38
Message-ID: 4D519312.3080405@enterprisedb.com
Lists: pgsql-hackers

On 08.02.2011 18:14, Kevin Grittner wrote:
> I wrote:
>
>> The multiplier of 10 PredXactList structures per connection is
>> kind of arbitrary. It affects the point at which information is
>> pushed to the lossy summary, so any number from 2 up will work
>> correctly; it's a matter of performance and false positive rate.
>> We might want to put that on a GUC and default it to something
>> lower.
>
> If the consensus is that we want to add this knob, I can code it up
> today. If we default it to something low, we can knock off a large
> part of the 2MB increase in shared memory used by SSI in the default
> configuration. For those not using SERIALIZABLE transactions the
> only impact is that less shared memory will be reserved for
> something they're not using. For those who try SERIALIZABLE
> transactions, the smaller the number, the sooner performance will
> start to drop off under load -- especially in the face of a
> long-running READ WRITE transaction. Since it determines shared
> memory allocation, it would have to be a restart-required GUC.
>
> I do have some concern that if this defaults to too low a number,
> those who try SSI without bumping it and restarting the postmaster
> will not like the performance under load very much. SSI performance
> would not be affected by a low setting under light load when there
> isn't a long-running READ WRITE transaction.

Hmm, comparing InitPredicateLocks() and PredicateLockShmemSize(), it
looks like RWConflictPool is missing altogether from the calculations in
PredicateLockShmemSize().

I added an elog to InitPredicateLocks() and PredicateLockShmemSize(), to
print the actual and estimated size. Here's what I got with
max_predicate_locks_per_transaction=10 and max_connections=100:

LOG: shmemsize 635467
LOG: actual 1194392
WARNING: out of shared memory
FATAL: not enough shared memory for data structure "shmInvalBuffer"
(67224 bytes requested)

On the other hand, when I bumped max_predicate_locks_per_transaction to
100, I got:

LOG: shmemsize 3153112
LOG: actual 2339864

Which is a pretty big overestimate, percentage-wise. Taking
RWConflictPool into account in PredicateLockShmemSize() fixes the
underestimate, but makes the overestimate correspondingly larger. I've
never compared the actual and estimated shmem sizes of other parts of
the backend, so I'm not sure how large discrepancies we usually have,
but that seems quite big.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <simon(at)2ndQuadrant(dot)com>,<markus(at)bluegap(dot)ch>, <drkp(at)csail(dot)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-02-08 19:24:05
Message-ID: 4D5143F5020000250003A655@gw.wicourts.gov
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Taking RWConflictPool into account in PredicateLockShmemSize() fixes
> the underestimate, but makes the overestimate correspondingly larger.
> I've never compared the actual and estimated shmem sizes of other
> parts of the backend, so I'm not sure how large discrepancies we
> usually have, but that seems quite big.

Looking into it...

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-08 20:53:05
Message-ID: 20110208205305.GY9421@csail.mit.edu
Lists: pgsql-hackers

One other nit re. the predicate lock table size GUCs: the out-of-memory
case in RegisterPredicateLockingXid (predicate.c:1592 in my tree) gives
the hint to increase max_predicate_locks_per_transaction. I don't think
that's correct, since that GUC isn't used to size SerializableXidHash.

In fact, that error shouldn't arise at all, because if there was room in
PredXact to register the transaction, then there should be room to
register its xid in SerializableXidHash. Except that it's possible for
something else to allocate all of our shared memory and thus prevent
SerializableXidHash from reaching its intended max capacity.

In general, it might be worth considering making a HTAB's max_size a
hard limit, but that's a larger issue. Here, it's probably worth just
removing the hint.
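
With the hint gone, the error site would reduce to something like:

    ereport(ERROR,
            (errcode(ERRCODE_OUT_OF_MEMORY),
             errmsg("out of shared memory")));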

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <simon(at)2ndQuadrant(dot)com>,<markus(at)bluegap(dot)ch>, <drkp(at)csail(dot)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-02-08 22:04:39
Message-ID: 4D516997020000250003A67C@gw.wicourts.gov
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> LOG: shmemsize 3153112
> LOG: actual 2339864
>
> Which is a pretty big overestimate, percentage-wise. Taking
> RWConflictPool into account in PredicateLockShmemSize() fixes the
> underestimate, but makes the overestimate correspondingly larger.
> I've never compared the actual and estimated shmem sizes of other
> parts of the backend, so I'm not sure how large discrepancies we
> usually have, but that seems quite big.

I found two things which probably explain that:

(1) When HTABs are created, there is the max_size, which is what
the PredicateLockShmemSize function must use in its calculations,
and the init_size, which is what will initially be allocated (and
so, is probably what you see in the usage at the end of the
InitPredLocks function). That's normally set to half the maximum.

(2) The predicate lock and lock target initialization code was
initially copied and modified from the code for heavyweight locks.
The heavyweight lock code adds 10% to the calculated maximum size.
So I wound up doing that for PredicateLockTargetHash and
PredicateLockHash, but didn't do it for SerializableXidHash.
Should I eliminate this from the first two, add it to the third, or
leave it alone?

So if the space was all in HTABs, you might expect shmemsize to be
110% of the estimated maximum, and actual (at the end of the init
function) to be 50% of the estimated maximum. So the shmemsize
would be (2.2 * actual) at that point. The difference isn't that
extreme because the list-based pools now used for some structures
are allocated at full size without padding.

In addition to the omission of the RWConflictPool (which is a
biggie), the OldSerXidControlData estimate was only for a *pointer*
to it, not the structure itself. The attached patch should correct
the shmemsize numbers.
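
The OldSerXidControlData part comes down to one line (sketched here,
relying on the usual convention that OldSerXidControl is a typedef for
a pointer to OldSerXidControlData):

    /* before: accounted for the pointer, i.e. 4 or 8 bytes */
    size = add_size(size, sizeof(OldSerXidControl));

    /* after: account for the structure it points to */
    size = add_size(size, sizeof(OldSerXidControlData));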

-Kevin

Attachment: ssi-shmemsize.patch (text/plain, 1.2 KB)

From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, simon(at)2ndQuadrant(dot)com, markus(at)bluegap(dot)ch, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 00:23:12
Message-ID: 20110209002312.GB9421@csail.mit.edu
Lists: pgsql-hackers

On Tue, Feb 08, 2011 at 04:04:39PM -0600, Kevin Grittner wrote:
> (2) The predicate lock and lock target initialization code was
> initially copied and modified from the code for heavyweight locks.
> The heavyweight lock code adds 10% to the calculated maximum size.
> So I wound up doing that for PredicateLockTargetHash and
> PredicateLockHash, but didn't do it for SerializableXidHash.
> Should I eliminate this from the first two, add it to the third, or
> leave it alone?

Actually, I think for SerializableXidHash we should probably just
initially allocate it at its maximum size. Then it'll match the
PredXact list which is allocated in full upfront, and there's no risk
of being able to allocate a transaction but not register its xid. In
fact, I believe there would be no way for starting a new serializable
transaction to fail.
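
In ShmemInitHash() terms that just means passing the same value for
init_size and max_size (a sketch; the tag and variable names follow
predicate.c's style rather than quoting it):

    SerializableXidHash = ShmemInitHash("SERIALIZABLEXID hash",
                                        max_table_size,   /* init_size */
                                        max_table_size,   /* max_size */
                                        &info,
                                        HASH_ELEM | HASH_FUNCTION);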

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, simon(at)2ndquadrant(dot)com, markus(at)bluegap(dot)ch, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 02:09:48
Message-ID: AANLkTikc8rHjfm_JCSZtwqEHYFO_zr0i=bgP-hiBSWkF@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 8, 2011 at 7:23 PM, Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
> On Tue, Feb 08, 2011 at 04:04:39PM -0600, Kevin Grittner wrote:
>> (2)  The predicate lock and lock target initialization code was
>> initially copied and modified from the code for heavyweight locks.
>> The heavyweight lock code adds 10% to the calculated maximum size.
>> So I wound up doing that for PredicateLockTargetHash and
>> PredicateLockHash, but didn't do it for SerializableXidHash.
>> Should I eliminate this from the first two, add it to the third, or
>> leave it alone?
>
> Actually, I think for SerializableXidHash we should probably just
> initially allocate it at its maximum size. Then it'll match the
> PredXact list which is allocated in full upfront, and there's no risk
> of being able to allocate a transaction but not register its xid. In
> fact, I believe there would be no way for starting a new serializable
> transaction to fail.

No way to fail is a tall order.

If we don't allocate all the memory up front, does that allow memory
to be dynamically shared between different hash tables in shared
memory? I'm thinking not, but...

Frankly, I think this is an example of how our current shared memory
model is a piece of garbage. Our insistence on using sysv shm, and
only sysv shm, is making it impossible for us to do things that other
products can do easily. My first reaction to this whole discussion
was "who gives a crap about 2MB of shared memory?" and then I said
"oh, right, we do, because it might cause someone who was going to get
24MB of shared buffers to get 16MB instead, and then performance will
suck even worse than it does already". But of course the person
should really be running with 256MB or more, in all likelihood, and
would be happy to have us do that right out of the box if it didn't
require them to tap-dance around their kernel settings and reboot.

We really need to fix this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, simon(at)2ndquadrant(dot)com, markus(at)bluegap(dot)ch, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 05:24:19
Message-ID: 20110209052419.GD9421@csail.mit.edu
Lists: pgsql-hackers

On Tue, Feb 08, 2011 at 09:09:48PM -0500, Robert Haas wrote:
> No way to fail is a tall order.

Well, no way to fail due to running out of shared memory in
RegisterPredicateLock/RegisterPredicateLockingXid, but that doesn't
have quite the same ring to it...

> If we don't allocate all the memory up front, does that allow memory
> to be dynamically shared between different hash tables in shared
> memory? I'm thinking not, but...

Not in a useful way. If we only allocate some of the memory up front,
then the rest goes into the global shmem pool (actually, that has
nothing to do with the hash table per se, just the ShmemSize
calculations), and it's up for grabs for any hash table that wants to
expand, even beyond its declared maximum capacity. But once it's
claimed by a hash table it can't get returned.

This doesn't sound like a feature to me.

In particular, I'd worry that something that allocates a lot of locks
(either of the heavyweight or predicate variety) would fill up the
associated hash table, and then we're out of shared memory for the
other hash tables -- and have no way to get it back short of restarting
the whole system.

> Frankly, I think this is an example of how our current shared memory
> model is a piece of garbage. Our insistence on using sysv shm, and
> only sysv shm, is making it impossible for us to do things that other
> products can do easily. My first reaction to this whole discussion
> was "who gives a crap about 2MB of shared memory?" and then I said
> "oh, right, we do, because it might cause someone who was going to get
> 24MB of shared buffers to get 16MB instead, and then performance will
> suck even worse than it does already". But of course the person
> should really be running with 256MB or more, in all likelihood, and
> would be happy to have us do that right out of the box if it didn't
> require them to tap-dance around their kernel settings and reboot.

I'm completely with you on this.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: simon(at)2ndQuadrant(dot)com, markus(at)bluegap(dot)ch, drkp(at)csail(dot)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 10:27:00
Message-ID: 4D526BF4.50806@enterprisedb.com
Lists: pgsql-hackers

On 09.02.2011 00:04, Kevin Grittner wrote:
> (1) When HTABs are created, there is the max_size, which is what
> the PredicateLockShmemSize function must use in its calculations,
> and the init_size, which is what will initially be allocated (and
> so, is probably what you see in the usage at the end of the
> InitPredLocks function). That's normally set to half the maximum.

Oh, I see.

> (2) The predicate lock and lock target initialization code was
> initially copied and modified from the code for heavyweight locks.
> The heavyweight lock code adds 10% to the calculated maximum size.
> So I wound up doing that for PredicateLockTargetHash and
> PredicateLockHash, but didn't do it for SerializableXidHash.
> Should I eliminate this from the first two, add it to the third, or
> leave it alone?

I'm inclined to eliminate it from the first two. Even in
LockShmemSize(), it seems a bit weird to add a safety margin; the sizes
of the lock and proclock hashes are just rough estimates anyway.

> So if the space was all in HTABs, you might expect shmemsize to be
> 110% of the estimated maximum, and actual (at the end of the init
> function) to be 50% of the estimated maximum. So the shmemsize
> would be (2.2 * actual) at that point. The difference isn't that
> extreme because the list-based pools now used for some structures
> are allocated at full size without padding.
>
> In addition to the omission of the RWConflictPool (which is a
> biggie), the OldSerXidControlData estimate was only for a *pointer*
> to it, not the structure itself. The attached patch should correct
> the shmemsize numbers.

The actual and estimated shmem sizes still didn't add up: I still saw
actual usage much higher than estimated size, with max_connections=1000
and max_predicate_locks_per_transaction=10. It turned out to be because:

* You missed that RWConflictPool is sized five times as large as
SerializableXidHash, and

* The allocation for RWConflictPool elements was wrong, while the
estimate was correct.

With these changes, the estimated and actual sizes match closely, so
that actual hash table sizes are 50% of the estimated size as expected.
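
The corrected estimate comes down to something like this sketch (type
names per the predicate_internals.h conventions):

    /* RWConflictPool holds five RWConflictData per SERIALIZABLEXACT */
    size = add_size(size,
                    mul_size(mul_size(5, max_table_size),
                             sizeof(RWConflictData)));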

I fixed those bugs, but this doesn't help with the buildfarm members
with limited shared memory yet.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: David Fetter <david(at)fetter(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, simon(at)2ndquadrant(dot)com, markus(at)bluegap(dot)ch, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 15:16:19
Message-ID: 20110209151619.GA1155@fetter.org
Lists: pgsql-hackers

On Tue, Feb 08, 2011 at 09:09:48PM -0500, Robert Haas wrote:
> If we don't allocate all the memory up front, does that allow memory
> to be dynamically shared between different hash tables in shared
> memory? I'm thinking not, but...
>
> Frankly, I think this is an example of how our current shared memory
> model is a piece of garbage.

What other model(s) might work better?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <simon(at)2ndQuadrant(dot)com>,<markus(at)bluegap(dot)ch>, <drkp(at)csail(dot)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-02-09 15:21:37
Message-ID: 4D525CA1020000250003A6CE@gw.wicourts.gov
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

>> (2) The predicate lock and lock target initialization code was
>> initially copied and modified from the code for heavyweight
>> locks. The heavyweight lock code adds 10% to the calculated
>> maximum size. So I wound up doing that for
>> PredicateLockTargetHash and PredicateLockHash, but didn't do it
>> for SerializableXidHash. Should I eliminate this from the first
>> two, add it to the third, or leave it alone?
>
> I'm inclined to eliminate it from the first two. Even in
> LockShmemSize(), it seems a bit weird to add a safety margin; the
> sizes of the lock and proclock hashes are just rough estimates
> anyway.

I'm fine with that. Trivial patch attached.

> * You missed that RWConflictPool is sized five times as large as
> SerializableXidHash, and
>
> * The allocation for RWConflictPool elements was wrong, while the
> estimate was correct.
>
> With these changes, the estimated and actual sizes match closely,
> so that actual hash table sizes are 50% of the estimated size as
> expected.
>
> I fixed those bugs

Thanks. Sorry for missing them.

> but this doesn't help with the buildfarm members with limited
> shared memory yet.

Well, if dropping the 10% fudge factor on those two HTABs doesn't
bring it down far enough (which seems unlikely), what do we do? We
could, as I said earlier, bring down the multiplier for the number
of transactions we track in SSI based on the maximum allowed
connections, but I would really want a GUC on it if we
do that. We could bring down the default number of predicate locks
per transaction. We could make the default configuration more
stingy about max_connections when memory is this tight. Other
ideas?

I do think that anyone using SSI with a heavy workload will need
something like the current values to see decent performance, so it
would be good if there was some way to do this which would tend to
scale up as they increased something. Wild idea: make the
multiplier equivalent to the bytes of shared memory divided by 100MB,
clamped to a minimum of 2 and a maximum of 10?
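
As code, the clamp is trivial (shmem_bytes being whatever total we
size against):

    int mult = (int) (shmem_bytes / (100 * 1024 * 1024));
    mult = Max(2, Min(10, mult));   /* clamp to [2, 10] */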

-Kevin

Attachment: ssi-drop-fudge-factor.patch (text/plain, 516 bytes)

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: David Fetter <david(at)fetter(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 15:38:35
Message-ID: 4D52B4FB.9020306@bluegap.ch
Lists: pgsql-hackers

On 02/09/2011 04:16 PM, David Fetter wrote:
> On Tue, Feb 08, 2011 at 09:09:48PM -0500, Robert Haas wrote:
>> Frankly, I think this is an example of how our current shared memory
>> model is a piece of garbage.
>
> What other model(s) might work better?

Thread based, dynamically allocatable and resizeable shared memory, as
most other projects and developers use, for example.

My dynshmem work is a first attempt at addressing the allocation part of
that. It would theoretically allow more dynamic use of the overall
fixed amount of shared memory available (instead of requiring every
subsystem to use a fixed fraction of the overall available shared
memory, as is required now).

It was dismissed from CF 2010-07 for good reasons: lacking evidence of
usable performance, possible patent issues with the allocator chosen,
and lots of work for questionable benefit (existing subsystems would
have to be reworked to use that allocator).

For anybody interested, please search the archives for 'dynshmem'.

Regards

Markus Wanner


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc: <simon(at)2ndQuadrant(dot)com>,<markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-02-09 15:58:50
Message-ID: 4D52655A020000250003A6D7@gw.wicourts.gov
Lists: pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:

> I think for SerializableXidHash we should probably just initially
> allocate it at its maximum size. Then it'll match the PredXact
> list which is allocated in full upfront, and there's no risk of
> being able to allocate a transaction but not register its xid. In
> fact, I believe there would be no way for starting a new
> serializable transaction to fail.

To be more precise, it would prevent an out of shared memory error
during an attempt to register an xid for an active serializable
transaction. That seems like a good thing. Patch to remove the
hint and initially allocate that HTAB at full size attached.

I didn't attempt to address the larger general issue of one HTAB
stealing shared memory from space calculated to belong to another,
and then holding on to it until the postmaster is shut down.

-Kevin

Attachment: ssi-full-xid-alloc.patch (text/plain, 1.8 KB)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: David Fetter <david(at)fetter(dot)org>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 17:25:02
Message-ID: AANLkTi=reTQNeUEn8u1eNkTQAPJHyK66gVhgwg=5GUG7@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 9, 2011 at 10:38 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> On 02/09/2011 04:16 PM, David Fetter wrote:
>> On Tue, Feb 08, 2011 at 09:09:48PM -0500, Robert Haas wrote:
>>> Frankly, I think this is an example of how our current shared memory
>>> model is a piece of garbage.
>>
>> What other model(s) might work better?
>
> Thread based, dynamically allocatable and resizeable shared memory, as
> most other projects and developers use, for example.

Or less invasively, a small sysv shm to prevent the double-postmaster
problem, and allocate the rest using POSIX shm.
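
A rough sketch of that hybrid (segment name, key, and sizes invented
for illustration; error handling omitted):

    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* tiny SysV segment, kept only for the nattch-based interlock that
     * detects another postmaster still attached to the data directory */
    int shmid = shmget(key, 64, IPC_CREAT | IPC_EXCL | 0600);

    /* the real shared memory lives in a POSIX segment, free of SHMMAX */
    int fd = shm_open("/pgsql_shmem", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, total_size);
    void *base = mmap(NULL, total_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);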

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-02-09 19:16:18
Message-ID: F6E286AF-660F-4FCE-A058-3C02807ABF08@themactionfaction.com
Lists: pgsql-hackers


On Feb 9, 2011, at 12:25 PM, Robert Haas wrote:

> On Wed, Feb 9, 2011 at 10:38 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
>> On 02/09/2011 04:16 PM, David Fetter wrote:
>>> On Tue, Feb 08, 2011 at 09:09:48PM -0500, Robert Haas wrote:
>>>> Frankly, I think this is an example of how our current shared memory
>>>> model is a piece of garbage.
>>>
>>> What other model(s) might work better?
>>
>> Thread based, dynamically allocatable and resizeable shared memory, as
>> most other projects and developers use, for example.
>
> Or less invasively, a small sysv shm to prevent the double-postmaster
> problem, and allocate the rest using POSIX shm.

Such a patch was proposed and rejected:
http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/94791
Cheers,
M


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-02-09 19:36:25
Message-ID: AANLkTikB3T2x2UbYsKv-bT+RBmqBL53vSEG2QvArpPPJ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 9, 2011 at 2:16 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
> On Feb 9, 2011, at 12:25 PM, Robert Haas wrote:
>> On Wed, Feb 9, 2011 at 10:38 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
>>> On 02/09/2011 04:16 PM, David Fetter wrote:
>>>> On Tue, Feb 08, 2011 at 09:09:48PM -0500, Robert Haas wrote:
>>>>> Frankly, I think this is an example of how our current shared memory
>>>>> model is a piece of garbage.
>>>>
>>>> What other model(s) might work better?
>>>
>>> Thread based, dynamically allocatable and resizeable shared memory, as
>>> most other projects and developers use, for example.
>>
>> Or less invasively, a small sysv shm to prevent the double-postmaster
>> problem, and allocate the rest using POSIX shm.
>
> Such a patch was proposed and rejected:
> http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/94791

I know. We need to revisit that for 9.2 and un-reject it. It's nice
that PostgreSQL can run on my thermostat, but it isn't nice that
that's the only place where it delivers the expected level of
performance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 19:51:44
Message-ID: 4D52F050.7050504@bluegap.ch
Lists: pgsql-hackers

On 02/09/2011 06:25 PM, Robert Haas wrote:
> On Wed, Feb 9, 2011 at 10:38 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
>> Thread based, dynamically allocatable and resizeable shared memory, as
>> most other projects and developers use, for example.

I didn't mean to say we should switch to that model. It's just *the*
other model that works (whether or not it's better in general or for
Postgres is debatable).

> Or less invasively, a small sysv shm to prevent the double-postmaster
> problem, and allocate the rest using POSIX shm.

..which allows ftruncate() to resize, right? That's the main benefit
over sysv shm which we currently use.

ISTM that addresses the resizing-of-the-overall-shared-memory question,
but doesn't that require dynamic allocation or some other kind of
book-keeping? Or do you envision all subsystems having to
re-initialize their new (grown or shrunken) chunk of it?
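
The primitive itself is a one-liner,

    ftruncate(fd, new_size);    /* grow or shrink the POSIX segment */

but every attached process must then re-map the region, which is
exactly that book-keeping problem.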

Regards

Markus Wanner


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: David Fetter <david(at)fetter(dot)org>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, simon(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-09 20:10:52
Message-ID: AANLkTikGEM+PfTZuutMDgQuUjWfJpdnkB-aOLxQifq5_@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 9, 2011 at 2:51 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> On 02/09/2011 06:25 PM, Robert Haas wrote:
>> On Wed, Feb 9, 2011 at 10:38 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
>>> Thread based, dynamically allocatable and resizeable shared memory, as
>>> most other projects and developers use, for example.
>
> I didn't mean to say we should switch to that model.  It's just *the*
> other model that works (whether or not it's better in general or for
> Postgres is debatable).
>
>> Or less invasively, a small sysv shm to prevent the double-postmaster
>> problem, and allocate the rest using POSIX shm.
>
> ..which allows ftruncate() to resize, right?  That's the main benefit
> over sysv shm which we currently use.
>
> ISTM that addresses the resizing-of-the-overall-shared-memory question,
> but doesn't that require dynamic allocation or some other kind of
> book-keeping?  Or do you envision all subsystems to have to
> re-initialize their new (grown or shrunken) chunk of it?

Basically, I'd be happy if all we got out of it was freedom from the
oppressive system shared memory limits. On a modern system, it's
hard to imagine that the default for shared_buffers should be less
than 256MB, but that blows out the default POSIX shared memory
allocation limits on every operating system I use, and some of those
need a reboot to fix it. That's needlessly reducing performance and
raising the barrier of entry for new users. I am waiting for the day
when I have to explain to the guy with a terabyte of memory that the
reason his performance sucks so bad is that he's got a 16MB
buffer cache. The percentage of memory we're allocating to
shared_buffers should not need to be expressed in scientific notation.

But once we get out from under that, I think there might well be some
advantage to have certain subsystems allocate their own segments,
and/or using ftruncate() for resizing. I don't have a concrete
proposal in mind, though. It's very much non-trivial to resize
shared_buffers, for example, even if you assume that the size of the
shm can easily be changed. So I don't expect quick progress on this
front; but it would be nice to have those options available.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, simon(at)2ndQuadrant(dot)com, markus(at)bluegap(dot)ch, pgsql-hackers(at)postgresql(dot)org, remi_zara(at)mac(dot)com
Subject: Re: SSI patch version 14
Date: 2011-02-10 10:09:33
Message-ID: 4D53B95D.6070703@enterprisedb.com
Lists: pgsql-hackers

On 09.02.2011 17:58, Kevin Grittner wrote:
> Dan Ports<drkp(at)csail(dot)mit(dot)edu> wrote:
>
>> I think for SerializableXidHash we should probably just initially
>> allocate it at its maximum size. Then it'll match the PredXact
>> list which is allocated in full upfront, and there's no risk of
>> being able to allocate a transaction but not register its xid. In
>> fact, I believe there would be no way for starting a new
>> serializable transaction to fail.
>
> To be more precise, it would prevent an out of shared memory error
> during an attempt to register an xid for an active serializable
> transaction. That seems like a good thing. Patch to remove the
> hint and initially allocate that HTAB at full size attached.

Committed.

Curiously, coypu has gone green again. It's now choosing 40 connections
and 8 MB of shared_buffers, while it used to choose 30 connections and
24 MB of shared_buffers before the SSI patch. Looks like fixing the size
estimation bugs helped that, but I'm not entirely sure how. Maybe it
just failed with higher max_connections settings because of the
misestimate. But why does it now choose a *higher* max_connections
setting than before?
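
One possibility, going by initdb's two-phase probing (simplified
sketch; test_postmaster, MIN_BUFS, and trial_bufs are stand-ins for
the real code in src/bin/initdb/initdb.c):

    static const int trial_conns[] = {100, 50, 40, 30, 20, 10};

    /* phase 1: highest connection count that starts with minimal buffers */
    for (i = 0; i < lengthof(trial_conns); i++)
        if (test_postmaster(trial_conns[i], MIN_BUFS))
            break;
    n_connections = trial_conns[i];

    /* phase 2: largest buffer count that starts at n_connections */
    for (i = 0; i < lengthof(trial_bufs); i++)
        if (test_postmaster(n_connections, trial_bufs[i]))
            break;
    n_buffers = trial_bufs[i];

Since phase 1 runs with few buffers, a smaller per-connection SSI
footprint could let 40 connections pass where 30 was the previous
ceiling, after which phase 2 tops out at 8 MB for those 40 connections.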

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, simon(at)2ndQuadrant(dot)com, markus(at)bluegap(dot)ch, pgsql-hackers(at)postgresql(dot)org, remi_zara(at)mac(dot)com
Subject: Re: SSI patch version 14
Date: 2011-02-10 10:29:47
Message-ID: 4D53BE1B.9020805@dunslane.net
Lists: pgsql-hackers

On 02/10/2011 05:09 AM, Heikki Linnakangas wrote:
> On 09.02.2011 17:58, Kevin Grittner wrote:
>> Dan Ports<drkp(at)csail(dot)mit(dot)edu> wrote:
>>
>>> I think for SerializableXidHash we should probably just initially
>>> allocate it at its maximum size. Then it'll match the PredXact
>>> list which is allocated in full upfront, and there's no risk of
>>> being able to allocate a transaction but not register its xid. In
>>> fact, I believe there would be no way for starting a new
>>> serializable transaction to fail.
>>
>> To be more precise, it would prevent an out of shared memory error
>> during an attempt to register an xid for an active serializable
>> transaction. That seems like a good thing. Patch to remove the
>> hint and initially allocate that HTAB at full size attached.
>
> Committed.
>
> Curiously, coypu has gone green again. It's now choosing 40
> connections and 8 MB of shared_buffers, while it used to choose 30
> connections and 24 MB of shared_buffers before the SSI patch. Looks
> like fixing the size estimation bugs helped that, but I'm not entirely
> sure how. Maybe it just failed with higher max_connections settings
> because of the misestimate. But why does it now choose a *higher*
> max_connections setting than before?

Rémi might have increased its available resources.

cheers

andrew