From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: SSI SLRU strategy choices
Date: 2010-12-28 22:10:59
Message-ID: 4D1A0C130200002500038C44@gw.wicourts.gov
Lists: pgsql-hackers

I'm now deep enough into the SLRU techniques to see what my options
are for storing the data appropriate for SLRU. This consists of a
uint64 commitSeqNo (which is overkill enough that I'd be comfortable
stealing a bit or two from the high end in SLRU usage) which needs
to be associated with an xid. The xids would have gaps, since we
only need to track committed serializable transactions which still
matter because a long-running transaction kept them from being
subject to early cleanup under the previously posted rules. These
will be looked up by xid.
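
In code terms, what I need is conceptually just this mapping; the
names below are illustrative only, not actual patch code:

    /* uint64 commit sequence number; overkill, so a bit or two at
     * the high end could be stolen for flags in SLRU usage. */
    typedef uint64 CommitSeqNo;

    /* Record the commitSeqNo for a committed serializable xact. */
    extern void SerialAdd(TransactionId xid, CommitSeqNo csn);

    /* Fetch the commitSeqNo recorded for an xid, if any. */
    extern CommitSeqNo SerialGetCommitSeqNo(TransactionId xid);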

The options I see are:

(1) Store the xid and commitSeqNo in each SLRU entry -- with
alignment, that's 16 bytes per entry. Simple, but requires
sequential search for the xid. Wouldn't scale well.

(2) Use 8 byte SLRU entries and map the xid values over the SLRU
space, with each spot allowing two different xid values (a rough
sketch of this mapping follows the list of options). At first
blush that looks good, because transaction ID wrap-around
techniques mean that the two values for any one spot couldn't be
active at the same time. The high bit could flag that the xid is
"present", with the rest of the bits coming from the commitSeqNo.
The problem is that the SLRU code appears to get confused about
wrap-around when the SLRU space is more than half full, so we
would get into trouble if we burned through more than 2^30
transactions during one long-running serializable read-write
transaction. I still like this option best, resorting to killing
the long-running transaction at that point.

(3) Use two SLRU spaces: look up the first one randomly by xid to
get a position in the second one, which would hold the commitSeqNo
values in sequentially assigned slots. This would potentially allow
us to burn through more transactions, because some are likely to be
subject to early cleanup. The marginal extension of the failure
point doesn't seem like it merits the extra complexity.

(4) Change SLRU to tolerate more entries. At most this raises the
number of transactions we can burn through during a long-running
transaction from 2^30 to 2^31. That hardly seems worth the
potential to destabilize a lot of critical code.
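
For what it's worth, here is the rough sketch of the option (2)
mapping promised above, in the style of the existing SLRU users
like clog.c. All names are hypothetical and the numbers assume
8-byte entries with the default BLCKSZ of 8192; real code may
well differ:

    /* 1024 entries per 8kB page; xids 2^31 apart share a slot,
     * which the xid wrap-around rules should make safe. */
    #define SERIAL_ENTRIES_PER_PAGE (BLCKSZ / sizeof(uint64))
    #define SERIAL_PRESENT (UINT64CONST(1) << 63)

    #define SerialSlot(xid) ((uint64) (xid) % (UINT64CONST(1) << 31))
    #define SerialPage(xid) \
        ((int) (SerialSlot(xid) / SERIAL_ENTRIES_PER_PAGE))
    #define SerialEntry(xid) \
        ((int) (SerialSlot(xid) % SERIAL_ENTRIES_PER_PAGE))

    /* To store: entry[SerialEntry(xid)] = SERIAL_PRESENT | csn;
     * the high bit flags "present", the rest is the commitSeqNo.
     * To look up: the entry is valid iff the high bit is set. */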

Does (2) sound good to anyone else? Other ideas? Does it sound
like I'm totally misunderstanding anything?

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI SLRU strategy choices
Date: 2010-12-28 23:33:09
Message-ID: 4D1A73B5.6000308@enterprisedb.com
Lists: pgsql-hackers

On 29.12.2010 00:10, Kevin Grittner wrote:
> (2) Use 8 byte SLRU entries and map the xid values over the SLRU
> space, with each spot allowing two different xid values (a rough
> sketch of this mapping follows the list of options). At first
> blush that looks good, because transaction ID wrap-around
> techniques mean that the two values for any one spot couldn't be
> active at the same time. The high bit could flag that the xid is
> "present", with the rest of the bits coming from the commitSeqNo.
> The problem is that the SLRU code appears to get confused about
> wrap-around when the SLRU space is more than half full, so we
> would get into trouble if we burned through more than 2^30
> transactions during one long-running serializable read-write
> transaction. I still like this option best, resorting to killing
> the long-running transaction at that point.

If you burn through more than 2^30 XIDs while a long-running transaction
is still running, you have bigger problems. 2^30 is the maximum number
of XIDs you can burn through before you reach XID wrap-around anyway.
GetNewTransactionId() will stop assigning new XIDs when you approach that
limit.

(I'm not sure how you arrived at that number, though. ISTM that the slru
code can handle 2^30 *pages* before getting into trouble, assuming the
PagePrecedes function can handle that.)

The only issue I can see with that is that you allocate those 8 bytes
for every xid, even if it's a non-serializable transaction or a
subtransaction. But the overhead is probably not significant in practice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI SLRU strategy choices
Date: 2010-12-29 15:20:20
Message-ID: 4D1AFD540200002500038D60@gw.wicourts.gov
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> I'm not sure how you arrived at that number, though.

http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/include/access/slru.h;h=710cca70acd67e03e5f3a255b048a719ae4c4709

The way I read this, each segment is (BLCKSZ *
SLRU_PAGES_PER_SEGMENT) long, which is (8kB * 32), or 256kB. The
number of files is limited to 64k because of the 0000 to FFFF
segment file naming, so total space is limited to 16GB. When an
SLRU is used to store xids for random access, that's 4 bytes per
entry, so 2^32 entries are possible, but the SLRU code considers
it a problem for the space to become more than half full. With the
eight-byte entries I need, there are 2^31 slots for entries, with
the ability to use 2^30 before the space becomes half full and
SLRU complains.
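
Spelled out, the arithmetic I'm relying on is:

    segment size = BLCKSZ * SLRU_PAGES_PER_SEGMENT = 8kB * 32 = 256kB
    max segments = 0x0000 through 0xFFFF                      = 65536
    total space  = 256kB * 65536                              = 16GB

    4-byte entries: 16GB / 4 = 2^32 slots; 2^31 usable at half full
    8-byte entries: 16GB / 8 = 2^31 slots; 2^30 usable at half full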

Does that look right to you, or have I misunderstood something?

> The only issue I can see with that is that you allocate those 8
> bytes for every xid, even if it's a non-serializable transaction
> or a subtransaction. But the overhead is probably not significant
> in practice.

Right. And it avoids a sequential search for the desired xid. A
sequential search seems to me like it would degrade to O(N^2)
performance under extreme load, whereas this approach has a couple
of performance plateaus at O(1), which I think will be the normal
case, and only degrades to O(N) under extreme load.

-Kevin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI SLRU strategy choices
Date: 2010-12-29 20:00:26
Message-ID: 1293652502-sup-7444@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Kevin Grittner's message of Wed Dec 29 12:20:20 -0300 2010:

> http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/include/access/slru.h;h=710cca70acd67e03e5f3a255b048a719ae4c4709
>
> The way I read this, each segment is (BLCKSZ *
> SLRU_PAGES_PER_SEGMENT) long, which is (8kB * 32), or 256kB. The
> number of files is limited to 64k because of the 0000 to FFFF
> segment file naming, so total space is limited to 16GB. When an
> SLRU is used to store xids for random access, that's 4 bytes per
> entry, so 2^32 entries are possible, but the SLRU code considers
> it a problem for the space to become more than half full. With the
> eight-byte entries I need, there are 2^31 slots for entries, with
> the ability to use 2^30 before the space becomes half full and
> SLRU complains.

If these limitations become a problem, you can always change them.
A couple of zeroes at the start of the pg_clog filenames aren't going to
bother anyone, I don't think. Not so sure about your new proposed
design's space usage.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI SLRU strategy choices
Date: 2010-12-29 20:20:40
Message-ID: 4D1B43B80200002500038D8D@gw.wicourts.gov
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> If these limitations become a problem, you can always change them.
> A couple of zeroes at the start of the pg_clog filenames aren't
> going to bother anyone, I don't think. Not so sure about your new
> proposed design's space usage.

I guess that's a call the community can make now -- if a
serializable transaction which is not flagged as read only remains
open long enough for over a billion other transactions to commit, is
it OK for the old transaction to be automatically canceled? Is it
worth messing with the SLRU limits to double that?

Beyond a certain point you have transaction ID wrap-around, so at
that point this would be the least of your troubles -- canceling
the old transaction might even be helpful. I thought that was at 2
billion, but Heikki was saying it's at 1 billion in an earlier post.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI SLRU strategy choices
Date: 2010-12-29 20:36:55
Message-ID: 4D1B47880200002500038D98@gw.wicourts.gov
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> if a serializable transaction which is not flagged as read only
> remains open long enough for over a billion other transactions to
> commit

Maybe a clarification and an example would be useful. We're talking
about going through a billion transactions which were assigned a
TransactionId, not all database transactions. An example of how you
could hit that is a sustained rate of 5000 data-modifying commits
per second while a single read-write transaction stays open for 2.3
days.
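
For the record, the arithmetic behind that example:

    5000 xids/second * 86400 seconds/day = 432 million xids/day
    1 billion xids / 432 million per day = ~2.3 days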

-Kevin