init_sequence spill to hash table

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: init_sequence spill to hash table
Date: 2013-11-13 09:55:43
Message-ID: CAApHDvpAczv3XjmRsa9HJ4YEVSKiqX0QTaQew7-DoFKsuzeO5Q@mail.gmail.com
Lists: pgsql-hackers

Here http://www.postgresql.org/message-id/24278.1352922571@sss.pgh.pa.us there
was some talk about init_sequence being a bottleneck when many sequences
are used in a single backend.

The attached patch, I think, implements what was talked about in the above
link; for me it seems to double the speed of a currval() loop over 30000
sequences, going from about 7 seconds to 3.5 on my laptop.

I thought I would post the patch early to see if this is actually wanted
before I do too much more work on it.

My implementation keeps using the linear list for sequences up to a
defined threshold (currently 32), then moves everything over to a
hash table and frees the list.
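The spill-to-hash approach described above can be sketched in standalone C as
follows. All names here (the SeqTableData fields, the threshold constant, the
bucket count) are illustrative stand-ins rather than the patch's actual code,
plain malloc/calloc stands in for the backend's memory-context allocation, and
a simple chained table stands in for dynahash:

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Oid;

#define SEQ_SPILL_THRESHOLD 32   /* list entries before spilling to hash */
#define SEQ_HASH_BUCKETS    256

typedef struct SeqTableData
{
    Oid      relid;              /* sequence OID, the lookup key */
    long     last_value;         /* cached state, e.g. for currval() */
    struct SeqTableData *next;   /* list link, reused as bucket chain */
} SeqTableData;

static SeqTableData  *seqlist = NULL;   /* linear list while small */
static int            seqcount = 0;     /* entries while in list mode */
static SeqTableData **seqhash = NULL;   /* NULL until we spill */

static unsigned
hash_oid(Oid relid)
{
    return relid % SEQ_HASH_BUCKETS;
}

/* Move every list entry into the hash table; the list ceases to exist. */
static void
switch_to_hashtable(void)
{
    seqhash = calloc(SEQ_HASH_BUCKETS, sizeof(SeqTableData *));
    while (seqlist != NULL)
    {
        SeqTableData *elm = seqlist;
        unsigned      b = hash_oid(elm->relid);

        seqlist = elm->next;
        elm->next = seqhash[b];
        seqhash[b] = elm;
    }
}

/* Find the cache entry for relid, creating it if needed. */
static SeqTableData *
init_sequence(Oid relid)
{
    SeqTableData *elm;

    if (seqhash != NULL)
    {
        for (elm = seqhash[hash_oid(relid)]; elm != NULL; elm = elm->next)
            if (elm->relid == relid)
                return elm;
    }
    else
    {
        for (elm = seqlist; elm != NULL; elm = elm->next)
            if (elm->relid == relid)
                return elm;
    }

    elm = malloc(sizeof(SeqTableData));
    elm->relid = relid;
    elm->last_value = 0;

    /* Crossing the threshold converts everything to hash mode. */
    if (seqhash == NULL && ++seqcount > SEQ_SPILL_THRESHOLD)
        switch_to_hashtable();

    if (seqhash != NULL)
    {
        unsigned b = hash_oid(relid);

        elm->next = seqhash[b];
        seqhash[b] = elm;
    }
    else
    {
        elm->next = seqlist;
        seqlist = elm;
    }
    return elm;
}
```

The point of the hybrid is that lookups stay a short linear scan for the
common few-sequences case and only pay for hashing once a backend has touched
many sequences.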

A more complete solution would contain regression tests to exercise the
hash table code.

I know there is a desire to move sequences over to a single table still,
but I see this as a separate patch and storing current values in a hash
table for each backend should still be a win even if/when the single table
stuff gets implemented.

Regards

David Rowley

Attachment Content-Type Size
sequence.sql text/plain 1.1 KB
hashtab_seq_v0.1.patch application/octet-stream 6.3 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-13 11:15:32
Message-ID: 52835F54.5020404@vmware.com
Lists: pgsql-hackers

On 13.11.2013 11:55, David Rowley wrote:
> I thought I would post the patch early to see if this is actually wanted
> before I do too much more work on it.

Seems reasonable.

> My implementation maintains using the linear list for sequences up to a
> defined threshold (currently 32) then it moves everything over to a
> hashtable and free's off the list.

Did you check how it would perform if you just always used the hash
table? Or if you just have a single entry before you move to hash table,
ie. set the threshold to 2? That would be slightly simpler.

- Heikki


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 12:38:26
Message-ID: CAApHDvpWsc72i7bce-Xvf9YdXPFDCJs-yUkBikTzBE5XA31rCA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 14, 2013 at 12:15 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 13.11.2013 11:55, David Rowley wrote:
>
>> I thought I would post the patch early to see if this is actually wanted
>> before I do too much more work on it.
>>
>
> Seems reasonable.
>
>
>> My implementation maintains using the linear list for sequences up to a
>> defined threshold (currently 32) then it moves everything over to a
>> hashtable and free's off the list.
>>
>
> Did you check how it would perform if you just always used the hash table?
> Or if you just have a single entry before you move to hash table, ie. set
> the threshold to 2? That would be slightly simpler.
>
>
I've just completed some more benchmarking of this. I didn't try dropping
the threshold down to 2 or 0 but I did tests at the cut over point and
really don't see much difference in performance between the list at 32 and
the hashtable at 33 sequences. The hash table version excels in the 16000
sequence test in comparison to the unpatched version.

Times are in milliseconds of the time it took to call currval() 100000
times for 1 sequence.

                   Patched     Unpatched    increased by
  1 in cache       1856.452    1844.11      -1%
  32 in cache      1841.84     1802.433     -2%
  33 in cache      1861.558    not tested   N/A
  16000 in cache   1963.711    10329.22     426%

There was not much point in testing 33 sequences with the unpatched version
as there is no switch to a hash table there. Most likely the 1 and 2% drops
in speed are noise, as the only instructions added are a check in
init_sequence for whether the hash table has been created, and a counter
tracking the number of sequences in the list while still in list mode.

I also tested filling the cache with 30000 sequences then inserting 100000
records into a table which used currval() for the default of a column. The
sequence I used would have been half way up the linked list in the
unpatched version, so init_sequence would have had to loop 15000 times
before finding the sequence.

Here is the average and median over 8 runs of the INSERT statement:

            Patched     Unpatched    increased by
  AVERAGE   482.2626    597.66375    24%
  MEDIAN    471.2205    567.949      21%

Just for clarity, during testing I added the following line to
the switch_to_hashtable() function

elog(NOTICE, "moved sequences into hash table");

I've attached sequence.sql, which includes all of my tests, the functions I
used for benchmarking, and comments with the results I got while running
the tests.

I've also attached the results formatted a little better in a spreadsheet.

Please let me know if there is something else you think I should test.

Regards

David Rowley

> - Heikki
>

Attachment Content-Type Size
sequence.sql text/plain 5.9 KB
init_sequence_benchmark.xls application/vnd.ms-excel 37.0 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 14:03:41
Message-ID: 5284D83D.8030603@vmware.com
Lists: pgsql-hackers

On 14.11.2013 14:38, David Rowley wrote:
> I've just completed some more benchmarking of this. I didn't try dropping
> the threshold down to 2 or 0 but I did tests at the cut over point and
> really don't see much difference in performance between the list at 32 and
> the hashtable at 33 sequences. The hash table version excels in the 16000
> sequence test in comparison to the unpatched version.
>
> Times are in milliseconds of the time it took to call currval() 100000
> times for 1 sequence.
>
>                    Patched     Unpatched    increased by
>   1 in cache       1856.452    1844.11      -1%
>   32 in cache      1841.84     1802.433     -2%
>   33 in cache      1861.558    not tested   N/A
>   16000 in cache   1963.711    10329.22     426%

If I understand those results correctly, the best case scenario with the
current code takes about 1800 ms. There's practically no difference with
N <= 32, where N is the number of sequences touched. The hash table
method also takes about 1800 ms when N=33. The performance of the hash
table is O(1), so presumably we can extrapolate from that that it's the
same for any N.

I think that means that we should just completely replace the list with
the hash table. The difference with a small N is lost in noise, so
there's no point in keeping the list as a fast path for small N. That'll
make the patch somewhat simpler.
- Heikki


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 14:12:10
Message-ID: 20131114141210.GE25959@alap2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2013-11-13 22:55:43 +1300, David Rowley wrote:
> Here http://www.postgresql.org/message-id/24278.1352922571@sss.pgh.pa.us there
> was some talk about init_sequence being a bottleneck when many sequences
> are used in a single backend.
>
> The attached I think implements what was talked about in the above link
> which for me seems to double the speed of a currval() loop over 30000
> sequences. It goes from about 7 seconds to 3.5 on my laptop.

I think it'd be a better idea to integrate the sequence caching logic
into the relcache. There's a comment about it:
* (We can't
* rely on the relcache, since it's only, well, a cache, and may decide to
* discard entries.)
but that's not really accurate anymore. We have the infrastructure for
keeping values across resets and we don't discard entries.

Since we already do a relcache lookup for every sequence manipulation
(c.f. init_sequence()) relying on it won't increase, but rather decrease
the overhead.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 14:23:20
Message-ID: 673.1384439000@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> I think it'd be a better idea to integrate the sequence caching logic
> into the relcache. There's a comment about it:
> * (We can't
> * rely on the relcache, since it's only, well, a cache, and may decide to
> * discard entries.)
> but that's not really accurate anymore. We have the infrastructure for
> keeping values across resets and we don't discard entries.

We most certainly *do* discard entries, if they're not open when a cache
flush event comes along.

I suppose it'd be possible to mark a relcache entry for a sequence
as locked-in-core, but that doesn't attract me at all. A relcache
entry is a whole lot larger than the amount of state we really need
to keep for a sequence.

One idea is to have a hashtable for the sequence-specific data,
but to add a link field to the relcache entry that points to the
non-flushable sequence hashtable entry. That would save the second
hashtable lookup as long as the relcache entry hadn't been flushed
since last use, while not requiring any violence to the lifespan
semantics of relcache entries. (Actually, if we did that, it might
not even be worth converting the list to a hashtable? Searches would
become a lot less frequent.)
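Tom's link-field idea could look roughly like the following minimal standalone
sketch. RelationData here is a stand-in for the real relcache entry, rd_seqelm
is a hypothetical name for the proposed link field, and a small array stands
in for the non-flushable sequence table; a relcache flush would reset
rd_seqelm to NULL while the SeqTableData itself survives:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Backend-lifetime sequence state; never discarded by cache flushes. */
typedef struct SeqTableData
{
    Oid  relid;
    long last_value;
} SeqTableData;

/* Stand-in for a relcache entry, with the proposed link field. */
typedef struct RelationData
{
    Oid           rd_id;
    SeqTableData *rd_seqelm;   /* cached link; NULL after a flush */
} RelationData;

#define NSEQS 4
static SeqTableData seqtable[NSEQS];   /* stand-in for the durable table */

/* Slow path: search the durable sequence table. */
static SeqTableData *
seqtable_lookup(Oid relid)
{
    int i;

    for (i = 0; i < NSEQS; i++)
        if (seqtable[i].relid == relid)
            return &seqtable[i];
    return NULL;
}

/* Fast path: follow the relcache link; fall back to a search and re-link. */
static SeqTableData *
get_seq_state(RelationData *rel)
{
    if (rel->rd_seqelm == NULL)
        rel->rd_seqelm = seqtable_lookup(rel->rd_id);
    return rel->rd_seqelm;
}
```

As the paragraph above notes, with this scheme the search only happens after a
flush, so whether the durable table is a list or a hash table matters much
less.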

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 14:33:39
Message-ID: 20131114143339.GA7522@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-11-14 09:23:20 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > I think it'd be a better idea to integrate the sequence caching logic
> > into the relcache. There's a comment about it:
> > * (We can't
> > * rely on the relcache, since it's only, well, a cache, and may decide to
> > * discard entries.)
> > but that's not really accurate anymore. We have the infrastructure for
> > keeping values across resets and we don't discard entries.
>
> We most certainly *do* discard entries, if they're not open when a cache
> flush event comes along.

What I was aiming at is that we don't discard them because of a limited
cache size. I don't think it means much that we flush the entry when
it's changed but not referenced.

> I suppose it'd be possible to mark a relcache entry for a sequence
> as locked-in-core, but that doesn't attract me at all. A relcache
> entry is a whole lot larger than the amount of state we really need
> to keep for a sequence.

But effectively that's what already happens unless either somebody else
does an ALTER SEQUENCE or sinval overflow happened, right? So in
production workloads we already will keep both around since hopefully
neither altering a sequence nor sinval overflows should be a frequent
thing.

> One idea is to have a hashtable for the sequence-specific data,
> but to add a link field to the relcache entry that points to the
> non-flushable sequence hashtable entry. That would save the second
> hashtable lookup as long as the relcache entry hadn't been flushed
> since last use, while not requiring any violence to the lifespan
> semantics of relcache entries. (Actually, if we did that, it might
> not even be worth converting the list to a hashtable? Searches would
> become a lot less frequent.)

That's not a bad idea if we decide not to integrate them. And I agree,
there's not much need to have a separate hashtable in that case.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 14:47:18
Message-ID: 1444.1384440438@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-11-14 09:23:20 -0500, Tom Lane wrote:
>> We most certainly *do* discard entries, if they're not open when a cache
>> flush event comes along.

> What I was aiming at is that we don't discard them because of a limited
> cache size. I don't think it means much that we flush the entry when
> it's changed but not referenced.

Well, I don't want non-user-significant events (such as an sinval queue
overrun) causing sequence state to get discarded. We would get bug
reports about lost sequence values.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 14:49:38
Message-ID: 20131114144938.GB7522@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-11-14 09:47:18 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2013-11-14 09:23:20 -0500, Tom Lane wrote:
> >> We most certainly *do* discard entries, if they're not open when a cache
> >> flush event comes along.
>
> > What I was aiming at is that we don't discard them because of a limited
> > cache size. I don't think it means much that we flush the entry when
> > it's changed but not referenced.
>
> Well, I don't want non-user-significant events (such as an sinval queue
> overrun) causing sequence state to get discarded. We would get bug
> reports about lost sequence values.

But we can easily do as you suggest and simply retain the entry in that
case. I am just not seeing the memory overhead argument as counting much
since we don't protect against it in normal operation.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 23:33:30
Message-ID: CAApHDvoAmJWtRQy03O33ijmw+tPwQLaQnDxrcRd8=OpSJEVVUg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 14.11.2013 14:38, David Rowley wrote:
>
>> I've just completed some more benchmarking of this. I didn't try dropping
>> the threshold down to 2 or 0 but I did tests at the cut over point and
>> really don't see much difference in performance between the list at 32 and
>> the hashtable at 33 sequences. The hash table version excels in the 16000
>> sequence test in comparison to the unpatched version.
>>
>> Times are in milliseconds of the time it took to call currval() 100000
>> times for 1 sequence.
>>
>>                    Patched     Unpatched    increased by
>>   1 in cache       1856.452    1844.11      -1%
>>   32 in cache      1841.84     1802.433     -2%
>>   33 in cache      1861.558    not tested   N/A
>>   16000 in cache   1963.711    10329.22     426%
>>
>
> If I understand those results correctly, the best case scenario with the
> current code takes about 1800 ms. There's practically no difference with N
> <= 32, where N is the number of sequences touched. The hash table method
> also takes about 1800 ms when N=33. The performance of the hash table is
> O(1), so presumably we can extrapolate from that that it's the same for any
> N.
>
> I think that means that we should just completely replace the list with
> the hash table. The difference with a small N is lost in noise, so there's
> no point in keeping the list as a fast path for small N. That'll make the
> patch somewhat simpler.
> - Heikki
>

I had thought that perhaps the most common workloads only touch 1 or 2
sequences, and that, though it may be small, there would be an overhead in
both cycles and memory usage in creating a hash table for backends that make
such light use of sequences. It would certainly make the patch simpler to
remove the list, and it would also mean that I could remove the
sometimes-unused ->next member from the SeqTableData struct, which is
currently just set to NULL when in hash table mode. If you think it's the
way to go then I can make the change, though maybe I'll hold off on the
refactor for now as it looks like other ideas have come up around the
relcache.

Regards

David Rowley


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 01:22:30
Message-ID: CAApHDvoduD3jON1-4ZoY5-RahbE_=tPABM02WYSrwz79XUcuPQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 3:12 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> Hi,
>
> On 2013-11-13 22:55:43 +1300, David Rowley wrote:
> > Here http://www.postgresql.org/message-id/24278.1352922571@sss.pgh.pa.us there
> > was some talk about init_sequence being a bottleneck when many sequences
> > are used in a single backend.
> >
> > The attached I think implements what was talked about in the above link
> > which for me seems to double the speed of a currval() loop over 30000
> > sequences. It goes from about 7 seconds to 3.5 on my laptop.
>
> I think it'd be a better idea to integrate the sequence caching logic
> into the relcache. There's a comment about it:
> * (We can't
> * rely on the relcache, since it's only, well, a cache, and may decide to
> * discard entries.)
> but that's not really accurate anymore. We have the infrastructure for
> keeping values across resets and we don't discard entries.
>
>
I just want to check this idea against an existing todo item to move
sequences into a single table, as by the sounds of it this would bind
sequences even more tightly to being relations.

The todo item reads:

"Consider placing all sequences in a single table, or create a system view"

This had been in the back of my mind while implementing the hash table
stuff for init_sequence, and again when doing my benchmarks, where I created
30000 sequences and went through the pain of having a path on my file
system with 30000 8k files.

It sounds like your idea overlaps with this todo a little, so maybe this is
a good time to decide which would be best, though the more I think about
it, the more I think that moving sequences into a single table is a no-go.

So for implementing moving sequences into a single system table:

1. The search_path stuff makes this a bit more complex. It sounds like this
would require some duplication of the search_path logic.

2. There is also the problem with tracking object dependency.

Currently:
create sequence t_a_seq;
create table t (a int not null default nextval('t_a_seq'));
alter sequence t_a_seq owned by t.a;
drop table t;
drop sequence t_a_seq; -- already deleted by drop table t
ERROR: sequence "t_a_seq" does not exist

Moving sequences to a single table sounds like a special case for this
logic.

3. Would moving sequences to a table still have to check that no duplicate
object exists in pg_class? Currently you can't have a sequence with the same
name as a table:

create sequence a;
create table a (a int);
ERROR: relation "a" already exists

It's not that I'm trying to shoot holes in moving sequences to a single
table; really I'd like to find a way to reduce the wastefulness of these
one-file-per-sequence files lying around my file system. But if changing
this is a no-go, then it would be better to take it off the todo list, and
then we wouldn't have as many issues pouring more concrete into the
sequences-being-relations mould.

Regards

David Rowley


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 05:47:18
Message-ID: CAApHDvoCgp70vOuQYKTfguMED__Qg7YYpGnzHSN0i2sUuBtjRg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote
>
> I think that means that we should just completely replace the list with
> the hash table. The difference with a small N is lost in noise, so there's
> no point in keeping the list as a fast path for small N. That'll make the
> patch somewhat simpler.
> - Heikki
>

Attached is a much more simple patch which gets rid of the initial linked
list.

Attachment Content-Type Size
hashtab_seq_v0.2.patch application/octet-stream 4.2 KB

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 06:12:15
Message-ID: CAApHDvodQ-OqJoQGKXwBObKqE7xCwLft6=EBGMu+Up9KZx40nA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 3:23 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > I think it'd be a better idea to integrate the sequence caching logic
> > into the relcache. There's a comment about it:
> > * (We can't
> > * rely on the relcache, since it's only, well, a cache, and may decide
> to
> > * discard entries.)
> > but that's not really accurate anymore. We have the infrastructure for
> > keeping values across resets and we don't discard entries.
>
> We most certainly *do* discard entries, if they're not open when a cache
> flush event comes along.
>
> I suppose it'd be possible to mark a relcache entry for a sequence
> as locked-in-core, but that doesn't attract me at all. A relcache
> entry is a whole lot larger than the amount of state we really need
> to keep for a sequence.
>
> One idea is to have a hashtable for the sequence-specific data,
> but to add a link field to the relcache entry that points to the
> non-flushable sequence hashtable entry. That would save the second
> hashtable lookup as long as the relcache entry hadn't been flushed
> since last use, while not requiring any violence to the lifespan
> semantics of relcache entries. (Actually, if we did that, it might
> not even be worth converting the list to a hashtable? Searches would
> become a lot less frequent.)
>
>
Unless I've misunderstood something, it looks like this would mean giving
heapam.c and relcache.c knowledge of sequences.
Currently relation_open is called from open_share_lock in sequence.c. The
only way I can see to do this would be to add something like
relation_open_sequence() in heapam.c, which means we'd need to invent
RelationIdGetSequenceRelation() and use that instead
of RelationIdGetRelation(), and somewhere along the line have it pass back
the SeqTableData struct, which would be tagged onto RelIdCacheEnt.

I think it can be done, but I don't think it will look pretty.
Perhaps there is a more generic way... Would tagging some void
*rd_extra onto the RelationData be a better way? And just have sequence.c
make use of that for storing the SeqTableData.

Also I'm wondering what we'd do with all these pointers when someone does
DISCARD SEQUENCES; would we have to invalidate the relcache, or would it
just be a matter of looping over it, freeing the sequence data, and setting
the pointers to NULL?

Regards

David Rowley


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 09:43:15
Message-ID: 20131115094315.GA23517@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-11-15 14:22:30 +1300, David Rowley wrote:
> On Fri, Nov 15, 2013 at 3:12 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > Hi,
> >
> > On 2013-11-13 22:55:43 +1300, David Rowley wrote:
> > > Here http://www.postgresql.org/message-id/24278.1352922571@sss.pgh.pa.us there
> > > was some talk about init_sequence being a bottleneck when many sequences
> > > are used in a single backend.
> > >
> > > The attached I think implements what was talked about in the above link
> > > which for me seems to double the speed of a currval() loop over 30000
> > > sequences. It goes from about 7 seconds to 3.5 on my laptop.
> >
> > I think it'd be a better idea to integrate the sequence caching logic
> > into the relcache. There's a comment about it:
> > * (We can't
> > * rely on the relcache, since it's only, well, a cache, and may decide to
> > * discard entries.)
> > but that's not really accurate anymore. We have the infrastructure for
> > keeping values across resets and we don't discard entries.
> >
> >
> I just want to check this idea against an existing todo item to move
> sequences into a single table, as I think by the sounds of it this binds
> sequences being relations even closer together.

> This had been on the back of my mind while implementing the hash table
> stuff for init_sequence and again when doing my benchmarks where I created
> 30000 sequences and went through the pain of having a path on my file
> system with 30000 8k files.

Well. But in which real-world use cases is that actually the bottleneck?

> 1. The search_path stuff makes this a bit more complex. It sounds like this
> would require some duplication of the search_path logic.

I'd assumed that if we were to do this, the sequences themselves would
still continue to live in pg_class. Just instead of a relfilenode
containing their state it would be stored in an extra table.

> 2. There is also the problem with tracking object dependency.
>
> Currently:
> create sequence t_a_seq;
> create table t (a int not null default nextval('t_a_seq'));
> alter sequence t_a_seq owned by t.a;
> drop table t;
> drop sequence t_a_seq; -- already deleted by drop table t
> ERROR: sequence "t_a_seq" does not exist
>
> Moving sequences to a single table sounds like a special case for this
> logic.

There should already be code handling such dependencies.

4) Scalability problems: The one block that sequences use can already be a
major contention issue when you have parallel inserts into the same table,
a workload which, unlike a couple thousand unrelated sequences, I actually
think is more realistic. So we'd need to force 1 sequence tuple per block,
which we currently cannot do.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 09:49:39
Message-ID: 20131115094939.GB23517@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-11-15 19:12:15 +1300, David Rowley wrote:
> On Fri, Nov 15, 2013 at 3:23 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > > I think it'd be a better idea to integrate the sequence caching logic
> > > into the relcache. There's a comment about it:
> > > * (We can't
> > > * rely on the relcache, since it's only, well, a cache, and may decide
> > to
> > > * discard entries.)
> > > but that's not really accurate anymore. We have the infrastructure for
> > > keeping values across resets and we don't discard entries.
> >
> > We most certainly *do* discard entries, if they're not open when a cache
> > flush event comes along.
> >
> > I suppose it'd be possible to mark a relcache entry for a sequence
> > as locked-in-core, but that doesn't attract me at all. A relcache
> > entry is a whole lot larger than the amount of state we really need
> > to keep for a sequence.
> >
> > One idea is to have a hashtable for the sequence-specific data,
> > but to add a link field to the relcache entry that points to the
> > non-flushable sequence hashtable entry. That would save the second
> > hashtable lookup as long as the relcache entry hadn't been flushed
> > since last use, while not requiring any violence to the lifespan
> > semantics of relcache entries. (Actually, if we did that, it might
> > not even be worth converting the list to a hashtable? Searches would
> > become a lot less frequent.)
> >
> >
> Unless I've misunderstood something it looks like this would mean giving
> heapam.c and relcache.c knowledge of sequences.
> Currently relation_open is called from open_share_lock in sequence.c. The
> only way I can see to do this would be to add something like
> relation_open_sequence() in heapam.c which means we'd need to invent
> RelationIdGetSequenceRelation() and use that instead
> of RelationIdGetRelation() and somewhere along the line have it pass back
> the SeqTableData struct which would be tagged onto RelIdCacheEnt.

Looks like indexes already go through that path, and store data in the
relcache, without requiring heapam.c to know all that much.

> I think it can be done but I don't think it will look pretty.
> Perhaps if there was a more generic way... Would tagging some void
> *rd_extra only the RelationData be a better way? And just have sequence.c
> make use of that for storing the SeqTableData.

I'd rather have it properly typed in ->rd_sequence, maybe in a union
against the index members.

> Also I'm wondering what we'd do with all these pointers when someone does
> DISCARD SEQUENCES; would we have to invalidate the relcache or would it
> just be matter of looping over it and freeing of the sequence data setting
> the pointers to NULL?

Good question. Have a 'discard_count' member and global variable? That
starts at zero and gets incremented every time somebody discards?
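Andres's generation-counter suggestion might look like this standalone
sketch, where all names are hypothetical: instead of walking the relcache on
DISCARD SEQUENCES, a global counter is bumped, and any cached state stamped
with an older generation is treated as stale the next time it is consulted:

```c
#include <assert.h>

/* Bumped once per DISCARD SEQUENCES; never walked, never reset. */
static unsigned seq_discard_count = 0;

/* Per-sequence cached state, stamped with the generation it was set in. */
typedef struct SeqCacheLink
{
    long     last_value;
    unsigned discard_stamp;   /* seq_discard_count at time of caching */
} SeqCacheLink;

/* A link is only usable if no discard happened since it was stamped. */
static int
seq_link_is_valid(const SeqCacheLink *link)
{
    return link->discard_stamp == seq_discard_count;
}

/* (Re)populate the cached state under the current generation. */
static void
seq_link_refresh(SeqCacheLink *link, long value)
{
    link->last_value = value;
    link->discard_stamp = seq_discard_count;
}

/* DISCARD SEQUENCES becomes O(1): one increment invalidates every link. */
static void
discard_sequences(void)
{
    seq_discard_count++;
}
```

The trade-off versus looping over the relcache is that stale entries keep
their memory until next touched, but no invalidation traversal is needed.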

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 10:31:54
Message-ID: 5285F81A.6000308@vmware.com
Lists: pgsql-hackers

On 15.11.2013 07:47, David Rowley wrote:
> On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
>> wrote
>>
>> I think that means that we should just completely replace the list with
>> the hash table. The difference with a small N is lost in noise, so there's
>> no point in keeping the list as a fast path for small N. That'll make the
>> patch somewhat simpler.
>
> Attached is a much more simple patch which gets rid of the initial linked
> list.

Thanks, committed with minor copy-editing. I dialed down the initial
size of the hash table from 1000 to 16, that ought to be enough.

I ran a quick performance test of my own, based on the script you sent.
I modified it a bit to eliminate the PL/pgSQL overhead, making it more
heavily bottlenecked by the nextval/currval overhead. Results:

                     unpatched  patched
nextval, 10000 seqs      36772     2426
currval, 1 seq            1176     1069
currval, 2 seqs            865      857
currval, 4 seqs            742      759
currval, 5 seqs            718      711
currval, 10 seqs           680      668
currval, 100 seqs          871      656
currval, 1000 seqs        3507      700
currval, 10000 seqs      34742     1224

The performance when you touch only a few sequences is unchanged. When
you touch a lot of them, you gain. Just as you would expect.

Attached is the test script I used. After running the test, I realized
that there's a little flaw in the test methodology, but it doesn't
invalidate the results. I used the same backend for all the test runs,
so even when currval() is called repeatedly for a single sequence, the
linked list (or hash table, with the patch) nevertheless contains
entries for all 10000 sequences. However, the sequences actually used by
the test are always in the front of the list, because the nextval()
calls were made in the same order. But with the unpatched version, if
you called currval() repeatedly on the last-initialized sequence,
instead of the first-initialized one, you would get horrible
performance, even when you touch only a single sequence.
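The worst case described here can be made concrete with a small standalone sketch (simplified stand-ins, not the actual seqtab code): with a head-searched linked list, currval() on the tail entry walks all n entries on every call, while a table hashed on the sequence's OID reaches any entry in O(1) expected time.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

typedef struct SeqListEnt
{
    Oid relid;
    struct SeqListEnt *next;
} SeqListEnt;

/* Linear search from the head, counting comparisons to expose the
 * O(n) cost of hitting an entry near the tail. */
static SeqListEnt *
list_lookup(SeqListEnt *head, Oid relid, int *steps)
{
    for (SeqListEnt *e = head; e != NULL; e = e->next)
    {
        (*steps)++;
        if (e->relid == relid)
            return e;
    }
    return NULL;
}

#define NBUCKETS 1024

/* The same lookup against a trivial fixed-size chained hash table:
 * only the (short) chain for one bucket is ever walked. */
static SeqListEnt *
hash_lookup(SeqListEnt *buckets[NBUCKETS], Oid relid, int *steps)
{
    for (SeqListEnt *e = buckets[relid % NBUCKETS]; e != NULL; e = e->next)
    {
        (*steps)++;
        if (e->relid == relid)
            return e;
    }
    return NULL;
}
```

With 10000 sequences in the list, the comparison count per currval() ranges from 1 to 10000 depending on initialization order, which matches the asymmetry the test script would have missed.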

Regarding the more grandiose ideas of using the relcache or rewriting
the way sequences are stored altogether: this patch might become
obsolete if we do any of that stuff, but that's ok. The immediate
performance problem has been solved now, but those other ideas might be
worth pursuing for other reasons.

- Heikki

Attachment Content-Type Size
seqtest.sql text/x-sql 1.9 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 10:44:34
Message-ID: 20131115104434.GA31846@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-11-15 12:31:54 +0200, Heikki Linnakangas wrote:
> On 15.11.2013 07:47, David Rowley wrote:
> >On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> >>wrote
> >>
> >>I think that means that we should just completely replace the list with
> >>the hash table. The difference with a small N is lost in noise, so there's
> >>no point in keeping the list as a fast path for small N. That'll make the
> >>patch somewhat simpler.
> >
> >Attached is a much more simple patch which gets rid of the initial linked
> >list.
>
> Thanks, committed with minor copy-editing. I dialed down the initial size of
> the hash table from 1000 to 16, that ought to be enough.

I am slightly confused by the use of HASH_CONTEXT without setting
ctl->hcxt, that works, but seems more like an accident.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 11:02:51
Message-ID: CAApHDvrdx=+gu0cEL1wgtd7NBqTXW2rN9Q7K7QsuDjCqd8NKKw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 11:31 PM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> Thanks, committed with minor copy-editing. I dialed down the initial size
> of the hash table from 1000 to 16, that ought to be enough.
>
>
Great. Thanks for committing.

Regards

David Rowley

> - Heikki
>


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-15 12:25:10
Message-ID: 528612A6.2020407@vmware.com
Lists: pgsql-hackers

On 15.11.2013 12:44, Andres Freund wrote:
> On 2013-11-15 12:31:54 +0200, Heikki Linnakangas wrote:
>> On 15.11.2013 07:47, David Rowley wrote:
>>> On Fri, Nov 15, 2013 at 3:03 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
>>>> wrote
>>>>
>>>> I think that means that we should just completely replace the list with
>>>> the hash table. The difference with a small N is lost in noise, so there's
>>>> no point in keeping the list as a fast path for small N. That'll make the
>>>> patch somewhat simpler.
>>>
>>> Attached is a much more simple patch which gets rid of the initial linked
>>> list.
>>
>> Thanks, committed with minor copy-editing. I dialed down the initial size of
>> the hash table from 1000 to 16, that ought to be enough.
>
> I am slightly confused by the use of HASH_CONTEXT without setting
> ctl->hcxt, that works, but seems more like an accident.

Ugh. Accident indeed. Thanks, fixed!
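The contract Andres is pointing at can be modeled in a standalone sketch: each HASH_* flag passed to hash_create() declares that the caller filled in the corresponding HASHCTL field, so passing HASH_CONTEXT while leaving hcxt unset works only by accident. The types and names below are simplified stand-ins, not PostgreSQL's real dynahash:

```c
#include <assert.h>
#include <stddef.h>

typedef struct MemoryContextData { const char *name; } MemoryContextData;
typedef MemoryContextData *MemoryContext;

#define MODEL_HASH_ELEM    0x01   /* keysize/entrysize are set */
#define MODEL_HASH_CONTEXT 0x04   /* hcxt is set */

typedef struct ModelHashCtl
{
    size_t        keysize;
    size_t        entrysize;
    MemoryContext hcxt;           /* memory context for the table */
} ModelHashCtl;

static MemoryContext model_default_cxt;   /* stands in for the default */

/* Returns the context the table would be allocated in.  The flag
 * promises that the field was filled in, so the pairing is enforced
 * here instead of silently falling through on a NULL hcxt. */
static MemoryContext
model_hash_create_cxt(const ModelHashCtl *ctl, int flags)
{
    if (flags & MODEL_HASH_CONTEXT)
    {
        assert(ctl->hcxt != NULL);
        return ctl->hcxt;
    }
    return model_default_cxt;
}
```

Either setting hcxt alongside the flag, or dropping the flag and accepting the default context, keeps the two consistent; the committed fix chose the former.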

- Heikki