Re: init_sequence spill to hash table

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: init_sequence spill to hash table
Date: 2013-11-14 12:38:26
Message-ID: CAApHDvpWsc72i7bce-Xvf9YdXPFDCJs-yUkBikTzBE5XA31rCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 14, 2013 at 12:15 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 13.11.2013 11:55, David Rowley wrote:
>
>> I thought I would post the patch early to see if this is actually wanted
>> before I do too much more work on it.
>>
>
> Seems reasonable.
>
>
> My implementation maintains using the linear list for sequences up to a
>> defined threshold (currently 32) then it moves everything over to a
>> hashtable and free's off the list.
>>
>
> Did you check how it would perform if you just always used the hash table?
> Or if you just have a single entry before you move to hash table, ie. set
> the threshold to 2? That would be slightly simpler.
>
>
I've just completed some more benchmarking of this. I didn't try dropping
the threshold down to 2 or 0 but I did tests at the cut over point and
really don't see much difference in performance between the list at 32 and
the hashtable at 33 sequences. The hash table version excels in the 16000
sequence test in comparison to the unpatched version.

Times are in milliseconds of the time it took to call currval() 100000
times for 1 sequence.
Patched Unpatched increased by 1 in cache 1856.452 1844.11 -1% 32 in
cache 1841.84 1802.433 -2% 33 in cache 1861.558 not tested N/A 16000 in
cache 1963.711 10329.22 426%

There was not much point in testing 33 sequences with the unpatched version
as there is no switching to hash table. Most likely the 1 and 2% drop in
speed is noise as the only instructions that have been added here are
adding test to check if the hashtable has been created in init_sequence and
also a counter to count the number of sequences in the list while still in
list mode.

I also tested filling the cache with 30000 sequences then inserting 100000
records into a table which used currval() for the default of a column. The
sequence I used would have been half way up the linked list in the
unpatched version, so init_sequence would have had to loop 15000 times
before finding the sequence.

Here is the average and median over 8 runs of the INSERT statement
Patched Unpatched increased by AVERAGE 482.2626 597.66375 24% MEDIAN
471.2205 567.949 21%

Just for clarity, during testing I added the following line to
the switch_to_hashtable() function

elog(NOTICE, "moved sequences into hash table");

I've attached seqeuence.sql which includes all of my tests, the functions I
used for benchmarking and comments with the results that I got during
running the tests.

I've also attached the results formatted a little better in a spreadsheet.

Please let me know if there is something else you think I should test.

Regards

David Rowley

> - Heikki
>

Attachment Content-Type Size
sequence.sql text/plain 5.9 KB
init_sequence_benchmark.xls application/vnd.ms-excel 37.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2013-11-14 12:47:38 Re: nested hstore patch
Previous Message KONDO Mitsumasa 2013-11-14 12:29:26 Re: Add min and max execute statement time in pg_stat_statement