Re: Optimizing pglz compressor

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Subject:	Re: Optimizing pglz compressor
Date:	2013-03-05 13:32:43
Message-ID:	5135F3FB.7040706@vmware.com
Lists:	pgsql-hackers

I spent some more time on this, and came up with the attached patch. It
includes the changes I posted earlier, to use indexes instead of
pointers in the hash table. In addition, it makes the hash table size
variable, depending on the length of the input. This further reduces the
startup cost on small inputs. I changed the hash method slightly,
because the old method would not use any bits from the 3rd byte with a
small hash table size, but fortunately that didn't seem to have a
negative impact with larger hash table sizes either.
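
To make the idea concrete, here is a minimal sketch of a hash table that stores 16-bit indexes rather than pointers and whose size is chosen from the input length. It is not the code from the attached patch: the names (choose_hash_size, hash4, init_hash_table, hash_start), the size thresholds, and the shift amounts are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HASH_SIZE 8192      /* assumed upper bound, a power of two */

/* Slots hold 16-bit indexes into a history array instead of pointers;
 * index 0 is reserved to mean "empty slot". */
static int16_t hash_start[MAX_HASH_SIZE];

/* Choose a power-of-two table size from the input length.
 * The thresholds are illustrative assumptions. */
static unsigned choose_hash_size(size_t input_len)
{
    if (input_len < 128)
        return 512;
    if (input_len < 256)
        return 1024;
    if (input_len < 512)
        return 2048;
    if (input_len < 1024)
        return 4096;
    return MAX_HASH_SIZE;
}

/* Hash the next four input bytes. The shifts are small enough that every
 * byte still contributes to the bits kept by the smallest mask (511). */
static unsigned hash4(const unsigned char *p, unsigned mask)
{
    return (((unsigned) p[0] << 6) ^ ((unsigned) p[1] << 4) ^
            ((unsigned) p[2] << 2) ^ (unsigned) p[3]) & mask;
}

/* Clear only the slots that will be used and return the hash mask.
 * Short inputs therefore pay for a short memset, not the full table. */
static unsigned init_hash_table(size_t input_len)
{
    unsigned size = choose_hash_size(input_len);

    memset(hash_start, 0, size * sizeof(hash_start[0]));
    return size - 1;
}
```

In this sketch the startup saving comes from two places: each slot is 2 bytes instead of a pointer, and only the chosen number of slots is zeroed before compression starts.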

I wrote a little C extension to test this. It contains a function, which
runs pglz_compress() on a bytea input, N times. I ran that with
different kinds of inputs, and got the following results:

unpatched:

pglz-replace-pointers-with-indexes.patch (the patch I posted earlier):

pglz-variable-size-hash-table.patch:

These values are from a single run of the test, but I repeated the runs
several times to make sure the variability isn't large enough to render
the results meaningless. The negative values mean that pglz_compress
gave up on the compression in the test, i.e. they show how long it took
for pglz_compress to realize that it can't compress the input. Compare
the abs() values.
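
The measurement loop and the sign convention described above could look roughly like the following. This is a standalone sketch, not the contents of compresstest.tar.gz: compress_fn, time_compress and stub_gives_up are hypothetical names, the compressor is passed in as a function pointer so the example does not depend on PostgreSQL headers, and clock() is just one possible timer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical compressor signature: returns false when it gives up. */
typedef bool (*compress_fn)(const unsigned char *src, size_t len,
                            unsigned char *dst);

/* Run `compress` n times on the same input and return the elapsed CPU
 * time in seconds, negated if the last call reported that it gave up. */
static double time_compress(compress_fn compress, const unsigned char *src,
                            size_t len, unsigned char *dst, int n)
{
    bool    ok = true;
    clock_t start = clock();

    for (int i = 0; i < n; i++)
        ok = compress(src, len, dst);

    double elapsed = (double) (clock() - start) / CLOCKS_PER_SEC;

    return ok ? elapsed : -elapsed;
}

/* Stand-in compressor for trying out the harness: it counts its calls
 * and always reports failure. */
static int stub_calls = 0;

static bool stub_gives_up(const unsigned char *src, size_t len,
                          unsigned char *dst)
{
    (void) src;
    (void) len;
    (void) dst;
    stub_calls++;
    return false;
}
```

With a convention like this, comparing two builds on an incompressible input means comparing the magnitudes of two negative numbers, which is why the abs() values are the ones to look at.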

With the variable-size hash table, I'm not sure how significant the
earlier patch to use indexes instead of pointers is. But I don't think
it makes things any worse, so it's included in this.

On 01.03.2013 17:42, Heikki Linnakangas wrote:
> On 01.03.2013 17:37, Alvaro Herrera wrote:
>> My take on this is that if this patch is necessary to get Amit's patch
>> to a commitable state, it's fair game.
>
> I don't think it's necessary for that, but let's see..

With these tweaks, I was able to make pglz-based delta encoding perform
roughly as well as Amit's patch. So I think that's the approach we
should take, as it's a bit simpler and more versatile. I'll follow up on
that in the other thread.

- Heikki

Attachment	Content-Type	Size
pglz-variable-size-hash-table.patch	text/x-diff	8.3 KB
compresstest.tar.gz	application/x-gzip	2.0 KB

In response to

Re: Optimizing pglz compressor at 2013-03-01 15:42:30 from Heikki Linnakangas

Responses

Re: Optimizing pglz compressor at 2013-03-06 14:32:10 from Joachim Wieland
Re: Optimizing pglz compressor at 2013-06-19 11:01:13 from Amit Kapila
