Quick Links

Re: Optimizing pglz compressor

From:	Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To:	"'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>
Cc:	"'PostgreSQL-development'" <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Optimizing pglz compressor
Date:	2013-06-19 11:01:13
Message-ID:	008501ce6cdc$51981790$f4c846b0$@kapila@huawei.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tuesday, March 05, 2013 7:03 PM Heikki Linnakangas wrote:

> I spent some more time on this, and came up with the attached patch. It
> includes the changes I posted earlier, to use indexes instead of
> pointers in the hash table. In addition, it makes the hash table size
> variable, depending on the length of the input. This further reduces
> the startup cost on small inputs. I changed the hash method slightly,
> because the old method would not use any bits from the 3rd byte with a
> small hash table size, but fortunately that didn't seem to negative
> impact with larger hash table sizes either.
>
> I wrote a little C extension to test this. It contains a function,
> which runs pglz_compress() on a bytea input, N times. I ran that with
> different kinds of inputs, and got the following results:
>

The purpose of this patch is to improve LZ compression speed by reducing the
startup cost of initialization of hash_start array.
To achieve the same it uses variable hash and reduced the size of each
history entry by replacing pointers with int16 indexes.
It achieves it's purpose for small data, but for large data in some cases
performance is degaraded, refer second set of performance data.

1. Patch compiles cleanly and all regression tests passed.
2. Change in pglz_hist_idx macro is not very clear to me, neither it is
mentioned in comments
3. Why first entry is kept as INVALID_ENTRY? It appears to me, it is for
cleaner checks in code.

Performance Data
------------------
I have used pglz-variable-size-hash-table.patch to collect all performance
data:

Results of compress-tests.sql -- inserting large data into tmp table
------------------------------

testname |unpatched | patched
-------------------+----------+------------
5k text | 4.8932 | 4.9014
512b text | 22.6209 | 18.6849
256b text | 13.9784 | 8.9342
1K text | 20.4969 | 20.5988
2k random | 10.5826 | 10.0758
100k random | 3.9056 | 3.8200
500k random | 22.4078 | 22.1971
512b random | 15.7788 | 12.9575
256b random | 18.9213 | 12.5209
1K random | 11.3933 | 9.8853
100k of same byte | 5.5877 | 5.5960
500k of same byte | 2.6853 | 2.6500

Observation
-------------
1. This clearly shows that the patch improves performance for small data
without any impact for large data.

Performance data for directly calling lz_compress function (tests.sql)
---------------------------------------------------------------------------
select testname,
(compresstest(data, nrows, 8192)::numeric / 1000)::numeric(10,3) as
auto
from tests;

Patch(pglz-variable-size-hash-table.patch)

Observations
--------------
1. For small data perforamce is always good with patch.
2. For random small/large data performace is good.
3. For medium and large text and same byte data(3K,5K text, 10K,100K,500K
same byte), performance is degraded.

I have used attached compress-tests-init.sql to generate data.
I am really not sure why the data you reported and what I taken differ in
few cases. I had tried multiple times but the result is same.
Kindly let me know if you think I am doing something wrong.

Note - To generate data in randomhex, I used Copy from file. I used same
command you provided to generate a file.

With Regards,
Amit Kapila.

Attachment	Content-Type	Size
compress-tests-init.sql	application/octet-stream	12.2 KB

In response to

Re: Optimizing pglz compressor at 2013-03-05 13:32:43 from Heikki Linnakangas

Responses

Re: Optimizing pglz compressor at 2013-06-25 20:45:09 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Etsuro Fujita	2013-06-19 11:49:57	Re: Patch for removng unused targets
Previous Message	Cédric Villemain	2013-06-19 10:33:22	[Review] Re: [PATCH] Remove useless USE_PGXS support in contrib