Re: Optimizing pglz compressor

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Optimizing pglz compressor
Date: 2013-06-26 13:37:50
Message-ID: 01fa01ce7272$5b359700$11a0c500$@kapila@huawei.com
Lists: pgsql-hackers

On Wednesday, June 26, 2013 2:15 AM Heikki Linnakangas wrote:
> On 19.06.2013 14:01, Amit Kapila wrote:
> > Observations
> > --------------
> > 1. For small data, performance is always good with the patch.
> > 2. For random small/large data, performance is good.
> > 3. For medium and large text and same-byte data (3K, 5K text,
> > 10K, 100K, 500K same byte), performance is degraded.
>
> Wow, that's strange. What platform and CPU did you test on?

CPU - 4 core
RAM - 24GB
OS - SUSE 11 SP2
Kernel version - 3.0.13

> Are you
> sure you used the same compiler flags with and without the patch?

Yes.

> Can you also try the attached patch, please? It's the same as before,
> but in this version, I didn't replace the prev and next pointers in
> PGLZ_HistEntry struct with int16s. That avoids some table lookups, at
> the expense of using more memory. It's closer to what we have without
> the patch, so maybe that helps on your system.

Yes, it helped a lot on my system.
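To make the trade-off described above concrete, here is a minimal C sketch of the two ways of linking history entries: real prev/next pointers versus int16 indices into a shared array. The struct and function names (HistEntryPtr, HistEntryIdx, chain_len_ptr, chain_len_idx) and the -1 end-of-chain marker are hypothetical illustrations, not taken from pg_lzcompress.c or from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical history-entry layouts, for illustration only. Field names
 * and the -1 "no entry" marker are assumptions, not pg_lzcompress.c.
 */
#define INVALID_IDX (-1)        /* assumed end-of-chain marker */

/* Pointer-linked: following a link is a single pointer load. */
typedef struct HistEntryPtr
{
    struct HistEntryPtr *next;  /* next entry in the same hash bucket */
    struct HistEntryPtr *prev;  /* previous entry, used when unlinking */
    const char *pos;            /* where this entry starts in the input */
} HistEntryPtr;

/*
 * Index-linked: links are int16 offsets into a shared array, so each hop
 * needs base + index address arithmetic, but each entry is smaller.
 */
typedef struct HistEntryIdx
{
    int16_t     next;           /* index of next entry, or INVALID_IDX */
    int16_t     prev;           /* index of previous entry, or INVALID_IDX */
    const char *pos;            /* where this entry starts in the input */
} HistEntryIdx;

/* Length of a bucket chain in the pointer layout. */
static int
chain_len_ptr(const HistEntryPtr *e)
{
    int         n = 0;

    for (; e != NULL; e = e->next)
        n++;
    return n;
}

/* Length of a bucket chain in the index layout: every hop is a lookup. */
static int
chain_len_idx(const HistEntryIdx *entries, int16_t i)
{
    int         n = 0;

    for (; i != INVALID_IDX; i = entries[i].next)
        n++;
    return n;
}
```

With pointers, each hop along a bucket chain is a single load; with int16 indices, each hop first computes the entry's address from the array base, but each entry is smaller (on common 64-bit platforms, 16 bytes instead of 24 for these toy structs). That is the memory-versus-lookup trade-off the second patch version makes in favour of pointers.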

Head:
     testname      |   auto
-------------------+-----------
 5k text           |  3499.888
 512b text         |  1425.106
 256b text         |  1769.126
 1K text           |  1378.151
 3K text           |  4081.254
 2k random         |  -410.928
 100k random       |   -10.235
 500k random       |    -2.094
 512b random       |  -770.665
 256b random       | -1120.173
 1K random         |  -570.351
 10k of same byte  |  3602.610
 100k of same byte | 36187.863
 500k of same byte | 26055.472

After your Patch (pglz-variable-size-hash-table-2.patch)

     testname      |   auto
-------------------+-----------
 5k text           |  3524.306
 512b text         |   954.962
 256b text         |   832.527
 1K text           |  1273.970
 3K text           |  3963.329
 2k random         |  -300.516
 100k random       |    -7.538
 500k random       |    -1.525
 512b random       |  -439.726
 256b random       |  -440.154
 1K random         |  -391.070
 10k of same byte  |  3570.921
 100k of same byte | 37498.502
 500k of same byte | 26904.426

There was a minor problem in your patch: in one of the experiments it crashed.
The fix is to not access the 0th history entry in function pglz_find_match();
the modified patch is attached.
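One plausible reading of this fix is that slot 0 of the history array is reserved to mean "no entry", so pglz_find_match() must test for that sentinel before dereferencing an entry. The C sketch below assumes exactly that; the names (hist_entries, INVALID_ENTRY, HIST_SIZE, find_longest_match) and the layout are hypothetical, not copied from the attached patch.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: history entries are linked by int16 indices and
 * slot 0 of the array is reserved to mean "no entry". That reservation
 * is an assumption; the names are illustrative, not from the patch.
 */
#define HIST_SIZE     8
#define INVALID_ENTRY 0

typedef struct HistEntry
{
    int16_t     next;           /* next entry in bucket, or INVALID_ENTRY */
    const char *pos;            /* start of this entry in the input */
} HistEntry;

/* Slot 0 never holds real data, so usable slots are 1..HIST_SIZE. */
static HistEntry hist_entries[HIST_SIZE + 1];

/*
 * Return the longest match between `input` and any history entry in the
 * chain starting at `start`. The loop condition is checked before the
 * entry is read, so slot 0 is never dereferenced as if it were real.
 */
static int
find_longest_match(int16_t start, const char *input, const char *end)
{
    int         best = 0;

    for (int16_t i = start; i != INVALID_ENTRY; i = hist_entries[i].next)
    {
        const char *hp = hist_entries[i].pos;   /* earlier input position */
        const char *ip = input;
        int         len = 0;

        /* hp stays behind ip, so both stay inside the same buffer */
        while (ip < end && *hp == *ip)
        {
            hp++;
            ip++;
            len++;
        }
        if (len > best)
            best = len;
    }
    return best;
}
```

Reading hist_entries[0] as a real entry would follow whatever garbage (or stale) pos and next it holds, which is the kind of access that can crash only in some experiments.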

After the fix, the readings are almost the same:

     testname      |   auto
-------------------+-----------
 5k text           |  3347.961
 512b text         |   938.442
 256b text         |   834.496
 1K text           |  1243.618
 3K text           |  3790.835
 2k random         |  -306.470
 100k random       |    -7.589
 500k random       |    -1.517
 512b random       |  -442.649
 256b random       |  -438.781
 1K random         |  -392.106
 10k of same byte  |  3565.449
 100k of same byte | 37355.595
 500k of same byte | 26776.076

I guess some of the difference might be due to the different way of
accessing history entries in pglz_hist_add().

With Regards,
Amit Kapila.

Attachment: pglz-variable-size-hash-table-3.patch (application/octet-stream, 7.8 KB)
