Re: What exactly is our CRC algorithm?

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: What exactly is our CRC algorithm?
Date: 2014-11-19 16:44:33
Message-ID: 546CC8F1.3010404@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/19/2014 05:58 PM, Abhijit Menon-Sen wrote:
> At 2014-11-11 16:56:00 +0530, ams(at)2ndQuadrant(dot)com wrote:
>>
>> I'm working on this (first speeding up the default calculation using
>> slice-by-N, then adding support for the SSE4.2 CRC instruction on
>> top).
>
> I've done the first part in the attached patch, and I'm working on the
> second (especially the bits to issue CPUID at startup and decide which
> implementation to use).
>
> As a benchmark, I ran pg_xlogdump --stats against 11GB of WAL data (674
> segments) generated by running a total of 2M pgbench transactions on a
> db initialised with scale factor 25.

That's an interesting choice of workload. That sure is heavy on the CRC
calculation, but the speed of pg_xlogdump hardly matters in real life.

> With HEAD's CRC code:
>
> bin/pg_xlogdump --stats wal/000000010000000000000001 29.81s user 3.56s system 77% cpu 43.274 total
> bin/pg_xlogdump --stats wal/000000010000000000000001 29.59s user 3.85s system 75% cpu 44.227 total
>
> With slice-by-4 (a minor variant of the attached patch; the results are
> included only for curiosity's sake, but I can post the code if needed):
>
> bin/pg_xlogdump --stats wal/000000010000000000000001 13.52s user 3.82s system 48% cpu 35.808 total
> bin/pg_xlogdump --stats wal/000000010000000000000001 13.34s user 3.96s system 48% cpu 35.834 total
>
> With slice-by-8 (i.e. the attached patch):
>
> bin/pg_xlogdump --stats wal/000000010000000000000001 7.88s user 3.96s system 34% cpu 34.414 total
> bin/pg_xlogdump --stats wal/000000010000000000000001 7.85s user 4.10s system 34% cpu 35.001 total
>
> (Note the progressive reduction in user time from ~29s to ~8s.)
>
> Finally, just for comparison, here's what happens when we use the
> hardware instruction via gcc's __builtin_ia32_crc32xx intrinsics
> (i.e. the additional patch I'm working on):
>
> bin/pg_xlogdump --stats wal/000000010000000000000001 3.33s user 4.79s system 23% cpu 34.832 total

Impressive!

It would be good to see separate benchmarks on WAL generation, and WAL
replay. pg_xlogdump is probably close to WAL replay, but the WAL
generation case is somewhat different, as WAL is generated in small
pieces, and each piece is accumulated to the CRC separately. The
slice-by-X algorithm might be less effective in that scenario. I have no
doubt that it's still a lot faster than the byte-at-a-time operation,
but would be nice to have numbers on it.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-11-19 16:49:40 Re: What exactly is our CRC algorithm?
Previous Message Robert Haas 2014-11-19 16:43:48 Re: proposal: plpgsql - Assert statement