Re: Cost of XLogInsert CRC calculations

Lists: pgsql-hackers
From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-07 11:04:00
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113169@WEBBASEDDC.webbasedltd.local

Hi Tom,

> I was profiling a case involving UPDATEs into a table with too many
> indexes (brought to mind by mysql's sql-bench, about which more later)
> and got this rather surprising result for routines costing more than 1%
> of the total runtime:

(cut)

> I suppose that the bulk of the CPU cycles being attributed to XLogInsert
> are going into the inlined CRC calculations. Maybe we need to think twice
> about the cost/benefit ratio of using 64-bit CRCs to protect xlog records
> that are often only a few dozen bytes.

Wow, a 64-bit CRC does seem excessive, especially when going back to Zmodem
days where a 50-100k file seemed to be easily protected by a 32-bit CRC. I'm
sure there are some error rates somewhere dependent upon the polynomial and
the types of error detected.... Try the following link towards the bottom:
http://www.ee.unb.ca/tervo/ee4253/crc.htm for some theory on detection rates
vs. CRC size.

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-07 14:39:50
Message-ID: 23031.1110206390@sss.pgh.pa.us
Lists: pgsql-hackers

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> Wow, a 64-bit CRC does seem excessive, especially when going back to Zmodem
> days where a 50-100k file seemed to be easily protected by a 32-bit CRC. I'm
> sure there are some error rates somewhere dependent upon the polynomial and
> the types of error detected.... Try the following link towards the bottom:
> http://www.ee.unb.ca/tervo/ee4253/crc.htm for some theory on detection rates
> vs. CRC size.

When the CRC size was decided, I recall someone arguing that it would
really make a difference to have 1-in-2^64 chance of failure rather than
1-in-2^32. I was dubious about this at the time, but didn't have any
evidence showing that we shouldn't go for 64. I suppose we ought to try
the same example with a 32-bit CRC and see how much it helps.

regards, tom lane


From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-07 17:45:57
Message-ID: 422C9355.4040504@bigfoot.com
Lists: pgsql-hackers

Tom Lane wrote:
> "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
>
>>Wow, a 64-bit CRC does seem excessive, especially when going back to Zmodem
>>days where a 50-100k file seemed to be easily protected by a 32-bit CRC. I'm
>>sure there are some error rates somewhere dependent upon the polynomial and
>>the types of error detected.... Try the following link towards the bottom:
>>http://www.ee.unb.ca/tervo/ee4253/crc.htm for some theory on detection rates
>>vs. CRC size.
>
>
> When the CRC size was decided, I recall someone arguing that it would
> really make a difference to have 1-in-2^64 chance of failure rather than
> 1-in-2^32. I was dubious about this at the time, but didn't have any
> evidence showing that we shouldn't go for 64. I suppose we ought to try
> the same example with a 32-bit CRC and see how much it helps.

Continuing this logic, why not 16-bit then?

Regards
Gaetano Mendola


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-07 23:53:59
Message-ID: 1110239639.6117.197.camel@localhost.localdomain
Lists: pgsql-hackers

On Mon, 2005-03-07 at 09:39 -0500, Tom Lane wrote:
> "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> > Wow, a 64-bit CRC does seem excessive, especially when going back to Zmodem
> > days where a 50-100k file seemed to be easily protected by a 32-bit CRC. I'm
> > sure there are some error rates somewhere dependent upon the polynomial and
> > the types of error detected.... Try the following link towards the bottom:
> > http://www.ee.unb.ca/tervo/ee4253/crc.htm for some theory on detection rates
> > vs. CRC size.
>
> When the CRC size was decided, I recall someone arguing that it would
> really make a difference to have 1-in-2^64 chance of failure rather than
> 1-in-2^32. I was dubious about this at the time, but didn't have any
> evidence showing that we shouldn't go for 64. I suppose we ought to try
> the same example with a 32-bit CRC and see how much it helps.

I think some of the additional run-time may be coming from processor
stalls associated with some of the constants used in the CRC checks.
I'll come back with more info on that later.

Well, we're using the CRC in 3 separate places...
(1) for xlog records
(2) for complete blocks copied to xlog
(3) for control files

For (1), records are so short that probably CRC16 would be sufficient
without increasing the error rate noticeably.

I think I'd like to keep (3) at CRC64...it's just too important. Plus
that's slightly less code to change.

My money is on (2) being the source of most of that run-time anyway,
since when we enclose a whole block it takes a lot longer to CRC64 all
BLCKSZ bytes than it would do to CRC a single record in (1). But of
course, longer stretches of data need better error detection rates.

If Ethernet is using CRC32, it seems somewhat strange to use anything
higher than that, seeing as we're very likely to be sending xlog files
across the net anyway. Packet size is mostly comparable to BLCKSZ, isn't
it?

So, yes CRC32 seems more reasonable.

One of the things I was thinking about was whether we could use up those
cycles more effectively. If we were to include a compression routine
before we calculated the CRC that would
- reduce the size of the blocks to be written, hence reduce size of xlog
- reduce the following CRC calculation

I was thinking about using a simple run-length encoding to massively
shrink half-empty blocks with lots of zero padding, but we've already
got code to LZW the data down also.

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-08 01:50:01
Message-ID: 1881.1110246601@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Well, we're using the CRC in 3 separate places...
> (1) for xlog records
> (2) for complete blocks copied to xlog
> (3) for control files

> For (1), records are so short that probably CRC16 would be sufficient
> without increasing the error rate noticeably.

> I think I'd like to keep (3) at CRC64...it's just too important. Plus
> that's slightly less code to change.

The control files are so short that CRC16 would be plenty.

> My money is on (2) being the source of most of that run-time anyway,

Undoubtedly, so there's not going to be much win from micro-optimization
by having several different CRC functions. I would go for CRC32 across
the board, myself.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-08 02:18:57
Message-ID: 873bv6anhq.fsf@stark.xeocode.com
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>
> > For (1), records are so short that probably CRC16 would be sufficient
> > without increasing the error rate noticeably.
>
> The control files are so short that CRC16 would be plenty.

It's not really the size of the data that governs the reasonable size of the
CRC. It's the probability of there being errors. For that you have to think
about what possible causes of errors you're concerned with.

AFAIK there's no CRC check at all in tables or indexes, so it doesn't seem
like bad hardware such as a failing drive or RAM is what you're concerned
with here.

From the other post it sounded like the threat was the possibility of an
interruption arriving after the transaction commit log entry is written but
before the fsync has written the rest of the log.

In that case it seems the probability of it occurring is independent of the
size of the log entry. Is a 1 in 64k chance of a power failure resulting in
data corruption acceptable? A 1 in 4 billion chance?

You could eliminate the hole completely by using two fsyncs: fsync the
transaction information once, and then, once that completes, write and
fsync the transaction commit log entry.
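A minimal sketch of that ordering (the file name, record layout, and function name here are made up for illustration; this is not how xlog actually writes records):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch of the two-fsync ordering described above: make the
 * transaction's records durable first, and only then write and flush the
 * commit record.  A crash between the two fsyncs leaves an uncommitted
 * tail in the log rather than a commit record covering unsynced data. */
int commit_with_two_fsyncs(const char *path,
                           const char *records, const char *commit_rec)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);

    if (fd < 0)
        return -1;

    /* Phase 1: transaction body, forced to disk. */
    if (write(fd, records, strlen(records)) < 0 || fsync(fd) < 0)
    {
        close(fd);
        return -1;
    }

    /* Phase 2: commit record, written only after phase 1 is durable. */
    if (write(fd, commit_rec, strlen(commit_rec)) < 0 || fsync(fd) < 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The trade-off, of course, is a second fsync per commit.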

--
greg


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-08 08:31:35
Message-ID: 1110270695.6117.229.camel@localhost.localdomain
Lists: pgsql-hackers

On Mon, 2005-03-07 at 20:50 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Well, we're using the CRC in 3 separate places...
> > (1) for xlog records
> > (2) for complete blocks copied to xlog
> > (3) for control files
>
> > For (1), records are so short that probably CRC16 would be sufficient
> > without increasing the error rate noticeably.
>
> > I think I'd like to keep (3) at CRC64...it's just too important. Plus
> > that's slightly less code to change.
>
> The control files are so short that CRC16 would be plenty.
>
> > My money is on (2) being the source of most of that run-time anyway,
>
> Undoubtedly, so there's not going to be much win from micro-optimization
> by having several different CRC functions.

Agreed.

> I would go for CRC32 across
> the board, myself.

Sold.

Best Regards, Simon Riggs


From: tzirechnoy(at)hotpop(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Cost of XLogInsert CRC calculations
Date: 2005-03-09 08:13:25
Message-ID: 20050309081325.GA2573462@krondor.astelecom.ru
Lists: pgsql-hackers

On Mon, Mar 07, 2005 at 11:53:59PM +0000, Simon Riggs wrote:
> On Mon, 2005-03-07 at 09:39 -0500, Tom Lane wrote:
> > "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:

[skipped]

> Well, we're using the CRC in 3 separate places...
> (1) for xlog records
> (2) for complete blocks copied to xlog
> (3) for control files
>
> For (1), records are so short that probably CRC16 would be sufficient
> without increasing the error rate noticeably.
>
> I think I'd like to keep (3) at CRC64...it's just too important. Plus
> that's slightly less code to change.
>
> My money is on (2) being the source of most of that run-time anyway,
> since when we enclose a whole block it takes a lot longer to CRC64 all
> BLCKSZ bytes than it would do to CRC a single record in (1). But of
> course, longer stretches of data need better error detection rates.

Well, if there is no need for error recovery, then what about using a
simpler algorithm, like a plain checksum? Perhaps it could even be
attached to one of the required memcpy calls.
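A sketch of what such a fused copy-and-sum might look like (illustrative only, not proposed code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of a checksum folded into the copy loop: the bytes
 * are summed while they are being moved anyway, so the check costs little
 * beyond the memcpy itself.  A plain sum is much weaker than a CRC --
 * it cannot detect reordered bytes or many paired bit flips -- which is
 * the error-detection trade-off being discussed. */
uint32_t memcpy_with_checksum(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
    {
        d[i] = s[i];
        sum += s[i];
    }
    return sum;
}
```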


From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-11 18:31:50
Message-ID: 4231E416.4030900@cybertec.at
Lists: pgsql-hackers


> One of the things I was thinking about was whether we could use up those
> cycles more effectively. If we were to include a compression routine
> before we calculated the CRC that would
> - reduce the size of the blocks to be written, hence reduce size of xlog
> - reduce the following CRC calculation
>
> I was thinking about using a simple run-length encoding to massively
> shrink half-empty blocks with lots of zero padding, but we've already
> got code to LZW the data down also.
>
> Best Regards, Simon Riggs

Simon,

I think having a compression routine in there could make real sense.
We have done some major I/O testing involving compression for a large
customer some time ago. We have seen that compressing / decompressing on
the fly is in MOST cases much faster than uncompressed I/O (try a simple
"cat file | ..." vs. "zcat file.gz | ..." - the zcat version will be
faster on all platforms we have tried: Linux, AIX, Sun on some SAN
system, etc.).
Also, when building up a large database within one transaction, the xlog
will eat a lot of storage - this can be quite annoying when you have to
deal with a lot of data.
Are there any technical reasons which would prevent somebody from
implementing compression?

Best regards,

Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/660/816 40 77
www.cybertec.at, www.postgresql.at


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-13 21:26:13
Message-ID: 1110749173.11750.89.camel@localhost.localdomain
Lists: pgsql-hackers

On Fri, 2005-03-11 at 19:31 +0100, Hans-Jürgen Schönig wrote:
> > One of the things I was thinking about was whether we could use up those
> > cycles more effectively. If we were to include a compression routine
> > before we calculated the CRC that would
> > - reduce the size of the blocks to be written, hence reduce size of xlog
> > - reduce the following CRC calculation
> >
> > I was thinking about using a simple run-length encoding to massively
> > shrink half-empty blocks with lots of zero padding, but we've already
> > got code to LZW the data down also.

> I think having a compression routine in there could make real sense.
> We have done some major I/O testing involving compression for a large
> customer some time ago. We have seen that compressing / decompressing on
> the fly is in MOST cases much faster than uncompressed I/O (try a simple
> "cat file | ..." vs. "zcat file.gz | ..." - the zcat version will be
> faster on all platforms we have tried: Linux, AIX, Sun on some SAN
> system, etc.).
> Also, when building up a large database within one transaction, the xlog
> will eat a lot of storage - this can be quite annoying when you have to
> deal with a lot of data.

Agreed.

> Are there any technical reasons which would prevent somebody from
> implementing compression?

Not currently, that I'm aware of. But the way additional blocks are
added to xlog records would need to be changed to allow for variable
length output.

It's on my list...

Best Regards, Simon Riggs


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-10 14:13:48
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113321@WEBBASEDDC.webbasedltd.local
Lists: pgsql-hackers


> -----Original Message-----
> From: Mark Cave-Ayland [mailto:m(dot)cave-ayland(at)webbased(dot)co(dot)uk]
> Sent: 07 March 2005 11:04
> To: 'tgl(at)sss(dot)pgh(dot)pa(dot)us'
> Cc: 'pgsql-hackers(at)postgreSQL(dot)org'
> Subject: Re: Cost of XLogInsert CRC calculations

(cut)

> > I suppose that the bulk of the CPU cycles being attributed to
> > XLogInsert are going into the inlined CRC calculations. Maybe we
> > need to think twice about the cost/benefit ratio of using 64-bit
> > CRCs to protect xlog records that are often only a few dozen bytes.
>
> Wow, a 64-bit CRC does seem excessive, especially when going
> back to Zmodem days where a 50-100k file seemed to be easily
> protected by a 32-bit CRC. I'm sure there are some error
> rates somewhere dependent upon the polynomial and the types
> of error detected.... Try the following link towards the
> bottom: http://www.ee.unb.ca/tervo/ee4253/crc.htm for some
> theory on detection rates vs. CRC size.

Hi Tom/Simon,

I was just researching some articles on compression (zlib) and I saw mention
of the Adler-32 algorithm, which is supposed to be slightly less accurate
than an equivalent CRC calculation but significantly faster to compute. I
haven't located a good paper comparing the error rates of the two different
checksums; however, extending the Adler-32 algorithm up to 64 bits may be a
way of increasing the speed, at the expense of a slight loss in the accuracy
of error detection, if we decided that we want to stay at 64 bits as opposed
to 32.

The following seems to indicate that Adler-32 is at least twice as fast as
optimised CRC32: http://www.winimage.com/misc/readfile_test.htm.
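For reference, Adler-32 as defined for zlib in RFC 1950 is just two running sums reduced modulo 65521, which is why it is so cheap. A naive sketch (zlib's real implementation batches many bytes between modulo reductions):

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u        /* largest prime below 2^16 */

/* Naive Adler-32 per RFC 1950: two running sums and a modulus, with no
 * table lookups.  Production code defers the expensive modulo until the
 * sums are close to overflowing. */
uint32_t adler32_naive(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i;

    for (i = 0; i < len; i++)
    {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;       /* b in the high half, a in the low */
}
```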

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-10 14:30:55
Message-ID: 10161.1115735455@sss.pgh.pa.us
Lists: pgsql-hackers

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> I was just researching some articles on compression (zlib) and I saw mention
> of the Adler-32 algorithm which is supposed to be slightly less accurate
> than an equivalent CRC calculation but significantly faster to compute. I
> haven't located a good paper comparing the error rates of the two different
> checksums,

... probably because there isn't one. With all due respect to the Zip
guys, I doubt anyone has done anywhere near the analysis on Adler-32
that has been done on CRCs. I'd much prefer to stick with true CRC
and drop it to 32 bits than go with a less-tested algorithm. Throwing
more bits at the problem doesn't necessarily create a safer checksum.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-10 14:34:33
Message-ID: 200505101434.j4AEYXd02284@candle.pha.pa.us
Lists: pgsql-hackers

Tom Lane wrote:
> "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> > I was just researching some articles on compression (zlib) and I saw mention
> > of the Adler-32 algorithm which is supposed to be slightly less accurate
> > than an equivalent CRC calculation but significantly faster to compute. I
> > haven't located a good paper comparing the error rates of the two different
> > checksums,
>
> ... probably because there isn't one. With all due respect to the Zip
> guys, I doubt anyone has done anywhere near the analysis on Adler-32
> that has been done on CRCs. I'd much prefer to stick with true CRC
> and drop it to 32 bits than go with a less-tested algorithm. Throwing
> more bits at the problem doesn't necessarily create a safer checksum.

Agreed. 64-bit was overkill when we added it, and it is now shown to be
a performance problem.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-10 21:46:05
Message-ID: 1115761565.3830.196.camel@localhost.localdomain
Lists: pgsql-hackers

On Tue, 2005-05-10 at 10:34 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> > > I was just researching some articles on compression (zlib) and I saw mention
> > > of the Adler-32 algorithm which is supposed to be slightly less accurate
> > > than an equivalent CRC calculation but significantly faster to compute. I
> > > haven't located a good paper comparing the error rates of the two different
> > > checksums,
> >
> > ... probably because there isn't one. With all due respect to the Zip
> > guys, I doubt anyone has done anywhere near the analysis on Adler-32
> > that has been done on CRCs. I'd much prefer to stick with true CRC
> > and drop it to 32 bits than go with a less-tested algorithm. Throwing
> > more bits at the problem doesn't necessarily create a safer checksum.
>
> Agreed. 64-bit was overkill when we added it, and it is now shown to be
> a performance problem.

Hold on... Tom has shown that there is a performance problem with the
existing CRC calculation. Thanks to Mark for staying on top of that with
some good ideas.

The cause of the performance problem has been attributed to it being a
64-bit rather than 32-bit calculation. That is certainly part of it, but
I have seen evidence that there is an Intel processor stall associated
with the use of a single byte constant somewhere in the algorithm. So
I'm unclear as to what extent the poor performance is attributable to
either issue.

That's where my experience stops, so I have highlighted it for somebody
with more hardware-specific assembler experience to have a look at the
algorithm. Fixing that, if possible, could greatly improve the performance
whether or not we drop from 64 to 32 bits. My hope for outside assistance
on that now looks like it is not forthcoming.

My guess would be that the algorithm's use of the byte-by-byte
calculation together with a bitmask of &FF is responsible. Perhaps
varying the length of the bitmask to either &000000FF or longer may
help? (see src/include/xlog_internal.h)

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-10 22:22:28
Message-ID: 27521.1115763748@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> The cause of the performance problem has been attributed to it being a
> 64-bit rather than 32-bit calculation. That is certainly part of it, but
> I have seen evidence that there is an Intel processor stall associated
> with the use of a single byte constant somewhere in the algorithm.

That's awfully vague --- can't you give any more detail?

I have seen XLogInsert eating significant amounts of time (up to 10% of
total CPU time) on non-Intel architectures, so I think that dropping
down to 32 bits is warranted in any case. But if you are correct then
that might not fix the problem on Intel machines. We need more info.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-10 22:57:52
Message-ID: 1115765872.3830.252.camel@localhost.localdomain
Lists: pgsql-hackers

On Tue, 2005-05-10 at 18:22 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > The cause of the performance problem has been attributed to it being a
> > 64-bit rather than 32-bit calculation. That is certainly part of it, but
> > I have seen evidence that there is an Intel processor stall associated
> > with the use of a single byte constant somewhere in the algorithm.
>
> That's awfully vague --- can't you give any more detail?
>
> I have seen XLogInsert eating significant amounts of time (up to 10% of
> total CPU time) on non-Intel architectures, so I think that dropping
> down to 32 bits is warranted in any case. But if you are correct then
> that might not fix the problem on Intel machines. We need more info.

I have seen an Intel VTune report that shows a memory stall causing high
latency associated with a single assembly instruction in the compiled code
of the CRC calculation. The instruction was manipulating a single byte
only. I couldn't tell exactly which line of PostgreSQL code produced the
assembler. This could be either a partial register stall or a memory order
buffer stall (or another?)

Here's a discussion of this
http://www.gamasutra.com/features/19991221/barad_pfv.htm

Sorry, but that's all I know. I will try to obtain the report, which is
not in my possession.

I do *not* know with any certainty what proportion of the time is lost in
the CRC calc proper on an idealised CPU versus the time lost to this
hardware-specific interaction. I don't know if non-Intel CPUs are affected
either.

Best Regards, Simon Riggs


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-11 12:40:08
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113322@WEBBASEDDC.webbasedltd.local
Lists: pgsql-hackers


> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 10 May 2005 23:22
> To: Simon Riggs
> Cc: Bruce Momjian; Mark Cave-Ayland (External);
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> That's awfully vague --- can't you give any more detail?
>
> I have seen XLogInsert eating significant amounts of time (up
> to 10% of total CPU time) on non-Intel architectures, so I
> think that dropping down to 32 bits is warranted in any case.
> But if you are correct then that might not fix the problem
> on Intel machines. We need more info.
>
> regards, tom lane

Hi Tom/Simon,

Just for the record, I found a better analysis of Adler-32 following some
links from Wikipedia. In summary, the problem with Adler-32 is that while it
is only slightly less sensitive than CRC-32, it requires roughly a 1k
"run-in" in order to attain full coverage of the bits (with respect to
sensitivity of the input). This compares to 4 bytes of "run-in" required for
CRC-32. So unless we can guarantee a minimum of 1k data per Xlog record then
Adler-32 won't be suitable. See the following two links for more
information:

http://en.wikipedia.org/wiki/Adler-32
http://www.ietf.org/rfc/rfc3309.txt

One other consideration would be that since CRC-32 calculations for Xlog
records occur so often, perhaps the CRC-32 routines could be written in
in-line assembler, falling back to C for unsupported processors. It would
be interesting to come up with some benchmarks to see if this would indeed
be faster than the current C implementation; since the routine is called so
often, it could add up to a significant saving under higher loads.

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-11 19:32:04
Message-ID: 1115839924.3830.296.camel@localhost.localdomain
Lists: pgsql-hackers

On Wed, 2005-05-11 at 13:40 +0100, Mark Cave-Ayland wrote:
> So unless we can guarantee a minimum of 1k data per Xlog record then
> Adler-32 won't be suitable.

Most records are either 8KB or much less than 1KB. Is the benefit gained
from the 8KB records worth the loss on the more frequent smaller
records?

> perhaps the CRC-32 routines could be written in in-line assembler

If you can do this, step right up. :-)

Best Regards, Simon Riggs


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-12 01:46:07
Message-ID: 4282B55F.6060406@familyhealth.com.au
Lists: pgsql-hackers

>>perhaps the CRC-32 routines could be written in in-line assembler
>
>
> If you can do this, step right up. :-)
>
> Best Regards, Simon Riggs

Surely there's some open source code floating around somewhere?

Chris


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-12 12:48:48
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113326@WEBBASEDDC.webbasedltd.local
Lists: pgsql-hackers


> -----Original Message-----
> From: Christopher Kings-Lynne [mailto:chriskl(at)familyhealth(dot)com(dot)au]
> Sent: 12 May 2005 02:46
> To: Simon Riggs
> Cc: Mark Cave-Ayland (External); 'Tom Lane'; 'Bruce Momjian';
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations
>
>
> >>perhaps the CRC-32 routines could be written in in-line assembler
> >
> >
> > If you can do this, step right up. :-)
> >
> > Best Regards, Simon Riggs
>
> Surely there's an open source code floating around somewhere?
>
> Chris

Hi Chris,

I've found some x86 MASM sources using Google, but nothing in GCC-style
in-line assembler. It would be good to see how this compares in terms of
speed with the code generated by GCC.

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: 'Christopher Kings-Lynne' <chriskl(at)familyhealth(dot)com(dot)au>, 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-12 15:52:16
Message-ID: 1115913136.3830.419.camel@localhost.localdomain
Lists: pgsql-hackers

On Thu, 2005-05-12 at 13:48 +0100, Mark Cave-Ayland wrote:
> > From: Christopher Kings-Lynne [mailto:chriskl(at)familyhealth(dot)com(dot)au]
> >
> > >>perhaps the CRC-32 routines could be written in in-line assembler
> > >
> > >
> > > If you can do this, step right up. :-)
> > >
> > > Best Regards, Simon Riggs
> >
> > Surely there's an open source code floating around somewhere?
> >

> I've found some x86 MASM sources using Google, but nothing in GCC-style
> in-line assembler. It would be good to see how this compares in terms of
> speed with the code generated by GCC.

Is it BSD? I looked for some other BSD code last month. There was some,
but it was clearly slower than the existing code.

It might be possible to speed things up by a factor of two using a two-
byte lookup table rather than a one byte lookup. This would take up
65536 table entries, which is only 256KB. If we held that in shared
memory we could calculate the entries at startup, or read them in from a
file. It would only use the same space as if we had 255 connections, so
not a great increase in memory usage overall. Need to look at the effect
of retrieving everything from memory rather than from cache, so it
probably wouldn't be twice as fast.

Best Regards, Simon Riggs


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc: "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-16 08:03:51
Message-ID: 9EB50F1A91413F4FA63019487FCD251D11332C@WEBBASEDDC.webbasedltd.local
Lists: pgsql-hackers


> -----Original Message-----
> From: Simon Riggs [mailto:simon(at)2ndquadrant(dot)com]
> Sent: 12 May 2005 16:52
> To: Mark Cave-Ayland (External)
> Cc: 'Christopher Kings-Lynne'; 'Tom Lane'; 'Bruce Momjian';
> pgsql-hackers(at)postgresql(dot)org
> Subject: RE: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> It might be possible to speed things up by a factor of two using a
> two- byte lookup table rather than a one byte lookup. This would take
> up 65536 table entries, which is only 256KB. If we held that in shared
> memory we could calculate the entries at startup, or read them in from
> a file. It would only use the same space as if we had 255 connections,
> so not a great increase in memory usage overall. Need to look at the
> effect of retrieving everything from memory rather than from
> cache, so it probably wouldn't be twice as fast.
>
> Best Regards, Simon Riggs

Hi everyone,

As per this thread, I decided to spend a little time over the weekend
looking at the performance of the existing CRC64 code in order to determine
whether we could code the algorithm in a more efficient manner. To do this,
I devised a couple of test programs using code cut/pasted from pg_crc.h
(and pg_resetxlog) to perform some timings.

These tests were done on two systems: a P4 1.8GHz laptop with MingW (Win32)
and gcc 3.2.3, and a Xeon 2.8GHz server running FC1 and gcc 3.3.2.

The program used to time the CRC64 calculation simply did the following:

1) Create an 8k buffer of data

2) Fill the buffer with some repeatable values; in this case the
algorithm used was the following:

for (i = 0 ; i < 8192 ; i++ )
{
    *data = (char )(i % 256);
}

3) Calculate the CRC64 of the buffer 10,000 times and display the result,
making a note of the start and end times. From this, the cost of a single
CRC calculation can easily be estimated.

I noticed looking at the code that there were two versions of the CRC code,
one using 32-bit arguments wrapped within a check for INT64_IS_BUSTED and
another using 64-bit unsigned integers. Since I wasn't sure initially which
one was being used, I decided to test both of them :) Both programs were
compiled using gcc's -O2 flag for optimisation.

The first thing to notice about the results was that I obtained different
CRC values using both algorithms on Win32, which were both different to the
results obtained under Linux:

Win32 MingW:
1) INT64_IS_BUSTED code (32-bit): 0x541d4a27 (crc1) 0x8dfa57ae (crc0)
2) 64-bit code: 0x817f722b1f17b253 (crc0)

Linux (FC1):
1) INT64_IS_BUSTED code (32-bit): 0x21d064a2 (crc1) 0xe7c74332 (crc0)
2) 64-bit code: 0x21d064a2e7c74332 (crc0)

The second interesting thing to notice was the comparative speeds:

Win32 MingW:
1) INT64_IS_BUSTED code (32-bit): ~57us per calculation
2) 64-bit code: ~130us per calculation

Linux (FC1):
1) INT64_IS_BUSTED code (32-bit): ~29us per calculation
2) 64-bit code: ~76us per calculation

Looking at the flags in pg_config.h from both source builds, it would appear
that the 64-bit code is being used since the compiler defines
HAVE_LONG_LONG_INT_64. So that means there could be a potential 100%
performance increase by just using the existing INT64_IS_BUSTED code on x86
32 bit processors. I don't know given the rate of xlog records being written
whether or not this translates to more than a couple of minutes in an insert
of a million records or so.

I did have a look at the assembly code produced by the INT64_IS_BUSTED code
and it appears fairly tight; I'm no x86 expert but I think it would be hard
to optimise what was produced by the compiler. If anyone else is interested
in looking at these results then I will happily send the test programs
off-list - however my main concern is that the Win32 CRC64 results just
don't seem to be right :(

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc: "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-16 11:12:55
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113332@WEBBASEDDC.webbasedltd.local
Lists: pgsql-hackers


> -----Original Message-----
> From: Mark Cave-Ayland [mailto:m(dot)cave-ayland(at)webbased(dot)co(dot)uk]
> Sent: 16 May 2005 09:04
> To: 'Simon Riggs'
> Cc: 'Christopher Kings-Lynne'; 'Tom Lane'; 'Bruce Momjian';
> 'pgsql-hackers(at)postgresql(dot)org'
> Subject: RE: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> The program used to time the CRC64 calculation simply did the
> following:

(cut)

> 2) Fill the buffer with some repeatable values; in this case the
> algorithm used was the following:
>
> for (i = 0 ; i < 8192 ; i++ )
> {
> *data = (char )(i % 256);
> }

Sigh, so it would help if I had added the offset to the data pointer... ;)

for (i = 0 ; i < 8192 ; i++ )
{
    *(data + i) = (char )(i % 256);
}

This now gives the following (correct) result on both platforms:
Win32: 1.8GHz P4, WinXP
Linux: 2.8GHz Xeon, FC1

Win32 UINT64: 0x782104059a01660 (crc0)              ~158us
Win32 UINT32: 0x78210405 (crc1), 0x59a01660 (crc0)  ~58us

FC1 UINT64:   0x782104059a01660 (crc0)              ~76us
FC1 UINT32:   0x78210405 (crc1), 0x59a01660 (crc0)  ~29us

Note that we still find that using the INT64_IS_BUSTED code is over 100%
quicker than the UINT64 code for CRC64 calculation, and I believe it is not
being used by default under Linux or Win32 for 32 bit platforms. Of course
Tom's suggestion of going for CRC32 across the board would hopefully solve
this entirely and bring the times down a little further too.

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Hannu Krosing <hannu(at)skype(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: 'Simon Riggs' <simon(at)2ndquadrant(dot)com>, 'Christopher Kings-Lynne' <chriskl(at)familyhealth(dot)com(dot)au>, 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-16 13:07:51
Message-ID: 1116248871.4806.33.camel@fuji.krosing.net
Lists: pgsql-hackers

On E, 2005-05-16 at 12:12 +0100, Mark Cave-Ayland wrote:
> > -----Original Message-----
> > From: Mark Cave-Ayland [mailto:m(dot)cave-ayland(at)webbased(dot)co(dot)uk]
> > Sent: 16 May 2005 09:04
> > To: 'Simon Riggs'
> > Cc: 'Christopher Kings-Lynne'; 'Tom Lane'; 'Bruce Momjian';
> > 'pgsql-hackers(at)postgresql(dot)org'
> > Subject: RE: [HACKERS] Cost of XLogInsert CRC calculations
>

>
> Win32 UINT64: 0x782104059a01660 (crc0) ~158us
> Win32 UINT32: 0x78210405 (crc1), 0x59a01660 (crc0) ~58us
> FC1 UINT64: 0x782104059a01660 (crc0) ~76us
> FC1 UINT32: 0x78210405 (crc1), 0x59a01660 (crc0) ~29us

It would be interesting to see how much of the time taken is the actual
algorithm and how much is getting at the data (i.e. the fact that the page
has to go through the CPU at all, instead of DMA'ing it to disk).

In your test setup you probably have the whole thing in CPU cache already,
but still it would be interesting to time the same thing with some simple
algorithm, like XOR over the whole 8k page, for comparison.

--
Hannu Krosing <hannu(at)skype(dot)net>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-16 14:01:13
Message-ID: 17220.1116252073@sss.pgh.pa.us
Lists: pgsql-hackers

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> Sigh, so it would help if I had added the offset to the data pointer... ;)

Would you post the corrected program so people can try it on a few other
architectures? No point in reinventing the wheel, even if it is a
pretty trivial wheel.

regards, tom lane


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-16 14:32:15
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113335@WEBBASEDDC.webbasedltd.local
Lists: pgsql-hackers

Hi Tom,

I didn't post the sources to the list originally as I wasn't sure if the
topic was of enough interest to warrant a larger email. I've attached the
two corrected programs as a .tar.gz - crctest.c uses uint32, whereas
crctest64.c uses uint64.

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 16 May 2005 15:01
> To: Mark Cave-Ayland (External)
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations
>
>
> "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> > Sigh, so it would help if I had added the offset to the data
> > pointer... ;)
>
> Would you post the corrected program so people can try it on
> a few other architectures? No point in reinventing the
> wheel, even if it is a pretty trivial wheel.
>
> regards, tom lane
>

Attachment Content-Type Size
crctest.tar.gz application/octet-stream 6.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-16 16:35:35
Message-ID: 18629.1116261335@sss.pgh.pa.us
Lists: pgsql-hackers

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> I didn't post the sources to the list originally as I wasn't sure if the
> topic were of enough interest to warrant a larger email. I've attached the
> two corrected programs as a .tar.gz - crctest.c uses uint32, whereas
> crctest64.c uses uint64.

I did some experimentation and concluded that gcc is screwing up
big-time on optimizing the CRC64 code for 32-bit Intel. It does much
better on every other architecture though.

Here are some numbers with gcc 3.2.3 on an Intel Xeon machine. (I'm
showing the median of three trials in each case, but the numbers were
pretty repeatable. I also tried gcc 4.0.0 on this machine and got
similar numbers.)

gcc -O1 crctest.c 0.328571 s
gcc -O2 crctest.c 0.297978 s
gcc -O3 crctest.c 0.306894 s

gcc -O1 crctest64.c 0.358263 s
gcc -O2 crctest64.c 0.773544 s
gcc -O3 crctest64.c 0.770945 s

When -O2 is slower than -O1, you know the compiler is blowing it :-(.
I fooled around with non-default -march settings but didn't see much
change.

Similar tests on a several-year-old Pentium 4 machine, this time with
gcc version 3.4.3:

gcc -O1 -march=pentium4 crctest.c 0.486266 s
gcc -O2 -march=pentium4 crctest.c 0.520237 s
gcc -O3 -march=pentium4 crctest.c 0.520299 s

gcc -O1 -march=pentium4 crctest64.c 0.928107 s
gcc -O2 -march=pentium4 crctest64.c 1.247673 s
gcc -O3 -march=pentium4 crctest64.c 1.654102 s

Here are some comparisons showing that the performance difference is
not inherent:

IA64 (Itanium 2), gcc 3.2.3:

gcc -O1 crctest.c 0.898595 s
gcc -O2 crctest.c 0.599005 s
gcc -O3 crctest.c 0.598824 s

gcc -O1 crctest64.c 0.524257 s
gcc -O2 crctest64.c 0.524168 s
gcc -O3 crctest64.c 0.524140 s

X86_64 (Opteron), gcc 3.2.3:

gcc -O1 crctest.c 0.460000 s
gcc -O2 crctest.c 0.460000 s
gcc -O3 crctest.c 0.460000 s

gcc -O1 crctest64.c 0.410000 s
gcc -O2 crctest64.c 0.410000 s
gcc -O3 crctest64.c 0.410000 s

PPC64 (IBM POWER4+), gcc 3.2.3

gcc -O1 crctest.c 0.819492 s
gcc -O2 crctest.c 0.819427 s
gcc -O3 crctest.c 0.820616 s

gcc -O1 crctest64.c 0.751639 s
gcc -O2 crctest64.c 0.894250 s
gcc -O3 crctest64.c 0.888959 s

PPC (Mac G4), gcc 3.3

gcc -O1 crctest.c 0.949094 s
gcc -O2 crctest.c 1.011220 s
gcc -O3 crctest.c 1.013847 s
gcc -O1 crctest64.c 1.314093 s
gcc -O2 crctest64.c 1.015367 s
gcc -O3 crctest64.c 1.011468 s

HPPA, gcc 2.95.3:

gcc -O1 crctest.c 1.796604 s
gcc -O2 crctest.c 1.676023 s
gcc -O3 crctest.c 1.676476 s
gcc -O1 crctest64.c 2.022798 s
gcc -O2 crctest64.c 1.916185 s
gcc -O3 crctest64.c 1.904094 s

Given the lack of impressive advantage to the 64-bit code even on 64-bit
architectures, it might be best to go with the 32-bit code everywhere,
but I also think we have grounds to file a gcc bug report.

Anyone want to try it with non-gcc compilers? I attach a slightly
cleaned-up version of Mark's original (doesn't draw compiler warnings
or errors on what I tried it on).

regards, tom lane

Attachment Content-Type Size
crctest.tar.gz application/octet-stream 6.8 KB

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: 'Christopher Kings-Lynne' <chriskl(at)familyhealth(dot)com(dot)au>, 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-16 18:51:48
Message-ID: 1116269508.3830.519.camel@localhost.localdomain
Lists: pgsql-hackers

On Mon, 2005-05-16 at 12:12 +0100, Mark Cave-Ayland wrote:
> This now gives the following (correct) result on both platforms:
> Win32: 1.8GHz P4, WinXP
> Linux: 2.8GHz Xeon, FC1
>
>
> Win32 UINT64: 0x782104059a01660 (crc0)              ~158us
> Win32 UINT32: 0x78210405 (crc1), 0x59a01660 (crc0)  ~58us
>
> FC1 UINT64:   0x782104059a01660 (crc0)              ~76us
> FC1 UINT32:   0x78210405 (crc1), 0x59a01660 (crc0)  ~29us
>
>
> Note that we still find that using the INT64_IS_BUSTED code is over 100%
> quicker than the UINT64 code for CRC64 calculation, and I believe it is not
> being used by default under Linux or Win32 for 32 bit platforms. Of course
> Tom's suggestion of going for CRC32 across the board would hopefully solve
> this entirely and bring the times down a little further too.

I think perhaps that the difference in hardware is the reason for the
difference in elapsed time, not the OS.

The performance gain is disturbing. I think it's supposed to be the other
way around, isn't it? Like having INT64 is supposed to be good...

Perhaps the BIOS on your systems doesn't correctly support 64-bit, so
mimicking it costs more.

Best Regards, Simon Riggs


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-17 14:02:14
Message-ID: 9EB50F1A91413F4FA63019487FCD251D11333F@WEBBASEDDC.webbasedltd.local
Lists: pgsql-hackers


> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 16 May 2005 17:36
> To: Mark Cave-Ayland (External)
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> I did some experimentation and concluded that gcc is screwing
> up big-time on optimizing the CRC64 code for 32-bit Intel.
> It does much better on every other architecture though.

Hi Tom,

Thanks very much for showing that the uint64 slowdown was caused by the
optimisation done by gcc - I've had a go at filing a bug with the gcc people
at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21617 so it would be
interesting to see if they can solve this. Perhaps, as you suggest, the short
term solution is to use the uint32 CRC64 code everywhere for the moment. I
hope to find some time later this week to write and test some CRC32
routines, and will post the results back to the list.

Many thanks,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Kris Jurka <books(at)ejurka(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-17 17:15:25
Message-ID: Pine.BSO.4.56.0505171159480.20628@leary.csoft.net
Lists: pgsql-hackers

On Mon, 16 May 2005, Tom Lane wrote:

> I did some experimentation and concluded that gcc is screwing up
> big-time on optimizing the CRC64 code for 32-bit Intel. It does much
> better on every other architecture though.
>
> Anyone want to try it with non-gcc compilers?

Solaris 9 x86 - Sun Workshop 6 update 2 C 5.3, gcc 3.2.3

gcc -O1 crctest.c .251422
gcc -O3 crctest.c .240223
gcc -O1 crctest64.c .281369
gcc -O3 crctest64.c .631290

cc -O crctest.c .268905
cc -fast crctest.c .242429
cc -O crctest64.c .283278
cc -fast crctest64.c .255560

Kris Jurka


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-17 20:22:52
Message-ID: 3vik81tiferogv0k3in75oq7k0178obhnj@email.aon.at
Lists: pgsql-hackers

On Mon, 16 May 2005 12:35:35 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>Anyone want to try it with non-gcc compilers?

MS VC++ 6.0 with various predefined optimizer settings

                         2x32      64
Default (without any /O) 0.828125  0.906250
MinSize (contains /O1)   0.468750  0.593750
MaxSpeed (contains /O2)  0.312500  0.640625

Not that it really matters -- but at least this looks like another hint
that letting the compiler emulate 64 bit operations on non 64 bit
hardware is suboptimal.

Servus
Manfred


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-17 21:23:01
Message-ID: 1116364981.4797.24.camel@fuji.krosing.net
Lists: pgsql-hackers

On E, 2005-05-16 at 12:35 -0400, Tom Lane wrote:

> Given the lack of impressive advantage to the 64-bit code even on 64-bit
> architectures, it might be best to go with the 32-bit code everywhere,
> but I also think we have grounds to file a gcc bug report.

Maybe on other platforms , but 20% on Power is not something we should
throw away.

crc32 compiled as a 32-bit executable is 10% slower than crc64 as either a
32- or 64-bit exe, but if you compile your backend as 64-bit then the
difference is almost 20%. crc64 is the same speed compiled either way.

gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)
on OpenPower5 1.8GHz

file ./crctest
./crctest: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1
(SYSV)
cc -O1 crctest.c -o crctest -- time 0.584327 s
cc -O2 crctest.c -o crctest -- time 0.594664 s
cc -O3 crctest.c -o crctest -- time 0.594764 s

file ./crctest
./crctest: ELF 64-bit MSB executable, cisco 7500, version 1 (SYSV)
cc -O1 -m64 crctest.c -o crctest -- time 0.644473 s
cc -O2 -m64 crctest.c -o crctest -- time 0.648033 s
cc -O3 -m64 crctest.c -o crctest -- time 0.688682 s

file ./crctest64
./crctest64: ELF 64-bit MSB executable, cisco 7500, version 1 (SYSV)
cc -O1 -m64 crctest64.c -o crctest64 -- time 0.545026 s
cc -O2 -m64 crctest64.c -o crctest64 -- time 0.545470 s
cc -O3 -m64 crctest64.c -o crctest64 -- time 0.545037 s

file ./crctest64
./crctest64: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1
(SYSV)
cc -O1 crctest64.c -o crctest64 -- time 0.545364 s
cc -O2 crctest64.c -o crctest64 -- time 0.644093 s
cc -O3 crctest64.c -o crctest64 -- time 0.644155 s

> Anyone want to try it with non-gcc compilers? I attach a slightly
> cleaned-up version of Mark's original (doesn't draw compiler warnings
> or errors on what I tried it on).

I'll probably get a chance to try IBM's own compiler tomorrow

--
Hannu Krosing <hannu(at)skype(dot)net>


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-17 22:29:54
Message-ID: 200505172229.j4HMTsL27844@candle.pha.pa.us
Lists: pgsql-hackers

Manfred Koizar wrote:
> On Mon, 16 May 2005 12:35:35 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >Anyone want to try it with non-gcc compilers?
>
> MS VC++ 6.0 with various predefined optimizer settings
>
>                          2x32      64
> Default (without any /O) 0.828125  0.906250
> MinSize (contains /O1)   0.468750  0.593750
> MaxSpeed (contains /O2)  0.312500  0.640625
>
> Not that it really matters -- but at least this looks like another hint
> that letting the compiler emulate 64 bit operations on non 64 bit
> hardware is suboptimal.

I don't understand why we are testing 64-bit CRC when I thought we
agreed that 32-bit was good enough for our purposes.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 02:37:05
Message-ID: 16235.1116383825@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I don't understand why we are testing 64-bit CRC when I thought we
> agreed that 32-bit was good enough for our purposes.

Well, we need to understand exactly what is going on here. I'd not
like to think that we dropped back from 64 to 32 bit because of one
possibly-minor optimization bug in one compiler on one platform.
Even if that compiler+platform is 90% of the market.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 03:51:52
Message-ID: 200505180351.j4I3pqe20492@candle.pha.pa.us
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I don't understand why we are testing 64-bit CRC when I thought we
> > agreed that 32-bit was good enough for our purposes.
>
> Well, we need to understand exactly what is going on here. I'd not
> like to think that we dropped back from 64 to 32 bit because of one
> possibly-minor optimization bug in one compiler on one platform.
> Even if that compiler+platform is 90% of the market.

But isn't it obvious that almost any problem that CRC64 is going to
catch, CRC32 is going to catch, and we know CRC32 has to be faster than
CRC64?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 04:25:31
Message-ID: 17080.1116390331@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> Well, we need to understand exactly what is going on here. I'd not
>> like to think that we dropped back from 64 to 32 bit because of one
>> possibly-minor optimization bug in one compiler on one platform.
>> Even if that compiler+platform is 90% of the market.

> But isn't it obvious that almost any problem that CRC64 is going to
> catch, CRC32 is going to catch, and we know CRC32 has to be faster than
> CRC64?

Do we know that? The results I showed put at least one fundamentally
32bit platform (the PowerBook I'm typing this on) at dead par for 32bit
and 64bit CRCs. We have also established that 64bit CRC can be done
much faster on 32bit Intel than it's currently done by the default
PG-on-gcc build (hint: don't use -O2 or above). So while Mark's report
that 64bit CRC is an issue on Intel is certainly true, it doesn't
immediately follow that the only sane response is to give up 64bit CRC.
We need to study it and see what alternatives we have.

I do personally feel that 32bit is the way to go, but that doesn't
mean I think it's a done deal. We owe it to ourselves to understand
what we are buying and what we are paying for it.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Manfred Koizar <mkoi-pg(at)aon(dot)at>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 05:04:41
Message-ID: 87ekc5cefq.fsf@stark.xeocode.com
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Do we know that? The results I showed put at least one fundamentally
> 32bit platform (the PowerBook I'm typing this on) at dead par for 32bit
> and 64bit CRCs.

Wait, par for 32-bit CRCs? Or for 64-bit CRCs calculated using 32-bit ints?

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Manfred Koizar <mkoi-pg(at)aon(dot)at>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 05:12:26
Message-ID: 17441.1116393146@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Do we know that? The results I showed put at least one fundamentally
>> 32bit platform (the PowerBook I'm typing this on) at dead par for 32bit
>> and 64bit CRCs.

> Wait, par for 32-bit CRCs? Or for 64-bit CRCs calculated using 32-bit ints?

Right, the latter. We haven't actually tried to measure the cost of
plain 32bit CRCs... although I seem to recall that when we originally
decided to use 64bit, someone put up some benchmarks purporting to
show that there wasn't much difference.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Manfred Koizar <mkoi-pg(at)aon(dot)at>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 05:24:15
Message-ID: 200505180524.j4I5OF803426@candle.pha.pa.us
Lists: pgsql-hackers

Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >> Do we know that? The results I showed put at least one fundamentally
> >> 32bit platform (the PowerBook I'm typing this on) at dead par for 32bit
> >> and 64bit CRCs.
>
> > Wait, par for 32-bit CRCs? Or for 64-bit CRCs calculated using 32-bit ints?
>
> Right, the latter. We haven't actually tried to measure the cost of
> plain 32bit CRCs... although I seem to recall that when we originally
> decided to use 64bit, someone put up some benchmarks purporting to
> show that there wasn't much difference.

OK, thanks. I didn't know that.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Hannu Krosing <hannu(at)skype(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Manfred Koizar <mkoi-pg(at)aon(dot)at>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 07:24:20
Message-ID: 1116401061.4809.4.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On T, 2005-05-17 at 22:37 -0400, Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I don't understand why we are testing 64-bit CRC when I thought we
> > agreed that 32-bit was good enough for our purposes.
>
> Well, we need to understand exactly what is going on here. I'd not
> like to think that we dropped back from 64 to 32 bit because of one
> possibly-minor optimization bug in one compiler on one platform.
> Even if that compiler+platform is 90% of the market.

There are cases where 32bit is about 20% slower.

I tried to send the following yesterday, but for some reason the mails I
send from home with Tom Lane in the To: line get errors from
"RCPT:tgl(at)sss(dot)pgh(dot)pa(dot)us " and fail to go through to the other
destinations (like pgsql-hackers) after that :(
-----

crc32 compiled as a 32-bit executable is 10% slower than crc64 as either a
32- or 64-bit exe, but if you compile your backend as 64-bit then the
difference is almost 20%. crc64 is the same speed compiled either way.

gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)
on OpenPower5 1.8GHz

file ./crctest
./crctest: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV)
cc -O1 crctest.c -o crctest -- time 0.584327 s
cc -O2 crctest.c -o crctest -- time 0.594664 s
cc -O3 crctest.c -o crctest -- time 0.594764 s

file ./crctest
./crctest: ELF 64-bit MSB executable, cisco 7500, version 1 (SYSV)
cc -O1 -m64 crctest.c -o crctest -- time 0.644473 s
cc -O2 -m64 crctest.c -o crctest -- time 0.648033 s
cc -O3 -m64 crctest.c -o crctest -- time 0.688682 s

file ./crctest64
./crctest64: ELF 64-bit MSB executable, cisco 7500, version 1 (SYSV)
cc -O1 -m64 crctest64.c -o crctest64 -- time 0.545026 s
cc -O2 -m64 crctest64.c -o crctest64 -- time 0.545470 s
cc -O3 -m64 crctest64.c -o crctest64 -- time 0.545037 s

file ./crctest64
./crctest64: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV)
cc -O1 crctest64.c -o crctest64 -- time 0.545364 s
cc -O2 crctest64.c -o crctest64 -- time 0.644093 s
cc -O3 crctest64.c -o crctest64 -- time 0.644155 s

--
Hannu Krosing <hannu(at)skype(dot)net>


From: Hannu Krosing <hannu(at)skype(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Manfred Koizar <mkoi-pg(at)aon(dot)at>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 08:06:22
Message-ID: 1116403582.4809.12.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On K, 2005-05-18 at 10:24 +0300, Hannu Krosing wrote:
> On T, 2005-05-17 at 22:37 -0400, Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > I don't understand why we are testing 64-bit CRC when I thought we
> > > agreed that 32-bit was good enough for our purposes.
> >
> > Well, we need to understand exactly what is going on here. I'd not
> > like to think that we dropped back from 64 to 32 bit because of one
> > possibly-minor optimization bug in one compiler on one platform.
> > Even if that compiler+platform is 90% of the market.
>
> There are cases where 32bit is about 20% slower.
>
> I tried to send the folowing yesterday, but for some reason the mails I
> send from home where To: is Tom Lane get errors from
> "RCPT:tgl(at)sss(dot)pgh(dot)pa(dot)us " and fail to go through to other destinations
> (like pgsql-hackers) after that :(
> -----

the same difference between 32bit and 64bit CRC when compiled as 64bit
exe is there also on ibms own compiler (vac xlc 7.0)

[hannu(at)power ~]$ c99 -O5 -qarch=pwr5 -qtune=pwr5 -qsmp=omp -qunroll=yes -q64 crctest64.c -o crctest64_c99_64
[hannu(at)power ~]$ ./crctest64_c99_64
Result of CRC64 (10000 runs): 782104059a01660 in time 0.545042 s
[hannu(at)power ~]$ c99 -O5 -qarch=pwr5 -qtune=pwr5 -qsmp=omp -qunroll=yes\
-q64 crctest.c -o crctest_c99_64
[hannu(at)power ~]$ ./crctest_c99_64
Result of CRC64 (10000 runs): 7821040 (high), 59a01660 (low) in time 0.644319 s

--
Hannu Krosing <hannu(at)skype(dot)net>


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Greg Stark'" <gsstark(at)mit(dot)edu>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 08:23:30
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113340@WEBBASEDDC.webbasedltd.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 18 May 2005 06:12
> To: Greg Stark
> Cc: Bruce Momjian; Manfred Koizar; Mark Cave-Ayland
> (External); pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations
>
>
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >> Do we know that? The results I showed put at least one fundamentally
> >> 32bit platform (the PowerBook I'm typing this on) at dead par for
> >> 32bit and 64bit CRCs.
>
> > Wait, par for 32-bit CRCs? Or for 64-bit CRCs calculated using 32-bit
> > ints?
>
> Right, the latter. We haven't actually tried to measure the
> cost of plain 32bit CRCs... although I seem to recall that
> when we originally decided to use 64bit, someone put up some
> benchmarks purporting to show that there wasn't much difference.

If all goes to plan, I shall post a test program for CRC32 tomorrow along
with my results for other people to benchmark. The one thing this exercise
has shown me is that you can't necessarily trust the compiler to make the
right optimisation choices 100% of the time, so the only real way to find
out about the performance is to test what you are trying to do with some
sample data. To quote Tom on several occasions: "Let's stop the hand
waving...."

Looking at the asm code produced by gcc, my feeling is that most of the time
would be spent reading/writing memory rather than doing the shift and xor.
Using a 64-bit CRC, there aren't enough registers under x86-32 to hold the
result of each iteration of the algorithm, so it gets pushed out of the
registers into memory at the end of each iteration and read back in at the
beginning of the next.

With a 32-bit CRC, the entire calculation could potentially be done entirely
in the registers, with only the final result being written to memory at the
end. Combined with the fact that fewer cycles are required for the shift
(and that main memory is only read rather than written), I would expect a
32-bit CRC to be significantly faster. I also think that compilers have
improved considerably since the last tests for 32-bit vs 64-bit CRC were
done, making the difference between the two more noticeable. However, I
shall wait until I have the code completed and working before I report on
the results :)

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-18 11:50:22
Message-ID: aq9m81l2k5br7rahcf9lvc9jj11lbodf09@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 18 May 2005 01:12:26 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Wait, par for 32-bit CRCs? Or for 64-bit CRCs calculated using 32-bit ints?
>
>Right, the latter. We haven't actually tried to measure the cost of
>plain 32bit CRCs... although I seem to recall that when we originally
>decided to use 64bit, someone put up some benchmarks purporting to
>show that there wasn't much difference.

That someone wasn't me (I wasn't around here at that time), but I have
done a few tests today on 32 bit Intel with VC6:

Optimization | CRC algorithms
Settings | 32 32a 32b 2x32 64 64a 64b
-------------+-----------------------------------------------
Default | 7.6 7.6 6.2 8.3 9.1 9.2 9.5
MinSize | 2.96 2.97 2.97 4.76 6.00 5.98 6.31
MaxSpeed | 2.92 2.92 2.97 3.13 6.32 6.33 6.22

32a and 32b are functionally equivalent variants of CRC32 where the crc
is a plain uint32, not a struct with just one field. Same for 64a, 64b,
and 64, respectively.

The most important figure is, that at MaxSpeed (/O2) 2x32 is almost
twice as fast as CRC32 while only being marginally slower than CRC32.

In case anybody wants to repeat my tests or find any flaw therein, the
source is attached.

Servus
Manfred

Attachment Content-Type Size
crctest.tgz application/octet-stream 8.9 KB

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-25 19:24:59
Message-ID: 17k991hh4ddg82nj61dvu3d197tjcu615j@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 18 May 2005 13:50:22 +0200, I wrote:
>The most important figure is, that at MaxSpeed (/O2) 2x32 is almost
>twice as fast as CRC32 while only being marginally slower than CRC32.
^^^^^
Silly typo! That should have been:
The most important figure is, that at MaxSpeed (/O2) 2x32 is almost
twice as fast as CRC64 while only being marginally slower than CRC32.

Servus
Manfred


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-27 13:40:49
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113367@WEBBASEDDC.webbasedltd.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> -----Original Message-----
> From: Manfred Koizar [mailto:mkoi-pg(at)aon(dot)at]
> Sent: 25 May 2005 20:25
> To: Manfred Koizar
> Cc: Tom Lane; Greg Stark; Bruce Momjian; Mark Cave-Ayland
> (External); pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> The most important figure is, that at MaxSpeed (/O2) 2x32 is
> almost twice as fast as CRC64 while only being marginally
> slower than CRC32.
>
> Servus
> Manfred


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-27 13:49:47
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113368@WEBBASEDDC.webbasedltd.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> -----Original Message-----
> From: Manfred Koizar [mailto:mkoi-pg(at)aon(dot)at]
> Sent: 25 May 2005 20:25
> To: Manfred Koizar
> Cc: Tom Lane; Greg Stark; Bruce Momjian; Mark Cave-Ayland
> (External); pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> The most important figure is, that at MaxSpeed (/O2) 2x32 is
> almost twice as fast as CRC64 while only being marginally
> slower than CRC32.
>
> Servus
> Manfred

Hi Manfred,

Sorry about taking a while to respond on this one - the hard drive on my
laptop crashed :(. I repeated your tests on my P4 laptop with gcc 3.2.3 and
obtained the results below:

Opt 32 32a 32b 2x32 64 64a 64b
--------------------------------------------------------
O1 4.91 4.86 5.43 6.00 11.4 11.39 11.39
O2 4.96 4.94 4.69 5.18 15.86 18.75 24.73
O3 4.82 4.83 4.64 5.18 15.14 13.77 14.73


So in summary I would say:

	- Calculating a CRC64 using 2 x 32-bit ints can be 3 times as fast
	  as using 1 x 64-bit int on my 32-bit Intel laptop with gcc.

	- The time difference between CRC32 and CRC64 is about 0.5s in the
	  worst case shown during testing, so staying with CRC64 would not
	  inflict too great a penalty.

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-27 14:00:27
Message-ID: 14334.1117202427@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> Opt 32 32a 32b 2x32 64 64a 64b
> --------------------------------------------------------
> O1 4.91 4.86 5.43 6.00 11.4 11.39 11.39
> O2 4.96 4.94 4.69 5.18 15.86 18.75 24.73
> O3 4.82 4.83 4.64 5.18 15.14 13.77 14.73

Not sure I believe these numbers. Shouldn't 2x32 be about twice as slow
as just one 32-bit CRC?

regards, tom lane


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-27 14:20:04
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113369@WEBBASEDDC.webbasedltd.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 27 May 2005 15:00
> To: Mark Cave-Ayland (External)
> Cc: 'Manfred Koizar'; 'Greg Stark'; 'Bruce Momjian';
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations
>
>
> "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> > Opt 32 32a 32b 2x32 64 64a 64b
> > --------------------------------------------------------
> > O1 4.91 4.86 5.43 6.00 11.4 11.39 11.39
> > O2 4.96 4.94 4.69 5.18 15.86 18.75 24.73
> > O3 4.82 4.83 4.64 5.18 15.14 13.77 14.73
>
> Not sure I believe these numbers. Shouldn't 2x32 be about
> twice as slow as just one 32-bit CRC?

Well it surprised me, although Manfred's results with VC6 on /MaxSpeed show
a similar margin. The real killer has to be that I wrote a CRC32 routine in
x86 inline assembler (which in comparison to the gcc-produced version stores
the CRC for each iteration in registers instead of in memory as part of the
current frame) which comes in at 6.5s....

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-27 15:01:00
Message-ID: 9EB50F1A91413F4FA63019487FCD251D11336B@WEBBASEDDC.webbasedltd.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 27 May 2005 15:00
> To: Mark Cave-Ayland (External)
> Cc: 'Manfred Koizar'; 'Greg Stark'; 'Bruce Momjian';
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> Not sure I believe these numbers. Shouldn't 2x32 be about twice as
> slow as just one 32-bit CRC?

Also, I've just quickly tested Manfred's program on the Xeon Linux FC1 box
that I used for my original program, and the margin is even closer:

Opt 32 32a 32b 2x32 64 64a 64b
------------------------------------------------------
O1 2.75 2.81 2.71 3.16 3.53 3.64 7.25
O2 2.75 2.78 2.87 2.94 7.63 10.61 11.93
O3 2.84 2.85 3.03 2.99 7.63 7.64 7.71

I don't know whether gcc is just producing an inefficient CRC32 compared to
2x32 but the results seem very odd.... There must be something else we are
missing?

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-27 16:49:27
Message-ID: 15786.1117212567@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> I don't know whether gcc is just producing an inefficient CRC32 compared to
> 2x32 but the results seem very odd.... There must be something else we are
> missing?

I went back and looked at the code, and see that I was misled by
terminology: what we've been calling "2x32" in this thread is not two
independent CRC32 calculations, it is use of 32-bit arithmetic to execute
one CRC64 calculation. The inner loop looks like

	while (__len-- > 0)
	{
		int __tab_index = ((int) (__crc1 >> 24) ^ *__data++) & 0xFF;

		__crc1 = crc_table1[__tab_index] ^ ((__crc1 << 8) | (__crc0 >> 24));
		__crc0 = crc_table0[__tab_index] ^ (__crc0 << 8);
	}

whereas a plain CRC32 looks like

	while (__len-- > 0)
	{
		int __tab_index = ((int) (crc >> 24) ^ *__data++) & 0xFF;

		crc = crc_table[__tab_index] ^ (crc << 8);
	}

where the crc variables are uint32 in both cases. (The true 64-bit
calculation looks like the latter, except that the crc variable is
uint64, as is the crc_table, and the >> 24 becomes >> 56. The "2x32"
code is an exact emulation of the true 64-bit code, with __crc1 and
__crc0 holding the high and low halves of the 64-bit crc.)

In my tests the second loop is about 10% faster than the first on an
Intel machine, and maybe 20% faster on HPPA. So evidently the bulk of
the cost is in the __tab_index calculation, and not so much in the table
fetches. This is still a bit surprising, but it's not totally silly.

Based on the numbers we've seen so far, one could argue for staying
with the 64-bit CRC, but changing the rule we use for selecting which
implementation code to use: use the true 64-bit code only when
unsigned long is 64 bits wide, and otherwise use the 2x32 code, even if
there is a 64-bit unsigned long long type available. This essentially
assumes that the unsigned long long type isn't very efficient, which
isn't too unreasonable. This would buy most of the speedup without
giving up anything at all in the error-detection department.

Alternatively, we might say that 64-bit CRC was overkill from day one,
and we'd rather get the additional 10% or 20% or so speedup. I'm kinda
leaning in that direction, but only weakly.

Comments?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-27 20:37:59
Message-ID: 200505272037.j4RKbxq04977@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Alternatively, we might say that 64-bit CRC was overkill from day one,
> and we'd rather get the additional 10% or 20% or so speedup. I'm kinda
> leaning in that direction, but only weakly.

Yes, I lean in that direction too since the CRC calculation is showing
up in our profiling.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 11:07:53
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113371@WEBBASEDDC.webbasedltd.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 27 May 2005 17:49
> To: Mark Cave-Ayland (External)
> Cc: 'Manfred Koizar'; 'Greg Stark'; 'Bruce Momjian';
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> I went back and looked at the code, and see that I was misled by
> terminology: what we've been calling "2x32" in this thread is
> not two independent CRC32 calculations, it is use of 32-bit
> arithmetic to execute one CRC64 calculation.

Yeah, I did find the terminology a little confusing until I looked at the
source itself. It doesn't make much sense publishing numbers if you don't
know their meaning ;)

> Based on the numbers we've seen so far, one could argue for
> staying with the 64-bit CRC, but changing the rule we use for
> selecting which implementation code to use: use the true
> 64-bit code only when unsigned long is 64 bits wide, and
> otherwise use the 2x32 code, even if there is a 64-bit
> unsigned long long type available. This essentially assumes
> that the unsigned long long type isn't very efficient, which
> isn't too unreasonable. This would buy most of the speedup
> without giving up anything at all in the error-detection department.

All our servers are x86 based Linux with gcc, so if a factor of 2 speedup
for CPU calculations is the minimum improvement that we get as a result of
this thread then I would be very happy.

> Alternatively, we might say that 64-bit CRC was overkill from
> day one, and we'd rather get the additional 10% or 20% or so
> speedup. I'm kinda leaning in that direction, but only weakly.

What would you need to persuade you either way? I believe that disk drives
use CRCs internally to verify that the data has been read correctly from
each sector. If the majority of the errors would be from a disk failure,
then a corrupt sector would have to pass the drive CRC *and* the PostgreSQL
CRC in order for an XLog entry to be considered valid. I would have thought
the chances of this being able to happen would be reasonably small and so
even with CRC32 this can be detected fairly accurately.

In the case of an OS crash then we could argue that there may be a partially
written sector to the disk, in which case again either one or both of the
drive CRC and the PostgreSQL CRC would be incorrect and so this condition
could also be reasonably detected using CRC32.

As far as I can tell, the main impact of this would be that we would reduce
the ability to accurately detect multiple random bit errors, which is more
the type of error I would expect to occur in RAM (alpha particles etc.). How
often would this be likely to occur? I believe that different generator
polynomials have different characteristics that can make them more sensitive
to a particular type of error. Perhaps Manfred can tell us the generator
polynomial that was used to create the lookup tables?

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 14:24:16
Message-ID: 11843.1117549456@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
>> Alternatively, we might say that 64-bit CRC was overkill from
>> day one, and we'd rather get the additional 10% or 20% or so
>> speedup. I'm kinda leaning in that direction, but only weakly.

> What would you need to persuade you either way? I believe that disk drives
> use CRCs internally to verify that the data has been read correctly from
> each sector. If the majority of the errors would be from a disk failure,
> then a corrupt sector would have to pass the drive CRC *and* the PostgreSQL
> CRC in order for an XLog entry to be considered valid. I would have thought
> the chances of this being able to happen would be reasonably small and so
> even with CRC32 this can be detected fairly accurately.

It's not really a matter of backstopping the hardware's error detection;
if we were trying to do that, we'd keep a CRC for every data page, which
we don't. The real reason for the WAL CRCs is as a reliable method of
identifying the end of WAL: when the "next record" doesn't checksum you
know it's bogus. This is a nontrivial point because of the way that we
re-use WAL files --- the pages beyond the last successfully written page
aren't going to be zeroes, they'll be filled with random WAL data.

Personally I think CRC32 is plenty for this job, but there were those
arguing loudly for CRC64 back when we made the decision originally ...

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 14:53:08
Message-ID: 878y1vwip7.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> It's not really a matter of backstopping the hardware's error detection;
> if we were trying to do that, we'd keep a CRC for every data page, which
> we don't. The real reason for the WAL CRCs is as a reliable method of
> identifying the end of WAL: when the "next record" doesn't checksum you
> know it's bogus. This is a nontrivial point because of the way that we
> re-use WAL files --- the pages beyond the last successfully written page
> aren't going to be zeroes, they'll be filled with random WAL data.

Is the random WAL data really the concern? It seems like a more reliable way
of dealing with that would be to just accompany every WAL entry with a
sequential id and stop when the next id isn't the correct one.

I thought the problem was that if the machine crashed in the middle of writing
a WAL entry you wanted to be sure to detect that. And there's no guarantee the
fsync will write out the WAL entry in order. So it's possible the end (and
beginning) of the WAL entry will be there but the middle still have been
unwritten.

The only truly reliable way to handle this would require two fsyncs per
transaction commit which would be really unfortunate.

> Personally I think CRC32 is plenty for this job, but there were those
> arguing loudly for CRC64 back when we made the decision originally ...

So given the frequency of database crashes and WAL replays, if one failed
replay every few million years is acceptable, then I think 32 bits is more
than enough. Frankly, I think 16 bits would be enough.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 15:19:02
Message-ID: 12480.1117552742@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> It's not really a matter of backstopping the hardware's error detection;
>> if we were trying to do that, we'd keep a CRC for every data page, which
>> we don't. The real reason for the WAL CRCs is as a reliable method of
>> identifying the end of WAL: when the "next record" doesn't checksum you
>> know it's bogus.

> Is the random WAL data really the concern? It seems like a more reliable way
> of dealing with that would be to just accompany every WAL entry with a
> sequential id and stop when the next id isn't the correct one.

We do that, too (the xl_prev links and page header addresses serve that
purpose). But it's not sufficient given that WAL records can span pages
and therefore may be incompletely written.

> The only truly reliable way to handle this would require two fsyncs per
> transaction commit which would be really unfortunate.

How are two fsyncs going to be better than one?

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 16:02:12
Message-ID: 873bs3wfi3.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> > Is the random WAL data really the concern? It seems like a more reliable way
> > of dealing with that would be to just accompany every WAL entry with a
> > sequential id and stop when the next id isn't the correct one.
>
> We do that, too (the xl_prev links and page header addresses serve that
> purpose). But it's not sufficient given that WAL records can span pages
> and therefore may be incompletely written.

Right, so the problem isn't that there may be stale data that would be
indistinguishable from real data. The problem is that the real data may be
partially there but not all there.

> > The only truly reliable way to handle this would require two fsyncs per
> > transaction commit which would be really unfortunate.
>
> How are two fsyncs going to be better than one?

Well, you fsync the WAL entry, and only when that's complete do you flip a bit
marking the WAL entry as committed and fsync again.

Hm, you might need three fsyncs: one to make sure the bit isn't set before
writing out the WAL record itself.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 16:27:18
Message-ID: 13044.1117556838@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> Is the random WAL data really the concern? It seems like a more reliable way
>>> of dealing with that would be to just accompany every WAL entry with a
>>> sequential id and stop when the next id isn't the correct one.
>>
>> We do that, too (the xl_prev links and page header addresses serve that
>> purpose). But it's not sufficient given that WAL records can span pages
>> and therefore may be incompletely written.

Actually, on reviewing the code I notice two missed bets here.

1. During WAL replay, we aren't actually verifying that xl_prev matches
the address of the prior WAL record. This means we are depending only
on the page header addresses to make sure we don't replay stale WAL data
left over from the previous cycle of use of the physical WAL file. This
is fairly dangerous, considering the likelihood of partial write of a
WAL page during a power failure: the first 512-byte sector(s) of a page
may have been updated but not the rest. If an old WAL record happens to
start at exactly the sector boundary then we lose.

2. We store a separate CRC for each backup block attached to a WAL
record. Therefore the same torn-page problem could hit us if a stale
backup block starts exactly at an intrapage sector boundary --- there is
nothing guaranteeing that the backup block really goes with the WAL
record.

#1 seems like a pretty critical, but also easily fixed, bug. To fix #2
I suppose we'd have to modify the WAL format to store just one CRC
covering the whole of a WAL record and its attached backup block(s).

I think the reasoning behind the separate CRCs was to put a limit on
the amount of data guarded by one CRC, and thereby minimize the risk
of undetected errors. But using the CRCs in this way is failing to
guard against exactly the problem that we want the CRCs to guard against
in the first place, namely torn WAL records ... so worrying about
detection reliability seems misplaced.

The odds of an actual failure from case #2 are fortunately not high,
since a backup block will necessarily span across at least one WAL page
boundary and so we should be able to detect stale data by noting that
the next page's header address is wrong. (If it's not wrong, then at
least the first sector of the next page is up-to-date, so if there is
any tearing the CRC should tell us.) Therefore I don't feel any need
to try to work out a back-patchable solution for #2. But I do think we
ought to change the WAL format going forward to compute just one CRC
across a WAL record and all attached backup blocks. There was talk of
allowing compression of backup blocks, and if we do that we could no
longer feel any certainty that a page crossing would occur.

Thoughts?

regards, tom lane


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 17:58:35
Message-ID: sg8p91hdhbpqtipc0i8c0ssgs9km10bbkn@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 31 May 2005 12:07:53 +0100, "Mark Cave-Ayland"
<m(dot)cave-ayland(at)webbased(dot)co(dot)uk> wrote:
>Perhaps Manfred can tell us the generator
>polynomial that was used to create the lookup tables?

x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

-> http://www.opengroup.org/onlinepubs/009695399/utilities/cksum.html

Or google for "04c11db7".

Servus
Manfred


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, 'Manfred Koizar' <mkoi-pg(at)aon(dot)at>, 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 23:19:12
Message-ID: 1117581552.3844.797.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2005-05-31 at 12:27 -0400, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >>> Is the random WAL data really the concern? It seems like a more reliable way
> >>> of dealing with that would be to just accompany every WAL entry with a
> >>> sequential id and stop when the next id isn't the correct one.
> >>
> >> We do that, too (the xl_prev links and page header addresses serve that
> >> purpose). But it's not sufficient given that WAL records can span pages
> >> and therefore may be incompletely written.
>
> Actually, on reviewing the code I notice two missed bets here.
>
> 1. During WAL replay, we aren't actually verifying that xl_prev matches
> the address of the prior WAL record. This means we are depending only
> on the page header addresses to make sure we don't replay stale WAL data
> left over from the previous cycle of use of the physical WAL file. This
> is fairly dangerous, considering the likelihood of partial write of a
> WAL page during a power failure: the first 512-byte sector(s) of a page
> may have been updated but not the rest. If an old WAL record happens to
> start at exactly the sector boundary then we lose.

Hmmm. I seem to recall asking myself why xl_prev existed if it wasn't
used, but passed that by. Damn.

> 2. We store a separate CRC for each backup block attached to a WAL
> record. Therefore the same torn-page problem could hit us if a stale
> backup block starts exactly at an intrapage sector boundary --- there is
> nothing guaranteeing that the backup block really goes with the WAL
> record.
>
> #1 seems like a pretty critical, but also easily fixed, bug. To fix #2
> I suppose we'd have to modify the WAL format to store just one CRC
> covering the whole of a WAL record and its attached backup block(s).
>
> I think the reasoning behind the separate CRCs was to put a limit on
> the amount of data guarded by one CRC, and thereby minimize the risk
> of undetected errors. But using the CRCs in this way is failing to
> guard against exactly the problem that we want the CRCs to guard against
> in the first place, namely torn WAL records ... so worrying about
> detection reliability seems misplaced.
>
> The odds of an actual failure from case #2 are fortunately not high,
> since a backup block will necessarily span across at least one WAL page
> boundary and so we should be able to detect stale data by noting that
> the next page's header address is wrong. (If it's not wrong, then at
> least the first sector of the next page is up-to-date, so if there is
> any tearing the CRC should tell us.) Therefore I don't feel any need
> to try to work out a back-patchable solution for #2. But I do think we
> ought to change the WAL format going forward to compute just one CRC
> across a WAL record and all attached backup blocks. There was talk of
> allowing compression of backup blocks, and if we do that we could no
> longer feel any certainty that a page crossing would occur.
>
> Thoughts?

PreAllocXLog was already a reason to have somebody prepare new xlog
files ahead of them being used. Surely the right solution here is to
have that agent prepare fresh/zeroed files prior to them being required.
That way no stale data can ever occur and both of these bugs go
away....

Fixing that can be backpatched so that the backend that switches files
can do the work, rather than bgwriter [ or ?].

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-06-01 02:36:29
Message-ID: 9291.1117593389@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Hmmm. I seem to recall asking myself why xl_prev existed if it wasn't
> used, but passed that by. Damn.

I couldn't believe it'd been overlooked this long, either. It's the
sort of thing that you assume got done the first time :-(

> PreAllocXLog was already a reason to have somebody prepare new xlog
> files ahead of them being used. Surely the right solution here is to
> have that agent prepare fresh/zeroed files prior to them being required.

Uh, why? That doubles the amount of physical I/O required to maintain
the WAL, and AFAICS it doesn't really add any safety that we can't get
in a more intelligent fashion.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, 'Manfred Koizar' <mkoi-pg(at)aon(dot)at>, 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-06-01 08:50:09
Message-ID: 1117615809.3844.893.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2005-05-31 at 22:36 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Hmmm. I seem to recall asking myself why xl_prev existed if it wasn't
> > used, but passed that by. Damn.
>
> I couldn't believe it'd been overlooked this long, either. It's the
> sort of thing that you assume got done the first time :-(

Guess it shows how infrequently PostgreSQL crashes and recovers.

> > PreAllocXLog was already a reason to have somebody prepare new xlog
> > files ahead of them being used. Surely the right solution here is to
> > have that agent prepare fresh/zeroed files prior to them being required.
>
> Uh, why? That doubles the amount of physical I/O required to maintain
> the WAL, and AFAICS it doesn't really add any safety that we can't get
> in a more intelligent fashion.

OK, I agree that the xl_prev linkage is the more intelligent way to go.

If I/O is a problem, then surely you will agree that PreAllocXLog is
still required and should not be run by a backend? That's going to show up
as a big response-time spike for that user.

That's the last bastion - the other changes are going to smooth response
times right down, so can we do something with PreAllocXLog too?

Best Regards, Simon Riggs


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Greg Stark'" <gsstark(at)mit(dot)edu>
Cc: "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-06-01 09:27:41
Message-ID: 9EB50F1A91413F4FA63019487FCD251D11337C@WEBBASEDDC.webbasedltd.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 31 May 2005 17:27
> To: Greg Stark
> Cc: Mark Cave-Ayland (External); 'Manfred Koizar'; 'Bruce
> Momjian'; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

> The odds of an actual failure from case #2 are fortunately
> not high, since a backup block will necessarily span across
> at least one WAL page boundary and so we should be able to
> detect stale data by noting that the next page's header
> address is wrong. (If it's not wrong, then at least the
> first sector of the next page is up-to-date, so if there is
> any tearing the CRC should tell us.) Therefore I don't feel
> any need to try to work out a back-patchable solution for #2.
> But I do think we ought to change the WAL format going
> forward to compute just one CRC across a WAL record and all
> attached backup blocks. There was talk of allowing
> compression of backup blocks, and if we do that we could no
> longer feel any certainty that a page crossing would occur.

I must admit I didn't realise that an XLog record consisted of a number of
backup blocks which were also separately CRCd. I've been through the source,
and while the XLog code is reasonably well commented, I couldn't find a
README in the transam/ directory that explained the thinking behind the
current implementation - is this something that was discussed on the mailing
lists way back in the mists of time?

I'm still a little nervous about dropping down to CRC32 from CRC64 and so
was just wondering what the net saving would be using one CRC64 across the
whole WAL record? For example, if an insert or an update uses 3 backup
blocks then this one change alone would immediately reduce the CPU usage to
one third of its original value? (something tells me that this is probably
not the case as I imagine you would have picked this up a while back). In my
view, having a longer CRC is like buying a holiday with insurance - you pay
the extra cost knowing that should anything happen then you have something
to fall back on. However, the hard part here is determining a reasonable
balance between the cost and the risk.

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Mark Cave-Ayland <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-06-01 14:05:13
Message-ID: 13490.1117634713@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> If I/O is a problem, then surely you will agree that PreAllocXLog is
> still required and should not be run by a backend?

It is still required, but it isn't run by backends --- it's fired off
during checkpoints. I think there was some discussion recently about
making it more aggressive about allocating future segments; which
strikes me as a good idea.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-06-01 14:07:36
Message-ID: 13524.1117634856@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> I'm still a little nervous about dropping down to CRC32 from CRC64 and so
> was just wondering what the net saving would be using one CRC64 across the
> whole WAL record?

None to speak of; the startup/teardown time is trivial. It's the
per-byte cost that hurts.

regards, tom lane