Uh, this is *not* a 64-bit CRC ...

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Uh, this is *not* a 64-bit CRC ...
Date: 2001-02-28 21:53:09
Message-ID: 19345.983397189@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I just took a close look at the COMP_CRC64 macro in xlog.c.

This isn't a 64-bit CRC. It's two independent 32-bit CRCs, one done
on just the odd-numbered bytes and one on just the even-numbered bytes
of the datastream. That's hardly any stronger than a single 32-bit CRC;
it's certainly not what I thought we had agreed to implement.

We can't change this algorithm without forcing an initdb, which would be
a rather unpleasant thing to do at this late stage of the release cycle.
But I'm not happy with it. Comments?

regards, tom lane


From: ncm(at)zembu(dot)com (Nathan Myers)
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Uh, this is *not* a 64-bit CRC ...
Date: 2001-02-28 22:42:58
Message-ID: 20010228144258.R624@store.zembu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> I just took a close look at the COMP_CRC64 macro in xlog.c.
>
> This isn't a 64-bit CRC. It's two independent 32-bit CRCs, one done
> on just the odd-numbered bytes and one on just the even-numbered bytes
> of the datastream. That's hardly any stronger than a single 32-bit CRC;
> it's certainly not what I thought we had agreed to implement.
>
> We can't change this algorithm without forcing an initdb, which would be
> a rather unpleasant thing to do at this late stage of the release cycle.
> But I'm not happy with it. Comments?

This might be a good time to update:

The CRC-64 code used in the SWISS-PROT genetic database is (now) at:

ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz

From the README:

The code in this package has been derived from the BTLib package
obtained from Christian Iseli <chris(at)ludwig-alpha(dot)unil(dot)ch>.
From his mail:

The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
Press. Pages 896ff.

The generator polynomial is x64 + x4 + x3 + x1 + 1.

I would suggest that if you don't change the algorithm, at least change
the name in the sources. Were you to #ifdef in a real crc-64, and make
a compile-time option to select the old one, you could allow users who
wish to avoid the initdb a way to continue with the existing pair of
CRC-32s.

Nathan Myers
ncm(at)zembu(dot)com


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Uh, this is *not* a 64-bit CRC ...
Date: 2001-03-01 02:16:33
Message-ID: 200103010216.VAA26658@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I will just add a TODO item and we can hit it for 7.2.

> On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> > I just took a close look at the COMP_CRC64 macro in xlog.c.
> >
> > This isn't a 64-bit CRC. It's two independent 32-bit CRCs, one done
> > on just the odd-numbered bytes and one on just the even-numbered bytes
> > of the datastream. That's hardly any stronger than a single 32-bit CRC;
> > it's certainly not what I thought we had agreed to implement.
> >
> > We can't change this algorithm without forcing an initdb, which would be
> > a rather unpleasant thing to do at this late stage of the release cycle.
> > But I'm not happy with it. Comments?
>
> This might be a good time to update:
>
> The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
>
> ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
>
> From the README:
>
> The code in this package has been derived from the BTLib package
> obtained from Christian Iseli <chris(at)ludwig-alpha(dot)unil(dot)ch>.
> From his mail:
>
> The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> Press. Pages 896ff.
>
> The generator polynomial is x64 + x4 + x3 + x1 + 1.
>
> I would suggest that if you don't change the algorithm, at least change
> the name in the sources. Were you to #ifdef in a real crc-64, and make
> a compile-time option to select the old one, you could allow users who
> wish to avoid the initdb a way to continue with the existing pair of
> CRC-32s.
>
> Nathan Myers
> ncm(at)zembu(dot)com
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Uh, this is *not* a 64-bit CRC ...
Date: 2001-03-01 02:17:19
Message-ID: 200103010217.VAA26763@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Added to TODO:

* Correct CRC WAL code to be normal CRC32 algorithm

> On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> > I just took a close look at the COMP_CRC64 macro in xlog.c.
> >
> > This isn't a 64-bit CRC. It's two independent 32-bit CRCs, one done
> > on just the odd-numbered bytes and one on just the even-numbered bytes
> > of the datastream. That's hardly any stronger than a single 32-bit CRC;
> > it's certainly not what I thought we had agreed to implement.
> >
> > We can't change this algorithm without forcing an initdb, which would be
> > a rather unpleasant thing to do at this late stage of the release cycle.
> > But I'm not happy with it. Comments?
>
> This might be a good time to update:
>
> The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
>
> ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
>
> From the README:
>
> The code in this package has been derived from the BTLib package
> obtained from Christian Iseli <chris(at)ludwig-alpha(dot)unil(dot)ch>.
> From his mail:
>
> The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> Press. Pages 896ff.
>
> The generator polynomial is x64 + x4 + x3 + x1 + 1.
>
> I would suggest that if you don't change the algorithm, at least change
> the name in the sources. Were you to #ifdef in a real crc-64, and make
> a compile-time option to select the old one, you could allow users who
> wish to avoid the initdb a way to continue with the existing pair of
> CRC-32s.
>
> Nathan Myers
> ncm(at)zembu(dot)com
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: ncm(at)zembu(dot)com (Nathan Myers)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Uh, this is *not* a 64-bit CRC ...
Date: 2001-03-01 02:51:18
Message-ID: 20010228185118.C624@store.zembu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 28, 2001 at 09:17:19PM -0500, Bruce Momjian wrote:
> > On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote:
> > > I just took a close look at the COMP_CRC64 macro in xlog.c.
> > >
> > > This isn't a 64-bit CRC. It's two independent 32-bit CRCs, one done
> > > on just the odd-numbered bytes and one on just the even-numbered bytes
> > > of the datastream. That's hardly any stronger than a single 32-bit CRC;
> > > it's certainly not what I thought we had agreed to implement.
> > >
> > > We can't change this algorithm without forcing an initdb, which would be
> > > a rather unpleasant thing to do at this late stage of the release cycle.
> > > But I'm not happy with it. Comments?
> >
> > This might be a good time to update:
> >
> > The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> >
> > ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
> >
> > From the README:
> >
> > The code in this package has been derived from the BTLib package
> > obtained from Christian Iseli <chris(at)ludwig-alpha(dot)unil(dot)ch>.
> > From his mail:
> >
> > The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> > B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> > Press. Pages 896ff.
> >
> > The generator polynomial is x64 + x4 + x3 + x1 + 1.
> >
> > I would suggest that if you don't change the algorithm, at least change
> > the name in the sources. Were you to #ifdef in a real crc-64, and make
> > a compile-time option to select the old one, you could allow users who
> > wish to avoid the initdb a way to continue with the existing pair of
> > CRC-32s.
>
> Added to TODO:
>
> * Correct CRC WAL code to be normal CRC32 algorithm

Um, how about

* Correct CRC WAL code to be a real CRC64 algorithm

instead?

Nathan Myers
ncm(at)zembu(dot)com


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Uh, this is *not* a 64-bit CRC ...
Date: 2001-03-01 03:30:19
Message-ID: 200103010330.WAA00180@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > Added to TODO:
> >
> > * Correct CRC WAL code to be normal CRC32 algorithm
>
> Um, how about
>
> * Correct CRC WAL code to be a real CRC64 algorithm
>
> instead?

Done.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Uh, this is *not* a 64-bit CRC ...
Date: 2001-03-05 19:00:59
Message-ID: 20276.983818859@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

ncm(at)zembu(dot)com (Nathan Myers) writes:
> The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz

> From the README:

> The code in this package has been derived from the BTLib package
> obtained from Christian Iseli <chris(at)ludwig-alpha(dot)unil(dot)ch>.
> From his mail:

> The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> Press. Pages 896ff.

> The generator polynomial is x64 + x4 + x3 + x1 + 1.

Nathan (or anyone else with a copy of "Numerical recipes in C", which
I'm embarrassed to admit I don't own), is there any indication in there
that anyone spent any effort on choosing that particular generator
polynomial? As far as I can see, it violates one of the standard
guidelines for choosing a polynomial, namely that it be a multiple of
(x + 1) ... which in modulo-2 land is equivalent to having an even
number of terms, which this ain't got. See Ross Williams'
A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS, available from
ftp://ftp.rocksoft.com/papers/crc_v3.txt among other places, which is
by far the most thorough and readable thing I've ever seen on CRCs.

I spent some time digging around the net for standard CRC64 polynomials,
and the only thing I could find that looked like it might have been
picked by someone who understood what they were doing is in the DLT
(digital linear tape) standard, ECMA-182 (available from
http://www.ecma.ch/ecma1/STAND/ECMA-182.HTM):

x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 +
x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 +
x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 +
x^7 + x^4 + x + 1

I'm probably going to go with this one unless someone can come up with
an authoritative source for another one.

regards, tom lane


From: ncm(at)zembu(dot)com (Nathan Myers)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Uh, this is *not* a 64-bit CRC ...
Date: 2001-03-12 23:44:43
Message-ID: 20010312154443.P624@store.zembu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 05, 2001 at 02:00:59PM -0500, Tom Lane wrote:
> ncm(at)zembu(dot)com (Nathan Myers) writes:
> > The CRC-64 code used in the SWISS-PROT genetic database is (now) at:
> > ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz
>
> > From the README:
>
> > The code in this package has been derived from the BTLib package
> > obtained from Christian Iseli <chris(at)ludwig-alpha(dot)unil(dot)ch>.
> > From his mail:
>
> > The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and
> > B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University
> > Press. Pages 896ff.
>
> > The generator polynomial is x64 + x4 + x3 + x1 + 1.
>
> Nathan (or anyone else with a copy of "Numerical recipes in C", which
> I'm embarrassed to admit I don't own), is there any indication in there
> that anyone spent any effort on choosing that particular generator
> polynomial? As far as I can see, it violates one of the standard
> guidelines for choosing a polynomial, namely that it be a multiple of
> (x + 1) ... which in modulo-2 land is equivalent to having an even
> number of terms, which this ain't got. See Ross Williams'
> A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS, available from
> ftp://ftp.rocksoft.com/papers/crc_v3.txt among other places, which is
> by far the most thorough and readable thing I've ever seen on CRCs.
>
> I spent some time digging around the net for standard CRC64 polynomials,
> and the only thing I could find that looked like it might have been
> picked by someone who understood what they were doing is in the DLT
> (digital linear tape) standard, ECMA-182 (available from
> http://www.ecma.ch/ecma1/STAND/ECMA-182.HTM):
>
> x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 +
> x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 +
> x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 +
> x^7 + x^4 + x + 1

I'm sorry to have taken so long to reply.

The polynomial chosen for SWISS-PROT turns out to be presented, in
Numerical Recipes, just as an example of a primitive polynomial of
that degree; no assertion is made about its desirability for error
checking. It is (in turn) drawn from E. J. Watson, "Mathematics of
Computation", vol. 16, pp368-9.

Having (x + 1) as a factor guarantees to catch all errors in which
an odd number of bits have been changed. Presumably you are then
infinitesimally less likely to catch all errors in which an even
number of bits have been changed.

I would have posted the ECMA-182 polynomial if I had found it. (That
was good searching!) One hopes that the ECMA polynomial was chosen more
carefully than entirely at random. High-degree codes are often chosen
by Monte Carlo methods, by applying statistical tests to randomly-chosen
values, because the search space is so large.

I have verified that Tom transcribed the polynomial correctly from
the PDF image. The ECMA document doesn't say whether their polynomial
is applied "bit-reversed", but the check would be equally strong either
way.

Nathan Myers
ncm(at)zembu(dot)com