Re: Enabling Checksums

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-04-22 16:25:14
Message-ID: CA+CSw_vTfKNT+6zVM8-ukWXefk32-yPx0UvorQS3dMVsQ_GmBw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 22, 2013 at 6:27 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Apr 17, 2013 at 8:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>>> The more I read of this thread, the more unhappy I get. It appears that
>>> the entire design process is being driven by micro-optimization for CPUs
>>> being built by Intel in 2013.
>>
>> And that's not going to get anyone past review, since all the tests I've
>> been doing the last two weeks are on how fast an AMD Opteron 6234 with OS
>> cache >> shared_buffers can run this. The main thing I'm still worried
>> about is what happens when you have a fast machine that can move memory
>> around very quickly and an in-memory workload, but it's hamstrung by the
>> checksum computation--and it's not a 2013 Intel machine.
>
> This is a good point. However, I don't completely agree with the
> conclusion that we shouldn't be worrying about any of this right now.
> While I agree with Tom that it's far too late to think about any
> CPU-specific optimizations for 9.3, I have a lot of concern, based on
> Ants's numbers, that we've picked a checksum algorithm which is hard
> to optimize for performance. If we don't get that fixed for 9.3,
> we're potentially looking at inflicting many years of serious
> suffering on our user base. If we at least get the *algorithm* right
> now, we can worry about optimizing it later. If we get it wrong,
> we'll be living with the consequence of that for a really long time.

I was just now writing up a generic C patch based on the parallel
FNV-1a + shift that we discussed with Florian, with an added round of
mixing. Testing the performance in isolation indicates that:
1) it is about an order of magnitude faster than the Sarwate CRC
method used in PostgreSQL.
2) it is about 2x faster than the fastest software-based CRC method.
3) with the -msse4.1 -funroll-loops -ftree-vectorize compilation
options, performance improves 5x (to within 20% of hand-coded ASM).
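
To make the shape of "parallel FNV-1a + shift with an added round of
mixing" concrete, here is a minimal sketch in plain C. It is not the
patch itself. The lane count (32), the shift amount (17), the per-lane
seeds, and the names lane_step and sketch_checksum are all assumptions
chosen for illustration; only the 32-bit FNV prime is the standard
constant. The inner loop has no dependencies between lanes, which is
the property that lets a compiler vectorize it with options like the
ones listed in 3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_LANES   32           /* assumed number of parallel sums */
#define FNV_PRIME 16777619u    /* standard 32-bit FNV prime */
#define MIX_SHIFT 17           /* assumed shift amount */

/* One FNV-1a style step (xor, then multiply) plus a shifted xor. */
static inline uint32_t lane_step(uint32_t sum, uint32_t value)
{
    uint32_t tmp = sum ^ value;
    return (tmp * FNV_PRIME) ^ (tmp >> MIX_SHIFT);
}

/* Checksum a buffer whose length is a multiple of N_LANES * 4 bytes. */
uint32_t sketch_checksum(const unsigned char *data, size_t len)
{
    const size_t row_bytes = N_LANES * sizeof(uint32_t);
    uint32_t sums[N_LANES];
    uint32_t result = 0;
    size_t i, j;

    assert(len % row_bytes == 0);

    /* Illustrative seeds: distinct per lane, not taken from the patch. */
    for (j = 0; j < N_LANES; j++)
        sums[j] = 2166136261u + (uint32_t) j;

    /* Each 32-bit word of a row feeds its own independent lane. */
    for (i = 0; i < len; i += row_bytes)
    {
        uint32_t row[N_LANES];

        memcpy(row, data + i, row_bytes);
        for (j = 0; j < N_LANES; j++)
            sums[j] = lane_step(sums[j], row[j]);
    }

    /* Added round of mixing: one more step with a zero input word. */
    for (j = 0; j < N_LANES; j++)
        sums[j] = lane_step(sums[j], 0);

    /* Fold the lanes into a single 32-bit value. */
    for (j = 0; j < N_LANES; j++)
        result ^= sums[j];

    return result;
}
```

Calling sketch_checksum(page, 8192) on an 8 kB buffer processes 64 rows
of 32 words each. The result depends on byte order because the words
are read with memcpy, so this sketch is not portable across
endianness.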

What remains are lingering doubts about the quality of the checksum.
It's hard, if not impossible, to prove the absence of interesting
patterns that would trigger collisions. I do know the checksum quality
is miles ahead of the Fletcher sum originally proposed, and during the
last week I haven't been able to think of a way to make the collision
rate differ significantly from CRC.

> I wish that we had not scheduled beta quite so soon, as I am sure
> there will be even more resistance to changing this after beta. But
> I'm having a hard time escaping the conclusion that we're on the edge
> of shipping something we will later regret quite deeply. Maybe I'm
> wrong?

It's unfortunate that this got delayed for so long. The performance
side of the argument was clear a month ago.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
