Re: [REVIEW] Re: Compression of full-page-writes

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Date: 2014-09-14 01:42:39
Message-ID: CAO_YK0VWivUyh-FmDA4D=d3V27LoW+m5SeAt5Jqh8Bu3wFavcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 13, 2014 at 10:27 PM, ktm(at)rice(dot)edu <ktm(at)rice(dot)edu> wrote:

> On Sat, Sep 13, 2014 at 09:50:55PM -0300, Arthur Silva wrote:
> > On Sat, Sep 13, 2014 at 1:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > > > On 2014-09-13 08:52:33 +0300, Ants Aasma wrote:
> > > >> On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva <arthurprs(at)gmail(dot)com>
> > > wrote:
> > > >>> That's not entirely true. CRC-32C beats pretty much everything with
> > > the same
> > > >>> length quality-wise and has both hardware implementations and
> highly
> > > >>> optimized software versions.
> > >
> > > >> For better or for worse CRC is biased by detecting all single bit
> > > >> errors, the detection capability of larger errors is slightly
> > > >> diminished. The quality of the other algorithms I mentioned is also
> > > >> very good, while producing uniformly varying output.
> > >
> > > > There's also much more literature about the various CRCs in
> comparison
> > > > to some of these hash allgorithms.
> > >
> > > Indeed. CRCs have well-understood properties for error detection.
> > > Have any of these new algorithms been analyzed even a hundredth as
> > > thoroughly? No. I'm unimpressed by evidence-free claims that
> > > something else is "also very good".
> > >
> > > Now, CRCs are designed for detecting the sorts of short burst errors
> > > that are (or were, back in the day) common on phone lines. You could
> > > certainly make an argument that that's not the type of threat we face
> > > for PG data. However, I've not seen anyone actually make such an
> > > argument, let alone demonstrate that some other algorithm would be
> better.
> > > To start with, you'd need to explain precisely what other error pattern
> > > is more important to defend against, and why.
> > >
> > > regards, tom lane
> > >
> >
> > Mysql went this way as well, changing the CRC polynomial in 5.6.
> >
> > What we are looking for here is uniqueness thus better error detection.
> Not
> > avalanche effect, nor cryptographically secure, nor bit distribution.
> > As far as I'm aware CRC32C is unbeaten collision wise and time proven.
> >
> > I couldn't find tests with xxhash and crc32 on the same hardware so I
> spent
> > some time putting together a benchmark (see attachment, to run it just
> > start run.sh)
> >
> > I included a crc32 implementation using ssr4.2 instructions (which works
> on
> > pretty much any Intel processor built after 2008 and AMD built after
> 2012),
> > a portable Slice-By-8 software implementation and xxhash since it's the
> > fastest software 32bit hash I know of.
> >
> > Here're the results running the test program on my i5-4200M
> >
> > crc sb8: 90444623
> > elapsed: 0.513688s
> > speed: 1.485220 GB/s
> >
> > crc hw: 90444623
> > elapsed: 0.048327s
> > speed: 15.786877 GB/s
> >
> > xxhash: 7f4a8d5
> > elapsed: 0.182100s
> > speed: 4.189663 GB/s
> >
> > The hardware version is insanely and works on the majority of Postgres
> > setups and the fallback software implementations is 2.8x slower than the
> > fastest 32bit hash around.
> >
> > Hopefully it'll be useful in the discussion.
>
> Thank you for running this sample benchmark. It definitely shows that the
> hardware version of the CRC is very fast, unfortunately it is really only
> available on x64 Intel/AMD processors which leaves all the rest lacking.
> For current 64-bit hardware, it might be instructive to also try using
> the XXH64 version and just take one half of the hash. It should come in
> at around 8.5 GB/s, or very nearly the speed of the hardware accelerated
> CRC. Also, while I understand that CRC has a very venerable history and
> is well studied for transmission type errors, I have been unable to find
> any research on its applicability to validating file/block writes to a
> disk drive. While it is to quote you "unbeaten collision wise", xxhash,
> both the 32-bit and 64-bit version are its equal. Since there seems to
> be a lack of research on disk based error detection versus CRC polynomials,
> it seems likely that any of the proposed hash functions are on an equal
> footing in this regard. As Andres commented up-thread, xxhash comes along
> for "free" with lz4.
>
> Regards,
> Ken
>

For the sake of completeness the results for xxhash64 in my machine

xxhash64
speed: 7.365398 GB/s

Which is indeed very fast.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2014-09-14 02:19:48 Re: [REVIEW] Re: Compression of full-page-writes
Previous Message ktm@rice.edu 2014-09-14 01:27:51 Re: [REVIEW] Re: Compression of full-page-writes