Re: Enabling Checksums

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enabling Checksums
Date: 2013-03-04 20:27:44
Message-ID: 513503C0.3030809@vmware.com
Lists: pgsql-hackers

On 04.03.2013 18:00, Jeff Davis wrote:
> On Mon, 2013-03-04 at 10:36 +0200, Heikki Linnakangas wrote:
>> On 04.03.2013 09:11, Simon Riggs wrote:
>>> Are there objectors?
>>
>> FWIW, I still think that checksumming belongs in the filesystem, not
>> PostgreSQL.
>
> Doing checksums in the filesystem has some downsides. One is that you
> need to use a copy-on-write filesystem like btrfs or zfs, which (by
> design) will fragment the heap on random writes.

Yeah, fragmentation will certainly hurt some workloads. But how badly,
and which workloads, and how does that compare with the work that
PostgreSQL has to do to maintain the checksums? I'd like to see some
data on those things.

> There are also other issues, like what fraction of our users can freely
> move to btrfs, and when. If it doesn't happen to be already there, you
> need root to get it there, which has never been a requirement before.

If you're serious enough about your data that you want checksums, you
should be able to choose your filesystem.

>> If you go ahead with this anyway, at the very least I'd like
>> to see some sort of a comparison with e.g. btrfs. How do performance,
>> error-detection rate, and behavior on error compare? Any other metrics
>> that are relevant here?
>
> I suspect it will be hard to get an apples-to-apples comparison here
> because of the heap fragmentation, which means that a sequential scan is
> not so sequential. That may be acceptable for some workloads but not for
> others, so it would get tricky to compare.

An apples-to-apples comparison is to run the benchmark and see what
happens. If it gets fragmented as hell on btrfs, and performance tanks
because of that, then that's your result. If avoiding fragmentation is
critical to the workload, then with btrfs you'll want to run the
defragmenter in the background to keep it in order, and factor that into
the test case.

I realize that performance testing is laborious. But we can't skip it
and assume that the patch performs fine just because it's hard to benchmark.

- Heikki
