Re: Checksums, state of play

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksums, state of play
Date: 2012-03-07 20:09:50
Message-ID: CA+U5nMKzZ+YpDvCtT2CitF_8Lzh3SBfON0oPPnTXr=AX2i8JcQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 7, 2012 at 5:28 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Mar 6, 2012 at 2:27 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> The feature is nowhere near complete, and we should not be designing
>> features at this stage.
>
> I agree, on both counts.  Although Simon did a good job pulling
> together something that basically works in a short amount of time, the
> edge cases still need a lot more thought, and work.  Yesterday's
> discussion was mostly about turning the feature on and off, which
> certainly seems to be the most significant problem with the patch as
> it stands.  But there are also a number of other things that have been
> discussed and not fully resolved, such as the performance impact of
> WAL-logging hint bit changes,

I think saying "there are some performance doubts" doesn't actually
make it so. We know it will impact performance, and how. I don't see
any big mystery. It's not pretty, but that's what it is. Prepared
transactions are slow, but they're still there.

> the exact way we're going to sandwich
> this into the page header,

I'm OK with either pd_tli or pd_pagesize_version. We just need 16
contiguous bits.
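
For illustration only, and not taken from the patch: a minimal sketch of storing a checksum in one 16-bit slot of the page header. The 24-byte header layout, the little-endian packing, the helper names (`checksum16`, `set_checksum`, `verify`, `make_page`) and the byte-sum checksum are all assumptions made for this example; the checksum is a deliberately trivial stand-in for whatever algorithm the patch uses.

```python
import struct

PAGE_SIZE = 8192  # default PostgreSQL block size

# Assumed, simplified header layout (little-endian, no padding):
# pd_lsn (8 bytes), then pd_tli, pd_flags, pd_lower, pd_upper, pd_special,
# pd_pagesize_version (2 bytes each), then pd_prune_xid (4 bytes) = 24 bytes.
HEADER = struct.Struct("<QHHHHHHI")
SLOT_OFFSET = 8  # the 16-bit field right after pd_lsn (pd_tli in this layout)


def checksum16(page: bytes) -> int:
    """Toy checksum: byte sum modulo 2**16, computed with the slot zeroed."""
    zeroed = page[:SLOT_OFFSET] + b"\x00\x00" + page[SLOT_OFFSET + 2:]
    return sum(zeroed) & 0xFFFF


def set_checksum(page: bytearray) -> None:
    """Store the checksum in the 16-bit slot."""
    struct.pack_into("<H", page, SLOT_OFFSET, checksum16(bytes(page)))


def verify(page: bytes) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    (stored,) = struct.unpack_from("<H", page, SLOT_OFFSET)
    return stored == checksum16(bytes(page))


def make_page() -> bytearray:
    """Build an otherwise empty page with a plausible header."""
    page = bytearray(PAGE_SIZE)
    HEADER.pack_into(page, 0,
                     0,              # pd_lsn
                     0,              # pd_tli (the slot reused for the checksum)
                     0,              # pd_flags
                     HEADER.size,    # pd_lower
                     PAGE_SIZE,      # pd_upper
                     PAGE_SIZE,      # pd_special
                     PAGE_SIZE | 4,  # pd_pagesize_version
                     0)              # pd_prune_xid
    return page
```

Zeroing the slot before summing means the stored value never feeds into its own computation, so the same scheme would work at the offset of pd_pagesize_version instead, provided whatever that field used to carry is kept somewhere else.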

> and the right way to handle the necessary
> buffer locking.

That is resolved, AFAIK.

> Simon seems to be proposing that, in lieu of spending too much more
> time fixing this, we just commit it and document the known
> limitations.  I don't agree with that.

Neither do I. It's pretty clear from our last discussion that the
"fix" proposed doesn't actually work fully so I don't think it's going
to be either more robust or more certain to give low false positives.
So I don't think more time "fixing" this will actually improve the
situation. I'm not suggesting I skip any work, I think the extra work
is pointless.

I do understand the issue of risk that exists, I just think there are
other ways of ameliorating that than heaping more software onto
the problem.

>  In particular, I think the
> idea of committing a checksum patch that can produce false positives
> in the event of a torn page situation is a really bad idea.

Again, that isn't a correct description of the issue, so yes, if that
were the case I would agree it would be a really bad idea.

If we keep misdescribing risk situations, of course people will sense
high risk and want to hold back any such patch. So we need to be
balanced and accurate about the risks. Since people have been pretty
negative about this patch for some time I'm not really surprised they
now feel it's a high-risk decision to accept it. If I knew nothing, I
would think that also based upon what has been said.

> The whole
> point of the patch is to distinguish between hardware failure and
> software failure; if we can't reliably do that, I don't see this as
> being much of an advance over the status quo.  I think we're going to
> find that the cost of WAL-logging hints is bad enough that people are
> only going to do it when they already suspect a problem and want
> confirmation.  If they can't rely on that confirmation being real, as
> opposed to an outgrowth of a known limitation of the feature, I don't
> see the point.  I'd much rather see this feature wait for 9.3 than
> ship something that's unreliable in this regard.
>
> So I think it's time to push this one out to 9.3.

I accept this and was not expecting to commit anything now.

I think this decision was actually made quite some time previously and
reviewing these points again is just a further waste of time at this
point.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
