Re: 9a57858f1103b89a5674f0d50c5fe1f756411df6

From: Greg Stark <stark(at)mit(dot)edu>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: 9a57858f1103b89a5674f0d50c5fe1f756411df6
Date: 2014-03-13 03:51:42
Message-ID: CAM-w4HO2CAQ1k34cx3vw3_gJ8eQxUA44kgSh=pCQTpCsj5VnPA@mail.gmail.com
Lists: pgsql-hackers

On 13 Mar 2014 01:36, "Stephen Frost" <sfrost(at)snowman(dot)net> wrote:
>
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > This thread badly needs a more informative Subject line.
>
> Agreed.
>
> > But, yeah: do people think the referenced commit fixes a bug bad enough
> > to deserve a quick update release? If so, why? Multiple reports of
> > problems in the field would be a good reason, but I've not seen such.
>
> Uh, isn't what brought this to light two independent complaints from
> Peter and Greg Stark of seeing corruption in the field due to this?
>
> Peter's initial email also indicated it was two different systems which
> had gotten bit by this and Greg explicitly stated that he was working on
> an independent database from what Peter was reporting on, so that's at
> least 2 (one each), or 3 (if you count databases, as Peter had 2).
> Sure, they're all from Heroku, but I find it highly unlikely no one else
> has run into this issue. More likely, they simply haven't realized it's
> happened to them (which is another reason this is a particularly nasty
> bug..).

We have the two databases where we're sure this was the problem. On the one
I worked on, the customer complained that it happened repeatedly.

The key I demonstrated here wasn't even the one the customer was
complaining about. It seems their usage pattern made it extremely easy to
trigger, and that usage pattern arose naturally from using a Rails module
called counter_cache, which maintains a cache of the count of a child table
in the parent table.

We also have a few other customers complaining about duplicate keys. It's
hard to be sure, but these may have been standbys where the problem occurred
ages ago, and they only now activated their standby and ran into the problem.

That's what worries me most about this bug. You'll only detect it if you're
routinely querying your standby. If you have a standby for HA purposes it
might be corrupt for a long time without you realising it. We may be
fielding corruption complaints for a long time without being able to
conclusively prove whether it's due to this bug or not.
