Re: Explaining duplicate rows in spite of unique index

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Explaining duplicate rows in spite of unique index
Date: 2010-02-24 09:12:38
Message-ID: D960CB61B694CF459DCFB4B0128514C20393819F@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I wrote:

> We recently found a couple of rows in a production database
> that had identical values in the columns constituting the primary key
> (The problem surfaced because a pg_dump could not be restored).
>
> Now I'm looking for explanations how this could happen.
>
> The rows originate from around the time when we had a hardware
> failure that corrupted the file system. The database came up
> after a file system check, and people continued working until
> we noticed that some tables were corrupted.
>
> At that point we restored an online backup and recovered past
> the time of the hardware failure. The WALs were intact and recovery
> completed successfully.
>
> Now does the following explanation sound plausible:
[...]

For the record, I set up a small experiment and could reproduce the
corruption like this:

- initdb new cluster, start the postmaster
- create a table with a primary key constraint
- perform an online backup
- stop the postmaster
- cp -p pg_clog/0000 pg_clog/0000.bak
- start the postmaster
- insert a record, commit
- stop the postmaster
- mv pg_clog/0000.bak pg_clog/0000
- start the postmaster
- Insert and commit a second record with the same primary key.
This will succeed since the previous commit is "gone".
- stop the postmaster
- perform PITR from the online backup to the end of WALs

The PITR will succeed, and you end up with two rows with the
same primary key, but only the second one is in the unique index.

Here's what I learned:
- Don't let anybody work on a database after a hardware problem
that affected the integrity of the file system, even if the
postmaster comes up without complaining.
- Never perform point-in-time-recovery beyond the time when
the hardware problem happened.

Yours,
Laurenz Albe

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Luigi Antognini 2010-02-24 09:26:02 Transaction isolation when applying DDLs
Previous Message Albe Laurenz 2010-02-24 08:58:28 Re: