Re: Urgent: Key constraints behaving weirdly

From: "Russell Garrett" <rg(at)tcslon(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Urgent: Key constraints behaving weirdly
Date: 2003-12-19 13:32:55
Message-ID: MKEGJINFADFODDNOKEJCIEKGENAA.rg@tcslon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Tom Lane wrote:
> "Russell Garrett" <rg(at)tcslon(dot)com> writes:
>> last=> reindex index profile_pkey;
>> ERROR: could not create unique index
>> DETAIL: Table contains duplicated values.
>
> Okay, it looks like the table actually contains duplicate rows but the
> index has managed to lose track of that fact. You can see the
> duplicates in the GROUP BY query (which isn't using the index) but
> not when you do "select * from profile where id = 1017057", because
> that query will use the index.

Ah, it went so quick I was sure it was using the index :).

> If you did "set enable_indexscan = off" then the "select * from
> profile where id = 1017057" query would probably show you two rows.
> I'd be interested to see the results of
>
> select ctid, oid, xmin, xmax, cmin, cmax from profile where id =
> 1017057;
> with enable_indexscan off.

last=> set enable_indexscan = off;
SET
last=> select ctid, oid, xmin, xmax, cmin, cmax from profile where id =
1017057;
ctid | oid | xmin | xmax | cmin | cmax
-----------+----------+----------+----------+----------+------
(482,27) | 65196187 | 21095783 | 21953819 | 21953819 | 0
(3095,56) | 64140531 | 20037571 | 20037571 | 545 | 555
(2 rows)

>> Does REINDEX use the current index to check for duplicates? :)
>
> No, it's building a new index from scratch, and so it notices the
> duplicates.

Yeah, I see now.

> What you've got here is definitely a case of index corruption that has
> led to logical corruption of the table (ie, duplicate rows). To get
> back to a valid state you will need to delete whichever of the
> duplicates seems to be out-of-date, and then do a REINDEX. But I
> think it is important first to try to determine what caused the
> corruption (software error or hardware?). Again, if you can take a
> physical dump of the data directory, that would provide an
> opportunity to study the problem later after you get the production
> machine back on its feet.

OK, I've kept a copy of the db and then fixed the problem.

We had an incidence of table corruption a few weeks back, but we just
ignored it (wishful thinking, maybe). Postgres is compiled normally, with
gcc3. OS is Red Hat 9.

Hardware is a Dell Poweredge 2.4Ghz Xeon (can't remember the model number),
DB is using XFS on a MegaRaid U320 controller, running the MegaRaid 1.18j
drivers. Naturally, the drivers don't report any errors. I doubt it's memory
corruption as the system has been rock-solid. So I'm guessing it's probably
MegaRaid, or XFS, or the kernel I suppose (although I'm using the latest).

Problem is, it's quite hard to reproduce the corruption, since it seems to
be fairly rare.

Maybe this should be moved to pgsql-general now?

Russ

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2003-12-19 15:40:39 Re: [GENERAL] CASE in where statement. BUG ??
Previous Message Peter Eisentraut 2003-12-19 12:15:40 Re: BUG #1019: src/pl/tcl/pltcl.c