Re: Table has duplicate keys, what did I do

Lists: pgsql-general
From: John Gateley <gateley(at)jriver(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Table has duplicate keys, what did I do
Date: 2008-01-28 22:04:02
Message-ID: 20080128160402.2101f41f.gateley@jriver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Somehow I have managed to have two tables with duplicate keys.
In both tables, the key is an integer, filled from a sequence.
There is only 1 duplicated entry in each table: in the first
table, there are two ID "1"s, and in the second table there are
two ID "123456"s (the second table entry is linked to the first
table's ID 1).

I noticed this because a pg_dump followed by a psql < dumpfile
will not reload.

I've figured out a fix: a script that cleans the dump file, removing
the two duplicate lines (leaving the original).

But, mostly, I'm wondering how I managed to get in this state,
if it was something I did, or perhaps caused by killing the
postmaster the wrong way (I don't think I ever did this, but
maybe), or a crash.

I did do a brief search, didn't find anything seemingly related to this.

Thanks,

j
--
John Gateley <gateley(at)jriver(dot)com>


From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "John Gateley" <gateley(at)jriver(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-28 22:11:21
Message-ID: D425483C2C5C9F49B5B7A41F8944154701000C00@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> owner(at)postgresql(dot)org] On Behalf Of John Gateley
> Sent: Monday, January 28, 2008 2:04 PM
> To: pgsql-general(at)postgresql(dot)org
> Subject: [GENERAL] Table has duplicate keys, what did I do
>
> Somehow I have managed to have two tables with duplicate keys.
> In both tables, the key is an integer, filled from a sequence.
> There is only 1 duplicated entry in each table: in the first
> table, there are two ID "1"s, and in the second table there are
> two ID "123456"s (the second table entry is linked to the first
> table's ID 1).

Because of the nature of the values of the id's (1 and 123456) it sounds
very much like a manual insertion. Is there a unique index on the
column? It definitely sounds like there should be. At any rate, I
guess that someone manually inserted the data. Without a unique index
on the column, there is no protection against this.

> I noticed this because a pg_dump followed by a psql < dumpfile
> will not reload.
>
> I've figured out a fix: a script that cleans the dump file, removing
> the two duplicate lines (leaving the original).
>
> But, mostly, I'm wondering how I managed to get in this state,
> if it was something I did, or perhaps caused by killing the
> postmaster the wrong way (I don't think I ever did this, but
> maybe), or a crash.
>
> I did do a brief search, didn't find anything seemingly related to
this.
>
> Thanks,
>
> j
> --
> John Gateley <gateley(at)jriver(dot)com>
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster


From: John Gateley <gateley(at)jriver(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-28 22:26:04
Message-ID: 20080128162604.33963ac1.gateley@jriver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 28 Jan 2008 14:11:21 -0800
"Dann Corbit" <DCorbit(at)connx(dot)com> wrote:

> > -----Original Message-----
> > From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> > owner(at)postgresql(dot)org] On Behalf Of John Gateley
> > Sent: Monday, January 28, 2008 2:04 PM
> > To: pgsql-general(at)postgresql(dot)org
> > Subject: [GENERAL] Table has duplicate keys, what did I do
> >
> > Somehow I have managed to have two tables with duplicate keys.
> > In both tables, the key is an integer, filled from a sequence.
> > There is only 1 duplicated entry in each table: in the first
> > table, there are two ID "1"s, and in the second table there are
> > two ID "123456"s (the second table entry is linked to the first
> > table's ID 1).
>
> Because of the nature of the values of the id's (1 and 123456) it sounds
> very much like a manual insertion. Is there a unique index on the
> column? It definitely sounds like there should be. At any rate, I
> guess that someone manually inserted the data. Without a unique index
> on the column, there is no protection against this.

Yes, the id 1 definitely indicates to me that I did something.
However, there is an index on the column: it's the primary key
for the table. I'm not sure how I could manually insert it if
there were an existing index, or later create the index if it
didn't exist when I did the insert.

Thanks,

j

--
John Gateley <gateley(at)jriver(dot)com>


From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "John Gateley" <gateley(at)jriver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-28 22:33:23
Message-ID: dcc563d10801281433y24d7ec10h224a882b0cb72918@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Jan 28, 2008 4:26 PM, John Gateley <gateley(at)jriver(dot)com> wrote:
> On Mon, 28 Jan 2008 14:11:21 -0800
> "Dann Corbit" <DCorbit(at)connx(dot)com> wrote:
>
> > > -----Original Message-----
> > > From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> > > owner(at)postgresql(dot)org] On Behalf Of John Gateley
> > > Sent: Monday, January 28, 2008 2:04 PM
> > > To: pgsql-general(at)postgresql(dot)org
> > > Subject: [GENERAL] Table has duplicate keys, what did I do
> > >
> > > Somehow I have managed to have two tables with duplicate keys.
> > > In both tables, the key is an integer, filled from a sequence.
> > > There is only 1 duplicated entry in each table: in the first
> > > table, there are two ID "1"s, and in the second table there are
> > > two ID "123456"s (the second table entry is linked to the first
> > > table's ID 1).
> >
> > Because of the nature of the values of the id's (1 and 123456) it sounds
> > very much like a manual insertion. Is there a unique index on the
> > column? It definitely sounds like there should be. At any rate, I
> > guess that someone manually inserted the data. Without a unique index
> > on the column, there is no protection against this.
>
> Yes, the id 1 definitely indicates to me that I did something.
> However, there is an index on the column: it's the primary key
> for the table. I'm not sure how I could manually insert it if
> there were an existing index, or later create the index if it
> didn't exist when I did the insert.

Are you running with fsync=off and / or hardware that lies about fsync
(ATA / SATA are notorious for this) and possibly having an emergency
power outage of some kind? That's the most common cause of such
problems.


From: "Li, Jingfa" <jinli(at)paypal(dot)com>
To: "John Gateley" <gateley(at)jriver(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-28 22:59:04
Message-ID: 0A89F5E49BE1BD418D5113AD3BB51B41024AAB9F@RHV-EXM-01.corp.ebay.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

for now, are you able to insert duplicate keys(primary-key) into the two
tables you mentioned? if you can, check if your index is valid or not.
if index is valid, check if the unique contraint is still valid or not
-- perhaps you turned off the unique constraint, and insert the dup key,
and didn't turn back on the unique constrain validation...

JF

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of John Gateley
Sent: Monday, January 28, 2008 2:26 PM
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Table has duplicate keys, what did I do

On Mon, 28 Jan 2008 14:11:21 -0800
"Dann Corbit" <DCorbit(at)connx(dot)com> wrote:

> > -----Original Message-----
> > From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> > owner(at)postgresql(dot)org] On Behalf Of John Gateley
> > Sent: Monday, January 28, 2008 2:04 PM
> > To: pgsql-general(at)postgresql(dot)org
> > Subject: [GENERAL] Table has duplicate keys, what did I do
> >
> > Somehow I have managed to have two tables with duplicate keys.
> > In both tables, the key is an integer, filled from a sequence.
> > There is only 1 duplicated entry in each table: in the first table,
> > there are two ID "1"s, and in the second table there are two ID
> > "123456"s (the second table entry is linked to the first table's ID
> > 1).
>
> Because of the nature of the values of the id's (1 and 123456) it
> sounds very much like a manual insertion. Is there a unique index on
> the column? It definitely sounds like there should be. At any rate,
> I guess that someone manually inserted the data. Without a unique
> index on the column, there is no protection against this.

Yes, the id 1 definitely indicates to me that I did something.
However, there is an index on the column: it's the primary key for the
table. I'm not sure how I could manually insert it if there were an
existing index, or later create the index if it didn't exist when I did
the insert.

Thanks,

j

--
John Gateley <gateley(at)jriver(dot)com>

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Gateley <gateley(at)jriver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-29 02:36:35
Message-ID: 11084.1201574195@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

John Gateley <gateley(at)jriver(dot)com> writes:
> Somehow I have managed to have two tables with duplicate keys.
> ...
> But, mostly, I'm wondering how I managed to get in this state,

What PG version is this? We've fixed some bugs in the past that
could give rise to duplicated rows.

regards, tom lane


From: John Gateley <gateley(at)jriver(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-29 15:57:14
Message-ID: 20080129095714.df7013c5.gateley@jriver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 28 Jan 2008 21:36:35 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> John Gateley <gateley(at)jriver(dot)com> writes:
> > Somehow I have managed to have two tables with duplicate keys.
> > ...
> > But, mostly, I'm wondering how I managed to get in this state,
>
> What PG version is this? We've fixed some bugs in the past that
> could give rise to duplicated rows.

Currently 8.1.4, and it's been tracking Ubuntu 6.06 LTS for
updates, so it could have been earlier than 8.1.4 when the duplicate
row actually was created.

Thanks,

j

--
John Gateley <gateley(at)jriver(dot)com>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Gateley <gateley(at)jriver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-29 17:53:11
Message-ID: 23005.1201629191@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

John Gateley <gateley(at)jriver(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> John Gateley <gateley(at)jriver(dot)com> writes:
>>> Somehow I have managed to have two tables with duplicate keys.

>> What PG version is this? We've fixed some bugs in the past that
>> could give rise to duplicated rows.

> Currently 8.1.4, and it's been tracking Ubuntu 6.06 LTS for
> updates, so it could have been earlier than 8.1.4 when the duplicate
> row actually was created.

There was a fix released in 8.1.9 for a problem that could cause VACUUM
FULL to create duplicate copies of a row that had recently been updated.
Does that sound like a plausible scenario for your usage?

regards, tom lane


From: John Gateley <gateley(at)jriver(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-29 18:13:16
Message-ID: 20080129121316.a2192234.gateley@jriver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 29 Jan 2008 12:53:11 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> John Gateley <gateley(at)jriver(dot)com> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> John Gateley <gateley(at)jriver(dot)com> writes:
> >>> Somehow I have managed to have two tables with duplicate keys.
>
> >> What PG version is this? We've fixed some bugs in the past that
> >> could give rise to duplicated rows.
>
> > Currently 8.1.4, and it's been tracking Ubuntu 6.06 LTS for
> > updates, so it could have been earlier than 8.1.4 when the duplicate
> > row actually was created.
>
> There was a fix released in 8.1.9 for a problem that could cause VACUUM
> FULL to create duplicate copies of a row that had recently been updated.
> Does that sound like a plausible scenario for your usage?

Yes, it does, very much so. The row in question is updated once
a minute (it is "test" data that is used by our system monitor
to ensure that the database is up, and one of the tests is updating
the row), and the database is vacuumed full once a day.

Thanks,

j

--
John Gateley <gateley(at)jriver(dot)com>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Gateley <gateley(at)jriver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Table has duplicate keys, what did I do
Date: 2008-01-29 19:47:11
Message-ID: 24684.1201636031@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

John Gateley <gateley(at)jriver(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There was a fix released in 8.1.9 for a problem that could cause VACUUM
>> FULL to create duplicate copies of a row that had recently been updated.
>> Does that sound like a plausible scenario for your usage?

> Yes, it does, very much so. The row in question is updated once
> a minute (it is "test" data that is used by our system monitor
> to ensure that the database is up, and one of the tests is updating
> the row), and the database is vacuumed full once a day.

Yeah, that fits exactly. IIRC that VACUUM FULL bug could only be
triggered if there had been a series of multiple updates to the same row
within the lifespan of the oldest open transaction, so repeated updates
on a short timescale would form part of the triggering condition.

Sounds like you need to pester Ubuntu to freshen their package ...

regards, tom lane