Re: BUG #2379: Duplicate pkeys in table

Lists: pgsql-bugs
From: "Philip Warner" <pjw(at)rhyme(dot)com(dot)au>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 06:29:29
Message-ID: 200604060629.k366TTYv083632@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 2379
Logged by: Philip Warner
Email address: pjw(at)rhyme(dot)com(dot)au
PostgreSQL version: 8.0.7
Operating system: FreeBSD
Description: Duplicate pkeys in table
Details:

We have an intermittent bug that occurs on a table which is updated several
times per second. The bug occurs every few days/weeks. It is usually
preceeded by a "tuple concurrently updated" messages, but I could not swear
it is always preceeded by it.

The result of the bug is demonstrated by:

select id,count(*) from xxx group by id having count(*)>1;
id | count
-------+-------
24613 | 6
(1 row)

(where ID is the pkey (and xxx is not the real table name)).

The row concerned is one of the most frequestly updated rows, and shows only
one row when I do a 'select * from xxx where id = 24613'.

If I do a select id from xxx order by id, I get 6 duplicate rows.

We have tried a reindex:

mail=# reindex table xxx;
ERROR: could not create unique index
DETAIL: Table contains duplicated values.

All of which seems to suggest that there really are 6 data rows with the
same pkey.

None of our code changes the pkey, but we do have several places that update
this table intriggers.

Any help or suggestions would be greatly appreciated.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 12:12:31
Message-ID: 20060406121231.GA6814@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Philip Warner wrote:

> We have an intermittent bug that occurs on a table which is updated several
> times per second. The bug occurs every few days/weeks. It is usually
> preceeded by a "tuple concurrently updated" messages, but I could not swear
> it is always preceeded by it.
>
> The result of the bug is demonstrated by:
>
> select id,count(*) from xxx group by id having count(*)>1;
> id | count
> -------+-------
> 24613 | 6
> (1 row)

Please do a

SELECT xmin, xmax, cmin, cmax FROM xxx where id = 24613;

if you still have that particular manifestation.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 12:20:21
Message-ID: 20060406122021.GA52575@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Thu, Apr 06, 2006 at 08:12:31AM -0400, Alvaro Herrera wrote:
> Please do a
>
> SELECT xmin, xmax, cmin, cmax FROM xxx where id = 24613;
>
> if you still have that particular manifestation.

Also, you'll probably need to set enable_indexscan to off prior to
running the above query.

--
Michael Fuhr


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Michael Fuhr <mike(at)fuhr(dot)org>, alvherre(at)commandprompt(dot)com
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 13:03:23
Message-ID: 4435119B.8000002@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Michael Fuhr wrote:
> On Thu, Apr 06, 2006 at 08:12:31AM -0400, Alvaro Herrera wrote:
>
>> Please do a
>>
>> SELECT xmin, xmax, cmin, cmax FROM xxx where id = 24613;
>>
# set enable_indexscan=off;
# SELECT xmin, xmax, cmin, cmax FROM xxx where id = 24613;
xmin | xmax | cmin | cmax
----------+----------+------+----------
32902771 | 0 | 20 | 32902872
32902771 | 0 | 20 | 32902872
32902771 | 0 | 20 | 32902872
32902771 | 0 | 20 | 32902872
32902771 | 0 | 20 | 32902872
32902771 | 33048159 | 20 | 20
(6 rows)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org, Michael Fuhr <mike(at)fuhr(dot)org>, alvherre(at)commandprompt(dot)com
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 14:24:15
Message-ID: 17857.1144333455@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> # set enable_indexscan=off;
> # SELECT xmin, xmax, cmin, cmax FROM xxx where id = 24613;
> xmin | xmax | cmin | cmax
> ----------+----------+------+----------
> 32902771 | 0 | 20 | 32902872
> 32902771 | 0 | 20 | 32902872
> 32902771 | 0 | 20 | 32902872
> 32902771 | 0 | 20 | 32902872
> 32902771 | 0 | 20 | 32902872
> 32902771 | 33048159 | 20 | 20
> (6 rows)

For completeness, could we also see ctid in that query?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org, Michael Fuhr <mike(at)fuhr(dot)org>
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 14:27:23
Message-ID: 20060406142723.GD15753@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Philip Warner wrote:

> # set enable_indexscan=off;
> # SELECT xmin, xmax, cmin, cmax FROM xxx where id = 24613;
> xmin | xmax | cmin | cmax
> ----------+----------+------+----------
> 32902771 | 0 | 20 | 32902872
> 32902771 | 0 | 20 | 32902872
> 32902771 | 0 | 20 | 32902872
> 32902771 | 0 | 20 | 32902872
> 32902771 | 0 | 20 | 32902872
> 32902771 | 33048159 | 20 | 20
> (6 rows)

Ugh.

Do the triggers involved have EXCEPTION clauses? (I assume they are
written in PL/pgSQL -- are there any in other languages?)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 14:39:08
Message-ID: 4435280C.7000003@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> For completeness, could we also see ctid in that query?
mail=# set enable_indexscan=off;
mail=# SELECT xmin, xmax, cmin, cmax,ctid FROM xxx where id = 24613;
xmin | xmax | cmin | cmax | ctid
----------+----------+------+----------+---------
32902771 | 0 | 20 | 32902872 | (0,7)
32902771 | 0 | 20 | 32902872 | (2,27)
32902771 | 0 | 20 | 32902872 | (58,27)
32902771 | 0 | 20 | 32902872 | (60,28)
32902771 | 0 | 20 | 32902872 | (69,3)
32902771 | 33048159 | 20 | 20 | (72,27)
(6 rows)


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 14:40:02
Message-ID: 44352842.1050105@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvaro Herrera wrote:
> Do the triggers involved have EXCEPTION clauses? (I assume they are
> written in PL/pgSQL -- are there any in other languages?)
Triggers that update this table are in pl/pgsql, and can *raise*
exceptions (using RAISE) if that is what you mean. They do not handle
them -- is that even possible on pl/pgsql?

Other triggers (for slony) are written in plpgsql and C, and I know some
of the pl/pgsql triggers raise exceptions.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-bugs(at)postgreSQL(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 14:53:42
Message-ID: 18735.1144335222@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> mail=# set enable_indexscan=off;
> mail=# SELECT xmin, xmax, cmin, cmax,ctid FROM xxx where id = 24613;
> xmin | xmax | cmin | cmax | ctid
> ----------+----------+------+----------+---------
> 32902771 | 0 | 20 | 32902872 | (0,7)
> 32902771 | 0 | 20 | 32902872 | (2,27)
> 32902771 | 0 | 20 | 32902872 | (58,27)
> 32902771 | 0 | 20 | 32902872 | (60,28)
> 32902771 | 0 | 20 | 32902872 | (69,3)
> 32902771 | 33048159 | 20 | 20 | (72,27)
> (6 rows)

The "cmax" values in the first 5 rows are evidently really xvac values,
ie, these have all been moved by VACUUM FULL. (I assume you run VACUUM
FULL regularly on this table?) The thing that is striking though is
that the xmin/cmin values are all the same, indicating that all six
tuples were inserted by the same command. That seems pretty odd. Can
you show us the procedure by which rows are inserted in this table?

Also, the last tuple has either been deleted or locked-for-update by
transaction 33048159; if it were an attempted deletion we'd have to
conclude that 33048159 failed to commit. Do you use SELECT FOR UPDATE
on this table?

BTW, which of these rows is selected by an indexscan-enabled query,
ie, set enable_indexscan=on then repeat same query?

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 15:03:11
Message-ID: 44352DAF.3060100@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
>
>> mail=# set enable_indexscan=off;
>> mail=# SELECT xmin, xmax, cmin, cmax,ctid FROM xxx where id = 24613;
>> xmin | xmax | cmin | cmax | ctid
>> ----------+----------+------+----------+---------
>> 32902771 | 0 | 20 | 32902872 | (0,7)
>> 32902771 | 0 | 20 | 32902872 | (2,27)
>> 32902771 | 0 | 20 | 32902872 | (58,27)
>> 32902771 | 0 | 20 | 32902872 | (60,28)
>> 32902771 | 0 | 20 | 32902872 | (69,3)
>> 32902771 | 33048159 | 20 | 20 | (72,27)
>> (6 rows)
>>
>
> The "cmax" values in the first 5 rows are evidently really xvac values,
> ie, these have all been moved by VACUUM FULL. (I assume you run VACUUM
> FULL regularly on this table?)
Yes, every minute. Table has about 1500 rows and grows *very* fast due
to updates.

> The thing that is striking though is
> that the xmin/cmin values are all the same, indicating that all six
> tuples were inserted by the same command. That seems pretty odd. Can
> you show us the procedure by which rows are inserted in this table?
>
The original insertion is probably not relevant (it happened months
ago); there are many places that update the table. And for the specific
row in question, it was probably inserted directly by psql. Other rows
exhibit this problem (less often), were usually inserted by a long pgsql
procedure.

Updates happen regularly from many sources, but the procedure that does
the most updates is a trigger. Do you want to see that?

> Also, the last tuple has either been deleted or locked-for-update by
> transaction 33048159; if it were an attempted deletion we'd have to
> conclude that 33048159 failed to commit. Do you use SELECT FOR UPDATE
> on this table?
>
No. But when a new row is added, I do lock the table in exclusive mode:

Lock Table xxx In Exclusive Mode;

The specific row in these examples will never be deleted.

> BTW, which of these rows is selected by an indexscan-enabled query,
> ie, set enable_indexscan=on then repeat same query?
>

xmin | xmax | cmin | cmax | ctid
----------+----------+------+------+---------
32902771 | 33048159 | 20 | 20 | (72,27)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 15:09:30
Message-ID: 24629.1144336170@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> Tom Lane wrote:
>> The thing that is striking though is
>> that the xmin/cmin values are all the same, indicating that all six
>> tuples were inserted by the same command. That seems pretty odd. Can
>> you show us the procedure by which rows are inserted in this table?
>>
> The original insertion is probably not relevant (it happened months
> ago); there are many places that update the table.

OK, what I should say is that all these tuples were updated by the same
command.

> Updates happen regularly from many sources, but the procedure that does
> the most updates is a trigger. Do you want to see that?

Please.

Also, if you care to run pg_filedump -i -F over the table, it'd be
interesting to see the complete header info for each of these tuples.

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 15:39:38
Message-ID: 4435363A.2060609@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
>> Updates happen regularly from many sources, but the procedure that does
>> the most updates is a trigger. Do you want to see that?
>>
>
> Please.
>
public | tg_update_qqq_date | "trigger"
| | mail | plpgsql |
Declare
uid bigint;
Begin
uid = (select owner_id from yyy m where m.f1 = NEW.f1);
if (uid <> 0 and not uid is null) then
update xxx set qqq_date = 'now' where id=uid;
end if;
Return NEW;
End; |

and there's also a rewrite rule:

zzz_update_r1 AS
ON UPDATE TO zzz DO INSTEAD UPDATE xxx SET f1 = new.f1
WHERE xxx.id = new.id
zzz_update_r2 AS
ON UPDATE TO zzz DO INSTEAD UPDATE xxx SET f2 = new.f2
WHERE xxx.id = new.id

> Also, if you care to run pg_filedump -i -F over the table, it'd be
> interesting to see the complete header info for each of these tuples.
>
obviously from different blocks (do you need more details?):

Item 7 -- Length: 168 Offset: 3920 (0x0f50) Flags: USED
XMIN: 32902771 CMIN: 20 XMAX: 0 CMAX|XVAC: 32902872
Block Id: 0 linp Index: 7 Attributes: 34 Size: 36
infomask: 0x2913
(HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
[4]: 0x00

0f50: 730ef601 14000000 00000000 d80ef601 s...............
0f60: 00000000 07002200 1329249f 807e8400 ......"..)$..~..
0f70: d37e0000 25600000 00000000 09000000 .~..%`..........
0f80: 00000000 00000000 00000000 00000000 ................
0f90: 00000000 00000000 04000000 12bcf968 ...............h
0fa0: d28fa741 22000000 5f5f4021 696e7465 ...A"(dot)(dot)(dot)__(at)!inte
0fb0: 726e616c 5f64656c 69766572 795f7573 rnal_delivery_us
0fc0: 65722140 5f5f0000 00000000 00000000 er!(at)__(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)
0fd0: 01000000 00000000 3c307819 0e1fa441 ........<0x....A
0fe0: 3c307819 0e1fa441 3c307819 0e1fa441 <0x....A<0x....A
0ff0: 00000000 00000000 ........

Item 27 -- Length: 168 Offset: 2700 (0x0a8c) Flags: USED
XMIN: 32902771 CMIN: 20 XMAX: 0 CMAX|XVAC: 32902872
Block Id: 2 linp Index: 27 Attributes: 34 Size: 36
infomask: 0x2913
(HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
[4]: 0x00

0a8c: 730ef601 14000000 00000000 d80ef601 s...............
0a9c: 00000200 1b002200 1329249f 807e8400 ......"..)$..~..
0aac: d37e0000 25600000 00000000 09000000 .~..%`..........
0abc: 00000000 00000000 00000000 00000000 ................
0acc: 00000000 00000000 04000000 12bcf968 ...............h
0adc: d28fa741 22000000 5f5f4021 696e7465 ...A"(dot)(dot)(dot)__(at)!inte
0aec: 726e616c 5f64656c 69766572 795f7573 rnal_delivery_us
0afc: 65722140 5f5f0000 00000000 00000000 er!(at)__(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)
0b0c: 01000000 00000000 3c307819 0e1fa441 ........<0x....A
0b1c: 3c307819 0e1fa441 3c307819 0e1fa441 <0x....A<0x....A
0b2c: 00000000 00000000 ........

Item 27 -- Length: 168 Offset: 7724 (0x1e2c) Flags: USED
XMIN: 32902771 CMIN: 20 XMAX: 0 CMAX|XVAC: 32902872
Block Id: 58 linp Index: 27 Attributes: 34 Size: 36
infomask: 0x2913
(HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
[4]: 0x00

1e2c: 730ef601 14000000 00000000 d80ef601 s...............
1e3c: 00003a00 1b002200 1329249f 807e8400 ..:..."..)$..~..
1e4c: d37e0000 25600000 00000000 09000000 .~..%`..........
1e5c: 00000000 00000000 00000000 00000000 ................
1e6c: 00000000 00000000 04000000 12bcf968 ...............h
1e7c: d28fa741 22000000 5f5f4021 696e7465 ...A"(dot)(dot)(dot)__(at)!inte
1e8c: 726e616c 5f64656c 69766572 795f7573 rnal_delivery_us
1e9c: 65722140 5f5f0000 00000000 00000000 er!(at)__(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)
1eac: 01000000 00000000 3c307819 0e1fa441 ........<0x....A
1ebc: 3c307819 0e1fa441 3c307819 0e1fa441 <0x....A<0x....A
1ecc: 00000000 00000000 ........

Item 28 -- Length: 168 Offset: 8024 (0x1f58) Flags: USED
XMIN: 32902771 CMIN: 20 XMAX: 0 CMAX|XVAC: 32902872
Block Id: 60 linp Index: 28 Attributes: 34 Size: 36
infomask: 0x2913
(HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
[4]: 0x00

1f58: 730ef601 14000000 00000000 d80ef601 s...............
1f68: 00003c00 1c002200 1329249f 807e8400 ..<..."..)$..~..
1f78: d37e0000 25600000 00000000 09000000 .~..%`..........
1f88: 00000000 00000000 00000000 00000000 ................
1f98: 00000000 00000000 04000000 12bcf968 ...............h
1fa8: d28fa741 22000000 5f5f4021 696e7465 ...A"(dot)(dot)(dot)__(at)!inte
1fb8: 726e616c 5f64656c 69766572 795f7573 rnal_delivery_us
1fc8: 65722140 5f5f0000 00000000 00000000 er!(at)__(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)
1fd8: 01000000 00000000 3c307819 0e1fa441 ........<0x....A
1fe8: 3c307819 0e1fa441 3c307819 0e1fa441 <0x....A<0x....A
1ff8: 00000000 00000000 ........

Item 3 -- Length: 168 Offset: 8024 (0x1f58) Flags: USED
XMIN: 32902771 CMIN: 20 XMAX: 0 CMAX|XVAC: 32902872
Block Id: 69 linp Index: 3 Attributes: 34 Size: 36
infomask: 0x2913
(HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
[4]: 0x00

1f58: 730ef601 14000000 00000000 d80ef601 s...............
1f68: 00004500 03002200 1329249f 807e8400 ..E..."..)$..~..
1f78: d37e0000 25600000 00000000 09000000 .~..%`..........
1f88: 00000000 00000000 00000000 00000000 ................
1f98: 00000000 00000000 04000000 12bcf968 ...............h
1fa8: d28fa741 22000000 5f5f4021 696e7465 ...A"(dot)(dot)(dot)__(at)!inte
1fb8: 726e616c 5f64656c 69766572 795f7573 rnal_delivery_us
1fc8: 65722140 5f5f0000 00000000 00000000 er!(at)__(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)
1fd8: 01000000 00000000 3c307819 0e1fa441 ........<0x....A
1fe8: 3c307819 0e1fa441 3c307819 0e1fa441 <0x....A<0x....A
1ff8: 00000000 00000000 ........

Item 27 -- Length: 168 Offset: 8024 (0x1f58) Flags: USED
XMIN: 32902771 CMIN: 20 XMAX: 33048159 CMAX|XVAC: 20
Block Id: 318 linp Index: 6 Attributes: 34 Size: 36
infomask: 0x2913
(HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
[4]: 0x00

1f58: 730ef601 14000000 5f46f801 14000000 s......._F......
1f68: 00003e01 06002200 1329249f 807e8400 ..>..."..)$..~..
1f78: d37e0000 25600000 00000000 09000000 .~..%`..........
1f88: 00000000 00000000 00000000 00000000 ................
1f98: 00000000 00000000 04000000 12bcf968 ...............h
1fa8: d28fa741 22000000 5f5f4021 696e7465 ...A"(dot)(dot)(dot)__(at)!inte
1fb8: 726e616c 5f64656c 69766572 795f7573 rnal_delivery_us
1fc8: 65722140 5f5f0000 00000000 00000000 er!(at)__(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)(dot)
1fd8: 01000000 00000000 3c307819 0e1fa441 ........<0x....A
1fe8: 3c307819 0e1fa441 3c307819 0e1fa441 <0x....A<0x....A
1ff8: 00000000 00000000 ........


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 15:51:29
Message-ID: 24907.1144338689@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> public | tg_update_qqq_date | "trigger"
> | | mail | plpgsql |
> Declare
> uid bigint;
> Begin
> uid = (select owner_id from yyy m where m.f1 = NEW.f1);
> if (uid <> 0 and not uid is null) then
> update xxx set qqq_date = 'now' where id=uid;
> end if;
> Return NEW;
> End; |

> and there's also a rewrite rule:

> zzz_update_r1 AS
> ON UPDATE TO zzz DO INSTEAD UPDATE xxx SET f1 = new.f1
> WHERE xxx.id = new.id
> zzz_update_r2 AS
> ON UPDATE TO zzz DO INSTEAD UPDATE xxx SET f2 = new.f2
> WHERE xxx.id = new.id

OK, I'm a bit confused by the obfuscation here. The table with the
duplicates is xxx, or qqq? Which one is the trigger attached to?
zzz is a view I suppose?

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 16:01:42
Message-ID: 44353B66.6090803@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> OK, I'm a bit confused by the obfuscation here. The table with the
> duplicates is xxx, or qqq?
Yes, xxx is the broken table. The two rewrite rules map updates on a
view to an underlying table (the broken one).

Updates on the view occur very frequently. Perhaps 400,000 per day? And,
sadly, 200,000 of those on this one row.

> Which one is the trigger attached to?
> zzz is a view I suppose?
>
The trigger is attached to yet another table; when an insert or update
occurs in this third table, a date field on xxx is updated.

Updates/inserts on this happen less frequently -- perhaps low 10's of
thousands per day.

I had forgotten about the view/rewrite rules until I looked again at the
db source (I thought they were triggers). These are the source of 90% of
the updates.


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: pgsql-bugs(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 16:04:51
Message-ID: 44353C23.4080503@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> OK, I'm a bit confused by the obfuscation here. The table with the
> duplicates is xxx, or qqq?

Possibly less obscure version:

public | tg_update_anotherTable_date | "trigger"
| | mail | plpgsql |
Declare
uid bigint;
Begin
uid = (select owner_id from anotherNotherTable m where
m.keyField = NEW.fkField);
if (uid <> 0 and not uid is null) then
update brokenTable set some_date = 'now' where id=uid;
end if;
Return NEW;
End; |

and there's also a rewrite rule:

aView_update_r1 AS
ON UPDATE TO aView DO INSTEAD UPDATE brokenTable SET f1 = new.f1
WHERE brokenTable.id = new.id
aView_update_r2 AS
ON UPDATE TO aView DO INSTEAD UPDATE brokenTable SET f2 = new.f2
WHERE brokenTable.id = new.id


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 16:07:08
Message-ID: 25078.1144339628@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> aView_update_r1 AS
> ON UPDATE TO aView DO INSTEAD UPDATE brokenTable SET f1 = new.f1
> WHERE brokenTable.id = new.id
> aView_update_r2 AS
> ON UPDATE TO aView DO INSTEAD UPDATE brokenTable SET f2 = new.f2
> WHERE brokenTable.id = new.id

OK, that's a bit clearer. It'd help to see the view definition too.

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 16:14:49
Message-ID: 44353E79.4040100@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
>
>> aView_update_r1 AS
>> ON UPDATE TO aView DO INSTEAD UPDATE brokenTable SET f1 = new.f1
>> WHERE brokenTable.id = new.id
>> aView_update_r2 AS
>> ON UPDATE TO aView DO INSTEAD UPDATE brokenTable SET f2 = new.f2
>> WHERE brokenTable.id = new.id
>>
>
> OK, that's a bit clearer. It'd help to see the view definition too.
>
Nothing special, just some 'fake' fields and no join (mainly there for
access control and compatibility across several applications):

SELECT
brokenTable.id,
brokenTable.someField1 AS someAlias1,
brokenTable."someField2"::character varying(64) AS someAlias2,
0::bigint AS someAlias3,
brokenTable.someField4,
'crypt'::character varying AS someAlias5,
brokenTable.fieldUpdatedByTrigger,
brokenTable.fieldUpdatedByRule1
FROM brokenTable;


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 16:20:21
Message-ID: 44353FC5.4050606@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Oops. Minor change. Last two fields are updated by rules.

Tom Lane wrote:
> Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
>
>> aView_update_r1 AS
>> ON UPDATE TO aView DO INSTEAD UPDATE brokenTable SET f1 = new.f1
>> WHERE brokenTable.id = new.id
>> aView_update_r2 AS
>> ON UPDATE TO aView DO INSTEAD UPDATE brokenTable SET f2 = new.f2
>> WHERE brokenTable.id = new.id
>>
>
> OK, that's a bit clearer. It'd help to see the view definition too.
>
Nothing special, just some 'fake' fields and no join (mainly there for
access control and compatibility across several applications):

SELECT
brokenTable.id,
brokenTable.someField1 AS someAlias1,
brokenTable."someField2"::character varying(64) AS someAlias2,
0::bigint AS someAlias3,
brokenTable.someField4,
'crypt'::character varying AS someAlias5,
brokenTable.fieldUpdatedByRule1,
brokenTable.fieldUpdatedByRule2
FROM brokenTable;


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 16:56:29
Message-ID: 20060406165629.GF15753@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Philip Warner wrote:

> Item 7 -- Length: 168 Offset: 3920 (0x0f50) Flags: USED
> XMIN: 32902771 CMIN: 20 XMAX: 0 CMAX|XVAC: 32902872
> Block Id: 0 linp Index: 7 Attributes: 34 Size: 36
> infomask: 0x2913
> (HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
> t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
> [4]: 0x00

Hmm, shouldn't we see the MOVED_OFF bit set also if the cmax/xvac field
is actually xvac?

> Item 27 -- Length: 168 Offset: 8024 (0x1f58) Flags: USED
> XMIN: 32902771 CMIN: 20 XMAX: 33048159 CMAX|XVAC: 20
> Block Id: 318 linp Index: 6 Attributes: 34 Size: 36
> infomask: 0x2913
> (HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
> t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
> [4]: 0x00

I'm confused -- the original report showed this tuple with ctid (72,27),
but this seems to be in a different block?

What's the explanation for this tuple to have cmin=cmax? Is this
normal?

Sorry I have only questions :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Philip Warner <pjw(at)rhyme(dot)com(dot)au>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-06 17:16:38
Message-ID: 25612.1144343798@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Philip Warner wrote:
>> Item 7 -- Length: 168 Offset: 3920 (0x0f50) Flags: USED
>> XMIN: 32902771 CMIN: 20 XMAX: 0 CMAX|XVAC: 32902872
>> Block Id: 0 linp Index: 7 Attributes: 34 Size: 36
>> infomask: 0x2913
>> (HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
>> t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
>> [4]: 0x00

> Hmm, shouldn't we see the MOVED_OFF bit set also if the cmax/xvac field
> is actually xvac?

These are probably MOVED_IN not MOVED_OFF tuples; VACUUM FULL clears
their MOVED_IN flags at the end (see update_hint_bits()), which explains
the lack of any flag being set.

>> Item 27 -- Length: 168 Offset: 8024 (0x1f58) Flags: USED
>> XMIN: 32902771 CMIN: 20 XMAX: 33048159 CMAX|XVAC: 20
>> Block Id: 318 linp Index: 6 Attributes: 34 Size: 36
>> infomask: 0x2913
>> (HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
>> t_bits: [0]: 0x9f [1]: 0x80 [2]: 0x7e [3]: 0x84
>> [4]: 0x00

> I'm confused -- the original report showed this tuple with ctid (72,27),
> but this seems to be in a different block?

The blockid/linp fields are pointing to the updated version of the row
that xid 33048159 tried to create. It's a plausible theory that the
reason 33048159 failed (as it evidently did, XMAX_INVALID) is that it
failed on insertion of a duplicate key into the pkey index.

It is interesting that the CMIN and CMAX are the same, but that probably
just says that 32902771 and 33048159 were instances of the exact same
SQL command pattern, and so their attempted updates both came 20
commands into the transaction.

Philip suggested to me off-list that the initial error may have been the
VACUUM FULL (xid 32902872) creating duplicate moved copies of a single
valid row. That seems plausible because VACUUM FULL suppresses
duplicate-index checks, and it's real hard to see any other way that a
single transaction could have inserted all of these tuples without
triggering the btree duplicate-key check (barring a completely corrupt
index anyway). I wouldn't be surprised in the least bit to find another
corner-case bug in the VACUUM FULL tuple chain moving code ... that bit
of spaghetti has been too complex since day one ...

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2379: Duplicate pkeys in table
Date: 2006-04-07 01:56:37
Message-ID: 4435C6D5.8050602@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> Philip suggested to me off-list that the initial error may have been the
> VACUUM FULL (xid 32902872) creating duplicate moved copies of a single
> valid row. That seems plausible because VACUUM FULL suppresses
> duplicate-index checks, and it's real hard to see any other way that a
> single transaction could have inserted all of these tuples without
> triggering the btree duplicate-key check (barring a completely corrupt
> index anyway).
Another interesting factor -- these problems have not yet happened on
any replicated DB. Slony replication works by using a trigger to store
data changes in a log table; these changes are then applied on the
destination DB. Slony also disables triggers on the destination. Because
of the update load we also run the same vacuum regime.

This seems to make it unlikely that vacuum is the sole culprit.

Another difference is that the replicated DB is only ever updated.
Virtually no load from applications reading data. I wonder if shared
buffers in some way also interact here.