Weird ..... (a=1 or a=2) <> (a=2 or a=1)

Lists: pgsql-generalpgsql-hackers
From: "Rafael Martinez, Guerrero" <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: pgsql-general(at)postgresql(dot)org
Subject: Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-16 13:55:41
Message-ID: 1147787741.3362.156.camel@bbking.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Hello

Today, one user complained that one of the tickets in our system had
disappeared some places but could be accessed other places. I thought
this was weird and startet debugging.

I have found out the sql statement with 'problems'. Can anybody explain
me why A) returns 12 rows and B) returns 13 rows?. The only different is
the "open OR new" / "new OR open" part. Should not they return the samme
result?

PostgreSQL ver.: 7.4.12

--------------------------------------------------
A)
--------------------------------------------------
rtprod=# SELECT id,effectiveid,status,type,queue FROM Tickets main
WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))
AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status
= 'open') OR (main.Status = 'new') ) ) ORDER BY main.id ASC LIMIT 50;
id | effectiveid | status | type | queue
-------+-------------+--------+--------+-------
37968 | 37968 | open | ticket | 29
38052 | 38052 | open | ticket | 29
38176 | 38176 | open | ticket | 29
38185 | 38185 | open | ticket | 29
38386 | 38386 | open | ticket | 29
38394 | 38394 | open | ticket | 29
38403 | 38403 | open | ticket | 29
38406 | 38406 | open | ticket | 29
38422 | 38422 | open | ticket | 29
38467 | 38467 | open | ticket | 29
38474 | 38474 | open | ticket | 29
38530 | 38530 | open | ticket | 29
(12 rows)
--------------------------------------------------
B)
--------------------------------------------------
rtprod=# SELECT id,effectiveid,status,type,queue FROM Tickets main
WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))
AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status
= 'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50;
id | effectiveid | status | type | queue
-------+-------------+--------+--------+-------
37775 | 37775 | new | ticket | 29
37968 | 37968 | open | ticket | 29
38052 | 38052 | open | ticket | 29
38176 | 38176 | open | ticket | 29
38185 | 38185 | open | ticket | 29
38386 | 38386 | open | ticket | 29
38394 | 38394 | open | ticket | 29
38403 | 38403 | open | ticket | 29
38406 | 38406 | open | ticket | 29
38422 | 38422 | open | ticket | 29
38467 | 38467 | open | ticket | 29
38474 | 38474 | open | ticket | 29
38530 | 38530 | open | ticket | 29
(13 rows)
--------------------------------------------------

The ticket with id=37775 is the one that disappear some places in the
application.

Here is the explain analyze of these statements:
--------------------------------------------------

rtprod=# explain analyze SELECT id,effectiveid,status,type,queue FROM
Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status !=
'deleted')) AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND (
(main.Status = 'open') OR (main.Status = 'new') ) ) ORDER BY main.id
ASC LIMIT 50;

QUERY
PLAN
--------------------------------------------------------------------------------------
Limit (cost=1658.83..1658.84 rows=1 width=33) (actual
time=4.003..4.028 rows=12 loops=1)
-> Sort (cost=1658.83..1658.84 rows=1 width=33) (actual
time=3.999..4.008 rows=12 loops=1)
Sort Key: id
-> Index Scan using tickets6, tickets6 on tickets main
(cost=0.00..1658.82 rows=1 width=33) (actual time=1.001..3.969 rows=12
loops=1)
Index Cond: (((status)::text = 'open'::text) OR
((status)::text = 'new'::text))
Filter: ((effectiveid = id) AND ((status)::text <>
'deleted'::text) AND (("type")::text = 'ticket'::text) AND (queue = 29))
Total runtime: 4.104 ms
(7 rows)

rtprod=# explain analyze SELECT id,effectiveid,status,type,queue FROM
Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status !=
'deleted')) AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND (
(main.Status = 'new') OR (main.Status = 'open') ) ) ORDER BY main.id
ASC LIMIT 50;

QUERY
PLAN
----------------------------------------------------------------------------------------
Limit (cost=1658.83..1658.84 rows=1 width=33) (actual
time=3.946..3.974 rows=13 loops=1)
-> Sort (cost=1658.83..1658.84 rows=1 width=33) (actual
time=3.944..3.954 rows=13 loops=1)
Sort Key: id
-> Index Scan using tickets6, tickets6 on tickets main
(cost=0.00..1658.82 rows=1 width=33) (actual time=0.790..3.909 rows=13
loops=1)
Index Cond: (((status)::text = 'new'::text) OR
((status)::text = 'open'::text))
Filter: ((effectiveid = id) AND ((status)::text <>
'deleted'::text) AND (("type")::text = 'ticket'::text) AND (queue = 29))
Total runtime: 4.052 ms
(7 rows)
--------------------------------------------------

Thanks in advance
--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Rafael Martinez, Guerrero" <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-16 15:14:21
Message-ID: 12469.1147792461@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

"Rafael Martinez, Guerrero" <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no> writes:
> I have found out the sql statement with 'problems'. Can anybody explain
> me why A) returns 12 rows and B) returns 13 rows?. The only different is
> the "open OR new" / "new OR open" part. Should not they return the samme
> result?

That is ... simply bizarre. Could we see all the system columns for the
rows in question, ie
select ctid,oid,xmin,xmax,cmin,cmax, id,effectiveid, ...
(the rest as in your second query that gets all the rows)
Leave out "oid" if you have the table made without oids.

I suspect this may be a question of a corrupt index, in which case
REINDEXing the index being used would fix it. But before you try that,
please save a physical copy of the index file (immediately after doing a
CHECKPOINT, if the database is being actively modified). I would like
to dig through it and try to understand the nature of the corruption,
if that's what the problem is.

regards, tom lane


From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-16 16:26:05
Message-ID: 1147796766.4700.13.camel@linux.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote:
> "Rafael Martinez, Guerrero" <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no> writes:
> > I have found out the sql statement with 'problems'. Can anybody explain
> > me why A) returns 12 rows and B) returns 13 rows?. The only different is
> > the "open OR new" / "new OR open" part. Should not they return the samme
> > result?
>
> That is ... simply bizarre. Could we see all the system columns for the
> rows in question, ie
> select ctid,oid,xmin,xmax,cmin,cmax, id,effectiveid, ...
> (the rest as in your second query that gets all the rows)
> Leave out "oid" if you have the table made without oids.
>

rtprod=# SELECT
ctid,oid,xmin,xmax,cmin,cmax,id,effectiveid,status,type,queue FROM
Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status !=
'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND
( (main.Status= 'new') OR (main.Status = 'open') ) ) ORDER BY main.id
ASC LIMIT 50;

ctid | oid | xmin | xmax | cmin | cmax | id |
effectiveid | status | type | queue
-----------+---------+-----------+------+------+------+-------+-------------+--------+--------+-------
(880,18) | 5080015 | 431831923 | 149 | 149 | 0 | 37775 |
37775 | new | ticket | 29
(1004,13) | 5103190 | 440233858 | 0 | 0 | 0 | 37968 |
37968 | open | ticket | 29
(1004,10) | 5112604 | 440233792 | 0 | 0 | 0 | 38052 |
38052 | open | ticket | 29
(995,13) | 5130149 | 440233870 | 0 | 0 | 0 | 38176 |
38176 | open | ticket | 29
(1020,2) | 5132134 | 441184224 | 0 | 0 | 0 | 38185 |
38185 | open | ticket | 29
(1004,5) | 5160459 | 440828297 | 38 | 38 | 0 | 38386 |
38386 | open | ticket | 29
(1004,3) | 5161571 | 440233745 | 0 | 0 | 0 | 38394 |
38394 | open | ticket | 29
(1020,5) | 5163792 | 441195836 | 38 | 38 | 0 | 38403 |
38403 | open | ticket | 29
(1019,3) | 5164449 | 441183696 | 38 | 38 | 0 | 38406 |
38406 | open | ticket | 29
(1015,14) | 5167225 | 441188439 | 38 | 38 | 0 | 38422 |
38422 | open | ticket | 29
(1021,3) | 5172082 | 441185101 | 38 | 38 | 0 | 38474 |
38474 | open | ticket | 29
(968,37) | 5176170 | 440990670 | 0 | 0 | 0 | 38530 |
38530 | open | ticket | 29
(1015,11) | 5177554 | 441183605 | 0 | 0 | 0 | 38539 |
38539 | open | ticket | 29
(13 rows)

> I suspect this may be a question of a corrupt index, in which case
> REINDEXing the index being used would fix it.

This is what I thought when I found out the problem. So before I sent my
first e-mail I executed a "reindex index tickets6" but it did not help.

> But before you try that,
> please save a physical copy of the index file (immediately after doing a
> CHECKPOINT, if the database is being actively modified). I would like
> to dig through it and try to understand the nature of the corruption,
> if that's what the problem is.
>

--------------------------------------------------
rtprod=# SELECT relname,relfilenode from pg_class where relname =
'tickets6';
relname | relfilenode
----------+-------------
tickets6 | 5177103

rtprod=# checkpoint;
CHECKPOINT

-bash-2.05b$ ls -l 5177103
-rw------- 1 postgres pgdba 1056768 May 16 18:10 5177103
--------------------------------------------------

The index file after a CHECKPOINT can be downloaded from
http://folk.uio.no/rafael/5177103

A new reindex does not help:
--------------------------------------------------
rtprod=# reindex index tickets6;
REINDEX

SELECT id,effectiveid,status,type,queue FROM Tickets main WHERE
((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND
((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status=
'open') OR (main.Status = 'new') ) ) ORDER BY main.id ASC LIMIT 50;
id | effectiveid | status | type | queue
-------+-------------+--------+--------+-------
37968 | 37968 | open | ticket | 29
38052 | 38052 | open | ticket | 29
38176 | 38176 | open | ticket | 29
38185 | 38185 | open | ticket | 29
38386 | 38386 | open | ticket | 29
38394 | 38394 | open | ticket | 29
38403 | 38403 | open | ticket | 29
38406 | 38406 | open | ticket | 29
38422 | 38422 | open | ticket | 29
38474 | 38474 | open | ticket | 29
38530 | 38530 | open | ticket | 29
38539 | 38539 | open | ticket | 29
(12 rows)

rtprod=# SELECT id,effectiveid,status,type,queue FROM Tickets main WHERE
((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND
((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status=
'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50;
id | effectiveid | status | type | queue
-------+-------------+--------+--------+-------
37775 | 37775 | new | ticket | 29
37968 | 37968 | open | ticket | 29
38052 | 38052 | open | ticket | 29
38176 | 38176 | open | ticket | 29
38185 | 38185 | open | ticket | 29
38386 | 38386 | open | ticket | 29
38394 | 38394 | open | ticket | 29
38403 | 38403 | open | ticket | 29
38406 | 38406 | open | ticket | 29
38422 | 38422 | open | ticket | 29
38474 | 38474 | open | ticket | 29
38530 | 38530 | open | ticket | 29
38539 | 38539 | open | ticket | 29
(13 rows)
--------------------------------------------------

--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-16 16:32:48
Message-ID: 13741.1147797168@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no> writes:
> On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote:
>> I suspect this may be a question of a corrupt index, in which case
>> REINDEXing the index being used would fix it.

> This is what I thought when I found out the problem. So before I sent my
> first e-mail I executed a "reindex index tickets6" but it did not help.

So much for that theory. If you copy the table (create table foo as
select * from tickets) and build a similar index on the copy, does the
behavior persist in the copy?

I'm wondering a little bit about encoding issues. What encoding does
the database have (see \l) and what do SHOW LC_COLLATE and SHOW LC_CTYPE
show?

regards, tom lane


From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-16 16:53:41
Message-ID: 1147798422.4700.19.camel@linux.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Tue, 2006-05-16 at 12:32 -0400, Tom Lane wrote:
> Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no> writes:
> > On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote:
> >> I suspect this may be a question of a corrupt index, in which case
> >> REINDEXing the index being used would fix it.
>
> > This is what I thought when I found out the problem. So before I sent my
> > first e-mail I executed a "reindex index tickets6" but it did not help.
>
> So much for that theory. If you copy the table (create table foo as
> select * from tickets) and build a similar index on the copy, does the
> behavior persist in the copy?
>

The new table behaves well:
----------------------------------------------
rtprod=# CREATE TABLE foo as select * from tickets;
SELECT

rtprod=# CREATE INDEX foo6 on foo (status);
CREATE INDEX

rtprod=# SELECT id,effectiveid,status,type,queue FROM foo main WHERE
((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND
((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status=
'open') OR (main.Status = 'new') ) ) ORDER BY main.id ASC LIMIT 50;
id | effectiveid | status | type | queue
-------+-------------+--------+--------+-------
37775 | 37775 | new | ticket | 29
37968 | 37968 | open | ticket | 29
38052 | 38052 | open | ticket | 29
38176 | 38176 | open | ticket | 29
38185 | 38185 | open | ticket | 29
38386 | 38386 | open | ticket | 29
38394 | 38394 | open | ticket | 29
38403 | 38403 | open | ticket | 29
38406 | 38406 | open | ticket | 29
38422 | 38422 | open | ticket | 29
38474 | 38474 | open | ticket | 29
38530 | 38530 | open | ticket | 29
38539 | 38539 | open | ticket | 29
(13 rows)

rtprod=# SELECT id,effectiveid,status,type,queue FROM foo main WHERE
((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND
((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status=
'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50;
id | effectiveid | status | type | queue
-------+-------------+--------+--------+-------
37775 | 37775 | new | ticket | 29
37968 | 37968 | open | ticket | 29
38052 | 38052 | open | ticket | 29
38176 | 38176 | open | ticket | 29
38185 | 38185 | open | ticket | 29
38386 | 38386 | open | ticket | 29
38394 | 38394 | open | ticket | 29
38403 | 38403 | open | ticket | 29
38406 | 38406 | open | ticket | 29
38422 | 38422 | open | ticket | 29
38474 | 38474 | open | ticket | 29
38530 | 38530 | open | ticket | 29
38539 | 38539 | open | ticket | 29
(13 rows)
----------------------------------------------

> I'm wondering a little bit about encoding issues. What encoding does
> the database have (see \l)

SQL_ASCII

> and what do SHOW LC_COLLATE and SHOW LC_CTYPE
> show?
>

----------------------------------------------
rtprod=# SHOW LC_COLLATE;
lc_collate
------------
C
(1 row)

rtprod=# SHOW LC_CTYPE;
lc_ctype
----------
C
(1 row)
----------------------------------------------

--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-16 17:01:51
Message-ID: 14225.1147798911@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no> writes:
> On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote:
>> So much for that theory. If you copy the table (create table foo as
>> select * from tickets) and build a similar index on the copy, does the
>> behavior persist in the copy?

> The new table behaves well:

Did you check that you were getting the same indexscan plans there?
(If not, try ANALYZEing the copied table.) You probably were, but
just in case.

This is definitely pretty baffling. I'm getting to the point where
I want to step through the code with a debugger. I assume that's not
very practical on your live server. Would it be feasible at all to
get a physical copy of the database for testing? Alternatively, do
you know C and gdb well enough to try to debug it for yourself?

regards, tom lane


From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-16 18:05:48
Message-ID: 1147802749.4700.44.camel@linux.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Tue, 2006-05-16 at 13:01 -0400, Tom Lane wrote:
> Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no> writes:
> > On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote:
> >> So much for that theory. If you copy the table (create table foo as
> >> select * from tickets) and build a similar index on the copy, does the
> >> behavior persist in the copy?
>
> > The new table behaves well:
>
> Did you check that you were getting the same indexscan plans there?
> (If not, try ANALYZEing the copied table.) You probably were, but
> just in case.
>

Yes I did, it looks good:
-------------------------------------------------------
rtprod=# explain analyze SELECT id,effectiveid,status,type,queue FROM
foo main WHERE ((main.EffectiveId = main.id)) AND ((main.Status !=
'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND
( (main.Status= 'open') OR (main.Status = 'new') ) ) ORDER BY main.id
ASC LIMIT 50;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1842.29..1842.30 rows=1 width=33) (actual
time=4.112..4.142 rows=13 loops=1)
-> Sort (cost=1842.29..1842.30 rows=1 width=33) (actual
time=4.109..4.119 rows=13 loops=1)
Sort Key: id
-> Index Scan using foo6, foo6 on foo main
(cost=0.00..1842.28 rows=1 width=33) (actual time=1.895..4.072 rows=13
loops=1)
Index Cond: (((status)::text = 'open'::text) OR
((status)::text = 'new'::text))
Filter: ((effectiveid = id) AND ((status)::text <>
'deleted'::text) AND (("type")::text = 'ticket'::text) AND (queue = 29))
Total runtime: 4.216 ms
(7 rows)

rtprod=# explain analyze SELECT id,effectiveid,status,type,queue FROM
foo main WHERE ((main.EffectiveId = main.id)) AND ((main.Status !=
'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND
( (main.Status= 'new') OR (main.Status = 'open') ) ) ORDER BY main.id
ASC LIMIT 50;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1842.29..1842.30 rows=1 width=33) (actual
time=4.121..4.149 rows=13 loops=1)
-> Sort (cost=1842.29..1842.30 rows=1 width=33) (actual
time=4.117..4.128 rows=13 loops=1)
Sort Key: id
-> Index Scan using foo6, foo6 on foo main
(cost=0.00..1842.28 rows=1 width=33) (actual time=0.800..4.084 rows=13
loops=1)
Index Cond: (((status)::text = 'new'::text) OR
((status)::text = 'open'::text))
Filter: ((effectiveid = id) AND ((status)::text <>
'deleted'::text) AND (("type")::text = 'ticket'::text) AND (queue = 29))
Total runtime: 4.228 ms
(7 rows)
-------------------------------------------------------

> This is definitely pretty baffling. I'm getting to the point where
> I want to step through the code with a debugger. I assume that's not
> very practical on your live server. Would it be feasible at all to
> get a physical copy of the database for testing? Alternatively, do
> you know C and gdb well enough to try to debug it for yourself?
>

I am going to make a physical copy of the database and install it in a
test server (it is ca.3.6GB). I can C and some gdb but I do not work
with them on a daily basis and it is a long time since I battled with
them. I think I don't have the level needed to find the reason of this
problem in this case.

I have to get the approval from the system owner before an external
person can get access to the test server (tomorrow is the national day
here in Norway, so I can not do anything until thursday). I don't think
this will be a problem, the best thing will be to get your public ssh
key so you can login without a password.

--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Rafael Martinez, Guerrero" <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-19 15:34:15
Message-ID: 29491.1148052855@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Many thanks for allowing me to trace through your problem case.
It's a real Postgres bug, and a nasty one. The problem is a thinko in
nodeIndexscan.c's code that tests whether the same tuple has already
been emitted in a previous OR'd scan: it is looking for a match on
tuple->t_data->t_ctid, when what it should really be looking at is
tuple->t_self. What I find is that the indexscan for status == open
is returning TID (880,5), which has XMAX_INVALID and a t_ctid pointing
at (880,18). (This is perfectly normal, it just indicates that
somebody tried to update the row but the updating transaction rolled
back, and the updated version at 880,18 was later recycled by VACUUM.)
So this causes a bogus rejection when TID (880,18) is scanned during the
second indexscan.

This only affects the 7.4 and 8.0 branches, because earlier and later
versions of Postgres don't use this technique for detecting duplicates.
But it's surprising we didn't find it before.

Patches will appear in next week's releases. Thanks again!

regards, tom lane


From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-19 16:08:05
Message-ID: 1148054885.26700.6.camel@linux.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, 2006-05-19 at 11:34 -0400, Tom Lane wrote:
[.......]
>
> This only affects the 7.4 and 8.0 branches, because earlier and later
> versions of Postgres don't use this technique for detecting duplicates.
> But it's surprising we didn't find it before.
>
> Patches will appear in next week's releases. Thanks again!
>

Thanks to you for finding and fixing the problem :-)
It looks like you are finish so I will update the server and you will
lose access to it.

regards
--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Rafael Martinez, Guerrero" <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Subject: Re: [GENERAL] Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-23 18:51:44
Message-ID: b42b73150605231151r5c3e1a07r329eba455691dde0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 5/19/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> This only affects the 7.4 and 8.0 branches, because earlier and later
> versions of Postgres don't use this technique for detecting duplicates.
> But it's surprising we didn't find it before.

hm. about a year ago I reported a case where the database allowed
multiple records with the same p-key which were causing problems with
dump/reload from 8.0->8.1. It was pretty rare, but it looked like
under certain circumstances unique constraint was not getting applied.
I was unable to reproduce it, though.

Is it possible that this bug was the cause of that particular problem?

Merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Rafael Martinez, Guerrero" <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Subject: Re: [GENERAL] Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Date: 2006-05-23 19:18:21
Message-ID: 17772.1148411901@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

"Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> On 5/19/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> This only affects the 7.4 and 8.0 branches, because earlier and later
>> versions of Postgres don't use this technique for detecting duplicates.
>> But it's surprising we didn't find it before.

> hm. about a year ago I reported a case where the database allowed
> multiple records with the same p-key which were causing problems with
> dump/reload from 8.0->8.1. It was pretty rare, but it looked like
> under certain circumstances unique constraint was not getting applied.
> I was unable to reproduce it, though.

Yeah, I remember.

> Is it possible that this bug was the cause of that particular problem?

No, this is unrelated. It only occurs in a query that's fetching rows
using OR'd indexscans, eg

SELECT ... WHERE indexedcol = 42 OR indexedcol = 47 OR ...;

(you can spell it "indexedcol IN (42,47,...)" with same results) and
the problem is basically incorrect detection of fetching the same row
more than once, ie, a bug in the code that's in charge of not returning
rows multiple times if query is like

SELECT ... WHERE indexedcol = 42 OR indexedcol = 42 OR ...;

This is nowhere near the unique-constraint enforcement mechanism.

regards, tom lane