Lists: | pgsql-general |
---|
From: | Florian Weimer <fweimer(at)bfk(dot)de> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | DISTINCT is not quite distinct |
Date: | 2006-11-06 13:51:38 |
Message-ID: | 82lkmoy979.fsf@mid.bfk.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
I run this innocent query
CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
and the resulting table contains duplicate rows. 8-(
According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index). This is with PostgreSQL 8.1.4 (Debian
package 8.1.4-6). Is this a known problem?
If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.
--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | Florian Weimer <fweimer(at)bfk(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: DISTINCT is not quite distinct |
Date: | 2006-11-06 14:34:31 |
Message-ID: | 454F47F7.6090108@archonet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Florian Weimer wrote:
> I run this innocent query
>
> CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
>
> and the resulting table contains duplicate rows. 8-(
>
> According to EXPLAIN, an index scan on the bar column is used (using
> the underlying B-tree index). This is with PostgreSQL 8.1.4 (Debian
> package 8.1.4-6). Is this a known problem?
>
> If I drop the DISTINCT, the output is not correctly ordered, either.
> Perhaps this is an index corruption issue? The hardware itself seems
> fine.
Could be index corruption perhaps.
I take it SELECT DISTINCT bar... shows the same problem?
If so, can you do:
SELECT OID,xmin,cmin,xmax,cmax,bar FROM baz
WHERE bar = <something with duplicates>
--
Richard Huxton
Archonet Ltd
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Florian Weimer <fweimer(at)bfk(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: DISTINCT is not quite distinct |
Date: | 2006-11-06 14:42:03 |
Message-ID: | 10738.1162824123@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Florian Weimer <fweimer(at)bfk(dot)de> writes:
> I run this innocent query
> CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
> and the resulting table contains duplicate rows. 8-(
> According to EXPLAIN, an index scan on the bar column is used (using
> the underlying B-tree index).
Do you mean an indexscan followed immediately by a Unique node? If
so, yeah, that would depend entirely on correct ordering of the
indexscan output to produce distinct results.
> If I drop the DISTINCT, the output is not correctly ordered, either.
> Perhaps this is an index corruption issue? The hardware itself seems
> fine.
Perhaps. Do you want to save off a physical copy of the index and then
try REINDEXing? If that fixes it, I'd be interested to compare the two
versions of the index.
regards, tom lane
From: | Florian Weimer <fweimer(at)bfk(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: DISTINCT is not quite distinct |
Date: | 2006-11-06 17:10:30 |
Message-ID: | 82bqnkxzzt.fsf@mid.bfk.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
* Tom Lane:
>> According to EXPLAIN, an index scan on the bar column is used (using
>> the underlying B-tree index).
>
> Do you mean an indexscan followed immediately by a Unique node? If
> so, yeah, that would depend entirely on correct ordering of the
> indexscan output to produce distinct results.
Yes.
>> If I drop the DISTINCT, the output is not correctly ordered, either.
>> Perhaps this is an index corruption issue? The hardware itself seems
>> fine.
>
> Perhaps. Do you want to save off a physical copy of the index and then
> try REINDEXing?
The duplicate row is gone.
> If that fixes it, I'd be interested to compare the two versions of
> the index.
The index files are about 155 MB and 98 MB, compressed. How shall we
transfer them? (Their contents is not super-secret, but I don't want
to distribute them widely, either.)
--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99