DISTINCT is not quite distinct

Lists: pgsql-general
From: Florian Weimer <fweimer(at)bfk(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: DISTINCT is not quite distinct
Date: 2006-11-06 13:51:38
Message-ID: 82lkmoy979.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I run this innocent query

CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;

and the resulting table contains duplicate rows. 8-(

According to EXPLAIN, an index scan on the bar column is used (using
the underlying B-tree index). This is with PostgreSQL 8.1.4 (Debian
package 8.1.4-6). Is this a known problem?

If I drop the DISTINCT, the output is not correctly ordered, either.
Perhaps this is an index corruption issue? The hardware itself seems
fine.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99


From: Richard Huxton <dev(at)archonet(dot)com>
To: Florian Weimer <fweimer(at)bfk(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: DISTINCT is not quite distinct
Date: 2006-11-06 14:34:31
Message-ID: 454F47F7.6090108@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Florian Weimer wrote:
> I run this innocent query
>
> CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
>
> and the resulting table contains duplicate rows. 8-(
>
> According to EXPLAIN, an index scan on the bar column is used (using
> the underlying B-tree index). This is with PostgreSQL 8.1.4 (Debian
> package 8.1.4-6). Is this a known problem?
>
> If I drop the DISTINCT, the output is not correctly ordered, either.
> Perhaps this is an index corruption issue? The hardware itself seems
> fine.

Could be index corruption perhaps.

I take it SELECT DISTINCT bar... shows the same problem?

If so, can you do:
SELECT OID,xmin,cmin,xmax,cmax,bar FROM baz
WHERE bar = <something with duplicates>

--
Richard Huxton
Archonet Ltd


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Weimer <fweimer(at)bfk(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: DISTINCT is not quite distinct
Date: 2006-11-06 14:42:03
Message-ID: 10738.1162824123@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Florian Weimer <fweimer(at)bfk(dot)de> writes:
> I run this innocent query
> CREATE TABLE foo AS SELECT DISTINCT bar FROM baz ORDER BY bar;
> and the resulting table contains duplicate rows. 8-(

> According to EXPLAIN, an index scan on the bar column is used (using
> the underlying B-tree index).

Do you mean an indexscan followed immediately by a Unique node? If
so, yeah, that would depend entirely on correct ordering of the
indexscan output to produce distinct results.

> If I drop the DISTINCT, the output is not correctly ordered, either.
> Perhaps this is an index corruption issue? The hardware itself seems
> fine.

Perhaps. Do you want to save off a physical copy of the index and then
try REINDEXing? If that fixes it, I'd be interested to compare the two
versions of the index.

regards, tom lane


From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: DISTINCT is not quite distinct
Date: 2006-11-06 17:10:30
Message-ID: 82bqnkxzzt.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

* Tom Lane:

>> According to EXPLAIN, an index scan on the bar column is used (using
>> the underlying B-tree index).
>
> Do you mean an indexscan followed immediately by a Unique node? If
> so, yeah, that would depend entirely on correct ordering of the
> indexscan output to produce distinct results.

Yes.

>> If I drop the DISTINCT, the output is not correctly ordered, either.
>> Perhaps this is an index corruption issue? The hardware itself seems
>> fine.
>
> Perhaps. Do you want to save off a physical copy of the index and then
> try REINDEXing?

The duplicate row is gone.

> If that fixes it, I'd be interested to compare the two versions of
> the index.

The index files are about 155 MB and 98 MB, compressed. How shall we
transfer them? (Their contents is not super-secret, but I don't want
to distribute them widely, either.)

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99