Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search archives
  Advanced Search

limited field duplicates


  • From: Don Isgitt <djisgitt(at)soundenergy(dot)com>
  • To: postgresql general <pgsql-general(at)postgresql(dot)org>
  • Subject: limited field duplicates
  • Date: Fri, 31 Jan 2003 10:17:10 -0600
  • Message-id: <3E3AA186.3080800@soundenergy.com> <text/plain>

Hello,

I am trying to find rows in a table that are duplicates only in the sense of certain columns; i.e, not all columns are the same, and then for each duplicate entry, I want to retrieve one of the columns that is not part of the duplication criteria. Ok, enough English description, here is the real information (simplified).
               Table "master" --~925000 records
     Column       |     Type     | Modifiers
-------------------+--------------+-----------
apinum            | text         | not null
state             | character(2) | not null
county            | text         | not null
township_n        | integer      |
township_d        | character(1) |
range_n           | integer      |
range_d           | character(1) |
section           | integer      |
location          | text         | not null

(Indexes on state,county,township,range,apinum)

Find duplicates based on state,county,township_n||township_d,range_n||range_d,section

select state,county,township_n||township_d as township,range_n||range_d as range,section,location,count(*) from master where state='OK' and county='MAJOR' group by state,county,township,range,section,location having count(*)>1;

Works great, is efficient: now, how do I get the apinum associated with each of the limited duplicate rows (i.e., those returned in the above query). I need the apinum because another system involved only knows about apinum.

It must be simple, but I have failed miserably in finding the answer. The following horrible query works, but it takes forever for even the small sample above.

explain select m.apinum,m.township_n||m.township_d as township,m.range_n||m.range_d as range,m.section,m.location from master m where (select count(*) from master m1 where m.township_n||m.township_d=m1.township_n||m1.township_d and m.range_n||m.range_d=m1.range_n||m1.range_d and m.section=m1.section and m.location=m1.location and m1.state='OK' and m1.county='MAJOR') > 1 and m.state='OK' and m.county='MAJOR' order by township,range,section;
NOTICE:  QUERY PLAN:

Sort  (cost=7040332.16..7040332.16 rows=450 width=55)
-> Index Scan using mstcty on master m (cost=0.00..7040312.32 rows=450 width=55)
       SubPlan
         ->  Aggregate  (cost=5208.86..5208.86 rows=1 width=0)
-> Index Scan using mstcty on master m1 (cost=0.00..5208.85 rows=1 width=0)

EXPLAIN


So, somebody please enlighten me. All help is appreciated.

Don





Home | Main Index | Thread Index

Privacy Policy | About PostgreSQL
Copyright © 1996 – 2012 PostgreSQL Global Development Group