Re: Deleting orphaned records to establish Ref Integrity

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: romanf(at)fusemail(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Deleting orphaned records to establish Ref Integrity
Date: 2005-06-02 06:36:09
Message-ID: 10442.1117694169@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Roman F" <romanf(at)fusemail(dot)com> writes:
> ... Executing something like
> the following statement would work, but even with indexes it takes an
> insane amount of time to execute for each of the tables:

> DELETE FROM child_table WHERE parentid NOT IN
> (SELECT parentid FROM parent_table)

Uh, what sort of query plan are you getting for that?

PG 7.4 and up can do a reasonable job with NOT IN if the sub-select is
small enough to fit into an in-memory hash table (of size sort_mem).
I'm betting that your sort_mem setting is not high enough to encourage
the planner to try the hash method. You could try increasing sort_mem
... but given the size of your tables, you might end up with a hash
table large enough to drive the system into swapping, in which case
it'll still be mighty slow.

Another idea is to try an outer join:

SELECT child_table.parentid INTO tmp_table
FROM child_table LEFT JOIN parent_table
ON (child_table.parentid = parent_table.parentid)
WHERE parent_table.parentid IS NULL;

which essentially does a join and then pulls out just the child_table
rows that failed to match. This will probably end up getting done via a
merge join or hybrid hash join, either of which are more scalable than
the NOT IN code. You still have to do the actual deletions in
child_table, but as long as there aren't too many, a NOT IN using
tmp_table should work OK.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dinesh Pandey 2005-06-02 07:04:58 index row size 2728 exceeds btree maximum, 2713
Previous Message Joachim Zobel 2005-06-02 06:33:03 Limits of SQL