regarding ROW comparisons

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: pgsql <pgsql-general(at)postgresql(dot)org>
Subject: regarding ROW comparisons
Date: 2011-03-06 17:32:27
Message-ID: AANLkTi=c37RyoheHOF34Kd5WnCJL0cqvKdzdHWndwoDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I am working with a system which periodically has to perform this operation:

update all of the rows in table A which match rows in table B, using a
subset of the columns for comparison (and one of the columns in tableA
is NULL).

This is what I've tried:

update tableA
SET column1 = some_value
FROM tableB
WHERE
ROW(tableA.column2, tableA.column4)
IS NOT DISTINCT FROM
ROW(tableB.column2, NULL)
AND
(tableB.column3 = 1 OR tableB.column3 = 2)

I've also tried:

update tableA
SET column1 = some_value
FROM tableB
WHERE
tableA.column2 = tableB.column2
AND
tableA.column4 IS NULL
AND
(tableB.column3 = 1 OR tableB.column3 = 2)

and:

update tableA
SET column1 = some_value
WHERE
EXISTS (
SELECT 1 FROM tableB
WHERE
tableA.column2 = tableB.column2
AND
(tableB.column3 = 1 OR tableB.column3 = 2)
)
AND
tableA.column4 IS NULL

assuming column2 from either table will never be NULL, and
tableB.column3 will also never be NULL.
Furthermore, assume that tableA.column4 is usually NULL and that the
number of rows in tableB is typically only a small percentage of the
number of rows in tableA.

The costs associated with each if these look like this:
The basic join version:
Hash Join (cost=33441.00..54830.46 rows=302878 width=81)

The ROW version:
Nested Loop (cost=0.00..42872.28 rows=1 width=81)

The EXISTS version:
Hash Semi Join (cost=18739.28..50014.85 rows=302878 width=81)

The basic join version usually runs fast enough (say, 6-10 seconds).
The EXISTS version starts out faster (typically a bit faster than the
basic join version).
The ROW version never completes (going on 45 minutes now, but I've let
them go for up to 6 hours).

I can only assume I'm doing something wrong.
The column types are nothing special (INT, TEXT, sometimes INET).
There are indexes on tableA but not on table B.

I am using postgresql 8.4.5 and I have tried on both CentOS and
openSUSE with the same results.

--
Jon

Browse pgsql-general by date

  From Date Subject
Next Message ray 2011-03-06 18:43:22 How to Create Table from CSV
Previous Message Reid Thompson 2011-03-06 16:24:21 Re: Web Hosting