Re: full outer performance problem

From: Kim Bisgaard <kib+pg(at)dmi(dot)dk>
To: pgsql-performance(at)postgresql(dot)org
Cc: Kim Bisgaard <kib+pg(at)dmi(dot)dk>
Subject: Re: full outer performance problem
Date: 2005-06-08 12:32:28
Message-ID: 42A6E55C.4030203@dmi.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Bruno,

Thanks for the moral support! I feel so too - but I am confident it will
show up soon.

W.r.t. your rewrite of the query, I get this "ERROR: could not devise a
query plan for the given query" but no further details - I will try google

Regards,
Kim.

Bruno Wolff III wrote:

>On Wed, Jun 08, 2005 at 11:37:40 +0200,
> Kim Bisgaard <kib+pg(at)dmi(dot)dk> wrote:
>
>
>>Hi,
>>
>>I'm having problems with the query optimizer and FULL OUTER JOIN on
>>PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins.
>>I might be naive, but I think that it should be possible?
>>
>>I have two BIG tables (virtually identical) with 3 NOT NULL columns
>>Station_id, TimeObs, Temp_XXXX, with unique indexes on (Station_id,
>>TimeObs) and valid ANALYSE (set statistics=100). I want to join the two
>>tables with a FULL OUTER JOIN.
>>
>>When I specify the query as:
>>
>>SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
>> FROM temp_dry_at_2m a
>> FULL OUTER JOIN temp_grass b
>> USING (station_id, timeobs)
>> WHERE station_id = 52981
>> AND timeobs = '2004-1-1 0:0:0'
>>
>>I get the correct results
>>
>>station_id | timeobs | temp_grass | temp_dry_at_2m
>>------------+---------------------+------------+----------------
>> 52944 | 2004-01-01 00:10:00 | | -1.1
>>(1 row)
>>
>>BUT LOUSY performance, and the following EXPLAIN:
>>
>> QUERY PLAN
>>------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>Merge Full Join (cost=1542369.83..1618958.58 rows=6956994 width=32)
>>(actual time=187176.408..201436.264 rows=1 loops=1)
>> Merge Cond: (("outer".station_id = "inner".station_id) AND
>> ("outer".timeobs = "inner".timeobs))
>> Filter: ((COALESCE("outer".station_id, "inner".station_id) = 52981) AND
>> (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01
>> 00:00:00'::timestamp without time zone))
>> -> Sort (cost=1207913.44..1225305.93 rows=6956994 width=16) (actual
>> time=145748.253..153851.607 rows=6956994 loops=1)
>> Sort Key: a.station_id, a.timeobs
>> -> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94
>> rows=6956994 width=16) (actual time=0.049..54226.770 rows=6956994
>> loops=1)
>> -> Sort (cost=334456.38..340472.11 rows=2406292 width=16) (actual
>> time=31668.876..34491.123 rows=2406292 loops=1)
>> Sort Key: b.station_id, b.timeobs
>> -> Seq Scan on temp_grass b (cost=0.00..40658.92 rows=2406292
>> width=16) (actual time=0.052..5484.489 rows=2406292 loops=1)
>>Total runtime: 201795.989 ms
>>(10 rows)
>>
>>
>
>Someone else will need to comment on why Postgres can't use a more
>efficient plan. What I think will work for you is to restrict
>the station_id and timeobs on each side and then do a full join.
>You can try something like the sample query below (which hasn't been tested):
>SELECT station_id, timeobs, temp_grass, temp_dry_at_2m
> FROM
> (SELECT station_id, timeobs, temp_dry_at_2m
> FROM temp_dry_at_2m
> WHERE
> station_id = 52981
> AND
> timeobs = '2004-1-1 0:0:0') a
> FULL OUTER JOIN
> (SELECT station_id, timeobs, temp_grass
> FROM temp_grass
> WHERE
> station_id = 52981
> AND
> timeobs = '2004-1-1 0:0:0') b
> USING (station_id, timeobs)
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message George Essig 2005-06-08 13:34:29 Re: SELECT DISTINCT Performance Issue
Previous Message Martin Fandel 2005-06-08 12:25:52 Re: Filesystem