Re: Largely inconsistent query execution speed, involving psql_tmp

From: Spiros Ioannou <sivann(at)inaccess(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Largely inconsistent query execution speed, involving psql_tmp
Date: 2014-07-09 11:27:01
Message-ID: CACKh8C_zekThevJoqWPr3s64mwUDOAiALf4-3F0cvOwfFbjY_Q@mail.gmail.com
Lists: pgsql-general

There are indices:

table: measurement_events
    "measurement_events_pkey" PRIMARY KEY, btree (measurement_source_id, measurement_time)
table: measurement_sources
    "measurement_sources_pkey" PRIMARY KEY, btree (measurement_source_id)
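
(For reference, the same information can be pulled from the catalog; a minimal
sketch, assuming the tables live in the public schema:)

    -- Sketch: list the indexes on the tables involved in this thread.
    -- pg_indexes is a standard system view; drop or adjust the schema
    -- filter if the tables are elsewhere.
    SELECT tablename, indexname, indexdef
    FROM pg_indexes
    WHERE schemaname = 'public'
      AND tablename IN ('measurement_events', 'measurement_sources', 'msrcs_timestamps')
    ORDER BY tablename, indexname;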

-Spiros

On 8 July 2014 18:10, Andy Colson <andy(at)squeakycode(dot)net> wrote:

> On 7/8/2014 4:47 AM, Spiros Ioannou wrote:
>
>> While executing the following query through psql :
>>
>> SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt ON
>> me.measurement_source_id=mt.measurement_source_id WHERE
>> measurement_time > last_update_time
>>
>> PostgreSQL (8.4) shows one of two behaviours:
>> 1) Either the query performs lots of reads on the database and completes
>> in about 4 hours (that is the normal, expected behaviour), or
>> 2) the query starts filling up pgsql_tmp, which causes heavy write I/O on
>> the server, and the query never completes in a reasonable time (we stop
>> it after 10h).
>>
>> For some strange reason, behaviour 2 is always observed when running
>> psql from a bash script, while behaviour 1 is only observed when running
>> psql interactively from the command line (but not always).
>>
>> explain:
>> # explain SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt
>>     ON me.measurement_source_id = mt.measurement_source_id
>>     WHERE measurement_time > last_update_time;
>>                                   QUERY PLAN
>> ----------------------------------------------------------------------------------------------
>>  Hash Join  (cost=10111.78..422893652.69 rows=2958929695 width=103)
>>    Hash Cond: (me.measurement_source_id = mt.measurement_source_id)
>>    Join Filter: (me.measurement_time > mt.last_update_time)
>>    ->  Seq Scan on measurement_events me  (cost=0.00..234251772.85 rows=8876789085 width=103)
>>    ->  Hash  (cost=5733.57..5733.57 rows=350257 width=24)
>>          ->  Seq Scan on msrcs_timestamps mt  (cost=0.00..5733.57 rows=350257 width=24)
>> (6 rows)
>>
>>
>> So far we have tried raising work_mem up to 512MB, with no difference.
>> Any suggestions?
>>
>>
>>
>> Thanks for any help,
>> -Spiros Ioannou
>> inaccess
>>
>>
> Is there any reason you don't have an index?
>
> One, or both, of these will help:
>
> create index measurement_events_pk on measurement_events (measurement_source_id);
>
> create index msrcs_timestamps_pk on msrcs_timestamps (measurement_source_id);
>
>
>
> measurement_events has 8 billion rows, so expect it to take a while, but
> it's a one-time cost and should _dramatically_ increase your query
> performance.
>
> -Andy
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
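
On the suggested indexes: as noted above, measurement_events already has a
primary key whose leading column is measurement_source_id, so the missing
piece would be the index on msrcs_timestamps. A minimal sketch of building it
without blocking writers (CREATE INDEX CONCURRENTLY is available in 8.4; the
index name is taken from Andy's suggestion):

    -- Sketch: build the suggested index without taking a write-blocking lock.
    -- Note: CONCURRENTLY cannot run inside a transaction block.
    CREATE INDEX CONCURRENTLY msrcs_timestamps_pk
        ON msrcs_timestamps (measurement_source_id);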

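Regarding the work_mem changes mentioned above: they have to be in effect in
the session that runs the query, and a psql started from a bash script may
pick up different settings (for example via PGOPTIONS, or role/database-level
GUCs) than an interactive one. A minimal sketch for checking and setting it
per session (the 512MB value is the one already tried above):

    -- Sketch: verify the effective work_mem in this session, raise it, and
    -- re-check before running the join. Run this both from the script and
    -- interactively to see whether the two sessions actually differ.
    SHOW work_mem;
    SET work_mem = '512MB';
    SHOW work_mem;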