same query in high number of times

Lists: pgsql-performance
From: Peter Alban <peter(dot)alban2(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: same query in high number of times
Date: 2009-06-21 10:54:40
Message-ID: 477dfcc10906210354m6e00b2b3v195e890c344cb97f@mail.gmail.com

Hey folks!

Still kind of analyzing the situation, I realized that I do have a
reasonably high shared_buffers and effective_cache_size. Yet when the same
query is run by ~100-200 concurrent connections, it is not being cached.

Shouldn't PG realize that if the table data is the same, the query result
set should also be the same? Instead each query takes 1-2 seconds.

Where do I see what PG does? I can see the queries that take a long
time, but I have no information about what the optimizer does, nor about
when the DB decides to table scan or use the cache.

cheers,
Peter


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Alban <peter(dot)alban2(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: same query in high number of times
Date: 2009-06-21 17:42:23
Message-ID: 603c8f070906211042k4dc3383dm10c9475eab195182@mail.gmail.com
Lists: pgsql-performance

On Sun, Jun 21, 2009 at 6:54 AM, Peter Alban<peter(dot)alban2(at)gmail(dot)com> wrote:
> Shouldn't PG realize that if the table data is the same, the query result
> set should also be the same?

No. That's not as easy to implement as you might think. Saving the
results of each previous query, in case someone issues the same query
again without anything having changed in the meantime, would probably
cost more in performance on average than you'd get out of it.

> Where do I see what PG does? I can see the queries that take a long
> time, but I have no information about what the optimizer does, nor about
> when the DB decides to table scan or use the cache.

Can't you get this from EXPLAIN and EXPLAIN ANALYZE?
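For example, a sketch of that, reusing the table and column names from the
original post (output omitted):

```sql
-- EXPLAIN shows the plan the optimizer chose (estimates only);
-- EXPLAIN ANALYZE also executes the query and reports actual
-- per-node timings and row counts.
EXPLAIN ANALYZE
SELECT news.url_text, news.title, comments.name, comments.createdate,
       comments.user_id, comments.comment
FROM news, comments
WHERE comments.cid = news.id
  AND comments.published = '1'
GROUP BY news.url_text, news.title, comments.name, comments.createdate,
         comments.user_id, comments.comment
ORDER BY comments.createdate DESC
LIMIT 3;
```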

...Robert


From: Peter Alban <peter(dot)alban2(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: same query in high number of times
Date: 2009-06-21 18:28:15
Message-ID: 477dfcc10906211128td36b9f3na22cf57b68904f81@mail.gmail.com
Lists: pgsql-performance

Hi,

Here is the query :
duration: 2533.734 ms  statement:

SELECT news.url_text, news.title, comments.name, comments.createdate,
comments.user_id, comments.comment FROM news, comments WHERE comments.cid =
news.id AND comments.published = '1' GROUP BY news.url_text, news.title,
comments.name, comments.createdate, comments.user_id, comments.comment ORDER
BY comments.createdate DESC LIMIT 3

And here is the query plan:

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1)
   ->  Sort  (cost=4313.54..4347.26 rows=13486 width=595) (actual time=288.523..288.523 rows=3 loops=1)
         Sort Key: comments.createdate
         ->  HashAggregate  (cost=3253.60..3388.46 rows=13486 width=595) (actual time=137.521..148.132 rows=13415 loops=1)
               ->  Hash Join  (cost=1400.73..3051.31 rows=13486 width=595) (actual time=14.298..51.049 rows=13578 loops=1)
                     Hash Cond: ("outer".cid = "inner".id)
                     ->  Seq Scan on comments  (cost=0.00..1178.72 rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)
                           Filter: (published = 1)
                     ->  Hash  (cost=1391.18..1391.18 rows=3818 width=81) (actual time=14.268..14.268 rows=3818 loops=1)
                           ->  Seq Scan on news  (cost=0.00..1391.18 rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)

The same query is being requested from different sessions. So why is it not
being cached?
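Since PostgreSQL itself keeps no result-set cache, one workaround is to
materialize the result once and let the many sessions read that instead. A
sketch (the `latest_comments` table is hypothetical; the column list comes
from the query above):

```sql
-- Materialize the three newest published comments once, instead of
-- recomputing them for every one of the ~100-200 concurrent sessions.
CREATE TABLE latest_comments AS
SELECT news.url_text, news.title, comments.name, comments.createdate,
       comments.user_id, comments.comment
FROM news, comments
WHERE comments.cid = news.id
  AND comments.published = '1'
ORDER BY comments.createdate DESC
LIMIT 3;

-- Refresh periodically (e.g. from cron) as often as staleness allows:
-- BEGIN; TRUNCATE latest_comments; INSERT INTO latest_comments SELECT ...; COMMIT;

-- Each page view then reads three rows from a tiny table:
SELECT * FROM latest_comments;
```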
postgresql.conf -- current --
shared_buffers = 410000                 # min 16 or max_connections*2, 8KB each
temp_buffers = 11000                    # min 100, 8KB each
#max_prepared_transactions = 5          # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 51024                        # min 64, size in KB
#maintenance_work_mem = 16384           # min 1024, size in KB
#max_stack_depth = 2048                 # min 100, size in KB
#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

effective_cache_size = 692674           # typically 8KB each
#random_page_cost = 4                   # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01                  # (same)
#cpu_index_tuple_cost = 0.001           # (same)
#cpu_operator_cost = 0.0025             # (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5 # range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0 # selects default based on effort
#geqo_selection_bias = 2.0 # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10         # range 1-1000

cheers,
Peter
On Sun, Jun 21, 2009 at 7:42 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, Jun 21, 2009 at 6:54 AM, Peter Alban<peter(dot)alban2(at)gmail(dot)com>
> wrote:
> > Shouldn't PG realize that if the table data is the same, the query result
> > set should also be the same?
>
> No. That's not as easy to implement as you might think. Saving the
> results of each previous query, in case someone issues the same query
> again without anything having changed in the meantime, would probably
> cost more in performance on average than you'd get out of it.
>
> > Where do I see what PG does? I can see the queries that take a long
> > time, but I have no information about what the optimizer does, nor about
> > when the DB decides to table scan or use the cache.
>
> Can't you get this from EXPLAIN and EXPLAIN ANALYZE?
>
> ...Robert
>


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Peter Alban <peter(dot)alban2(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: same query in high number of times
Date: 2009-06-21 22:06:41
Message-ID: dcc563d10906211506w41440cefw45d5b40d0f132048@mail.gmail.com
Lists: pgsql-performance

On Sun, Jun 21, 2009 at 12:28 PM, Peter Alban<peter(dot)alban2(at)gmail(dot)com> wrote:
> Hi,
>
> Here is the query  :
> duration: 2533.734 ms  statement:

SNIP

>  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual
> time=288.525..288.528 rows=3 loops=1)

According to this query plan, your query is taking 288
milliseconds. I'm guessing the rest of the time is actually spent
transferring data.
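One way to see the split (a sketch; "Total runtime" is the label 8.x EXPLAIN
ANALYZE prints):

```sql
-- Compare the two measurements for the same statement:
EXPLAIN ANALYZE SELECT ...;   -- "Total runtime" ~288 ms: execution only
-- versus the server log:
--   duration: 2533.734 ms  statement: SELECT ...
-- The gap covers parse/plan time, waiting on the client, and transferring
-- the result rows over the network.
```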


From: Laurent Laborde <kerdezixe(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Peter Alban <peter(dot)alban2(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: same query in high number of times
Date: 2009-06-23 08:52:08
Message-ID: 8a1bfe660906230152o59f64720uedc51f2a55a132e2@mail.gmail.com
Lists: pgsql-performance

On Mon, Jun 22, 2009 at 12:06 AM, Scott Marlowe<scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Sun, Jun 21, 2009 at 12:28 PM, Peter Alban<peter(dot)alban2(at)gmail(dot)com> wrote:
>> Hi,
>>
>> Here is the query  :
>> duration: 2533.734 ms  statement:
>
> SNIP
>
>>  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual
>> time=288.525..288.528 rows=3 loops=1)
>
> According to this query plan, your query is taking 288
> milliseconds.  I'm guessing the rest of the time is actually spent
> transferring data.

Huuuuuu ...
The cost is _certainly_ not the time in ms.
See the planner cost constants in a config file, or in any good documentation.
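A sketch of the distinction: the first statement shows estimated cost in
abstract planner units, the second additionally reports actual time in
milliseconds.

```sql
-- Cost is expressed in planner units (multiples of seq_page_cost,
-- cpu_tuple_cost, etc.), not milliseconds:
EXPLAIN SELECT count(*) FROM comments;
-- Actual elapsed time only appears with ANALYZE:
EXPLAIN ANALYZE SELECT count(*) FROM comments;
```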

--
F4FQM
Kerunix Flan
Laurent Laborde


From: Laurent Laborde <kerdezixe(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Peter Alban <peter(dot)alban2(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: same query in high number of times
Date: 2009-06-23 08:55:49
Message-ID: 8a1bfe660906230155y413b0e56w39d4b28e8bf993a4@mail.gmail.com
Lists: pgsql-performance

On Tue, Jun 23, 2009 at 10:52 AM, Laurent Laborde<kerdezixe(at)gmail(dot)com> wrote:
> On Mon, Jun 22, 2009 at 12:06 AM, Scott Marlowe<scott(dot)marlowe(at)gmail(dot)com> wrote:
>> On Sun, Jun 21, 2009 at 12:28 PM, Peter Alban<peter(dot)alban2(at)gmail(dot)com> wrote:
>>> Hi,
>>>
>>> Here is the query  :
>>> duration: 2533.734 ms  statement:
>>
>> SNIP
>>
>>>  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual
>>> time=288.525..288.528 rows=3 loops=1)
>>
>> According to this query plan, your query is taking 288
>> milliseconds.  I'm guessing the rest of the time is actually spent
>> transferring data.
>
> Huuuuuu ...
> The cost is _certainly_ not the time in ms.
> See the planner cost constants in a config file, or in any good documentation.

Woooops... cost... time... my mistake ... :)
*duck and cover*

--
F4FQM
Kerunix Flan
Laurent Laborde