Re: Optimizing select count query which often takes over 10 seconds

Lists: pgsql-general
From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-24 09:57:43
Message-ID: CAADeyWhJ1hX=i8mCWF=QYbxThd31--79gkitqh6Y8B6o6t6Pog@mail.gmail.com

Hello,

for a PostgreSQL 8.4.13 database + pgbouncer
on a 32 GB RAM machine with CentOS 6.3 / 64 bit
I use the following settings:

max_connections = 100
shared_buffers = 4096MB
work_mem = 32MB
checkpoint_segments = 32 # to shut up nightly pg_dump
escape_string_warning = off # to shut up Drupal 7.19 warnings
log_min_duration_statement = 10000

And the latter setting always
reports just one statement to me
(besides Drupal queries, which I can't fix):

LOG: duration: 12590.394 ms statement:
select count(id) from (
    select id,
           row_number() over(partition by yw order by money desc) as ranking
    from pref_money
) x
where x.ranking = 1 and id='OK471018960997'

This command comes from a PHP script
of mine which displays "medals" on
a player profile page - meaning how many
times she has won a weekly tournament:

# \d pref_money
Table "public.pref_money"
Column | Type | Modifiers
--------+-----------------------+-----------------------------------------
id | character varying(32) |
money | integer | not null
yw | character(7) | default to_char(now(), 'IYYY-IW'::text)
Indexes:
"pref_money_yw_index" btree (yw)
Foreign-key constraints:
"pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
DELETE CASCADE

Does anybody have an idea
how I could optimize it? Or should
I introduce an hourly job and a "medals"
column (which would make my player
stats less "live")?

Here is the EXPLAIN output
(which I hardly understand) for
a player with 9 weekly medals:

# explain analyze select count(id) from (
    select id,
           row_number() over(partition by yw order by money desc) as ranking
    from pref_money
) x
where x.ranking = 1 and id='OK452217781481';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=63694.22..63694.23 rows=1 width=82) (actual time=4520.719..4520.719 rows=1 loops=1)
  ->  Subquery Scan x  (cost=48519.10..63694.19 rows=11 width=82) (actual time=4470.620..4520.710 rows=6 loops=1)
        Filter: ((x.ranking = 1) AND ((x.id)::text = 'OK452217781481'::text))
        ->  WindowAgg  (cost=48519.10..57190.58 rows=433574 width=26) (actual time=4293.315..4491.652 rows=429803 loops=1)
              ->  Sort  (cost=48519.10..49603.03 rows=433574 width=26) (actual time=4293.306..4352.544 rows=429803 loops=1)
                    Sort Key: pref_money.yw, pref_money.money
                    Sort Method:  external sort  Disk: 15856kB
                    ->  Seq Scan on pref_money  (cost=0.00..7923.74 rows=433574 width=26) (actual time=0.006..41.907 rows=429803 loops=1)
Total runtime: 4525.662 ms
(9 rows)

Thank you for any hints
Alex


From: Moshe Jacobson <moshe(at)neadwerx(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-24 12:47:49
Message-ID: CAJ4CxL=xXXryKvt6ThwZ21rtpP=eUyAimrH0=6BaBzFqe2y_pw@mail.gmail.com

On Thu, Jan 24, 2013 at 4:57 AM, Alexander Farber <
alexander(dot)farber(at)gmail(dot)com> wrote:

> select count(id) from (
> select id,
> row_number() over(partition by yw order by money
> desc) as ranking
> from pref_money
> ) x
> where x.ranking = 1 and id='OK471018960997'
>
> Does anybody have an idea
> how I could optimize it? Or should
> I introduce an hourly job and a "medals"
> column (which would make my player
> stats less "live")?
>

One idea is to have a new ranking column to cache every player's ranking
for every weekly tournament. However, instead of updating it hourly with a
cron job, you could have a trigger on the table, such that when any row is
updated/inserted, you recalculate the rankings for only those rows having
the same "yw" value.

Then, you might want to create an index on the ranking column as well as
the yw column, which you already have indexed.
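A minimal sketch of such a trigger, assuming a new integer "ranking"
column on pref_money and that (id, yw) identifies a row (the function
and trigger names here are illustrative, not from the thread):

```sql
-- Illustrative sketch: cache per-week rankings in a new column.
ALTER TABLE pref_money ADD COLUMN ranking integer;

CREATE OR REPLACE FUNCTION pref_money_rerank() RETURNS trigger AS $$
BEGIN
    -- Skip updates that don't change money; this also stops the
    -- trigger from re-firing on its own ranking updates below.
    IF TG_OP = 'UPDATE' AND NEW.money IS NOT DISTINCT FROM OLD.money THEN
        RETURN NULL;
    END IF;

    -- Recompute rankings only for the affected week.
    UPDATE pref_money p
       SET ranking = r.rn
      FROM (SELECT id, yw,
                   row_number() OVER (PARTITION BY yw
                                      ORDER BY money DESC) AS rn
              FROM pref_money
             WHERE yw = NEW.yw) r
     WHERE p.id = r.id
       AND p.yw = r.yw;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pref_money_rerank_trg
    AFTER INSERT OR UPDATE ON pref_money
    FOR EACH ROW EXECUTE PROCEDURE pref_money_rerank();
```

The medals count then becomes a cheap indexed lookup, e.g.
select count(*) from pref_money where id = '...' and ranking = 1;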

Moshe

--
Moshe Jacobson
Nead Werx, Inc. | Senior Systems Engineer
2323 Cumberland Parkway, Suite 201 | Atlanta, GA 30339
moshe(at)neadwerx(dot)com | www.neadwerx.com


From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-24 14:39:27
Message-ID: CAF-3MvOwMz+4LqaS4Dj4LQKn1KQi0RhT-Gr1n6pfNkk30XvALw@mail.gmail.com

On 24 January 2013 10:57, Alexander Farber <alexander(dot)farber(at)gmail(dot)com>wrote:

> # explain analyze select count(id) from (
> select id,
> row_number() over(partition by yw order by money
> desc) as ranking
> from pref_money
> ) x
> where x.ranking = 1 and id='OK452217781481';
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=63694.22..63694.23 rows=1 width=82) (actual
> time=4520.719..4520.719 rows=1 loops=1)
> -> Subquery Scan x (cost=48519.10..63694.19 rows=11 width=82)
> (actual time=4470.620..4520.710 rows=6 loops=1)
> Filter: ((x.ranking = 1) AND ((x.id)::text =
> 'OK452217781481'::text))
> -> WindowAgg (cost=48519.10..57190.58 rows=433574 width=26)
> (actual time=4293.315..4491.652 rows=429803 loops=1)
> -> Sort (cost=48519.10..49603.03 rows=433574
> width=26) (actual time=4293.306..4352.544 rows=429803 loops=1)
> Sort Key: pref_money.yw, pref_money.money
> Sort Method: external sort Disk: 15856kB
>

It's sorting on disk. That's not going to be fast. Indeed, it's taking
nearly all the time the query takes (4.4s for this step out of 4.5s for the
query).

> -> Seq Scan on pref_money (cost=0.00..7923.74
> rows=433574 width=26) (actual time=0.006..41.907 rows=429803 loops=1)
>

And then it's doing a sequential scan to sort the data. I suspect that's
because it's sorting on disk. Then again, this only takes 42ms, just once
(loops=1), so perhaps a seqscan is indeed the fastest approach here
(actually, wow, it scans 10000 records/ms - rows are 26 bytes wide, so
that's 260MB/s! I'm doubting my math here...).

Total runtime: 4525.662 ms
> (9 rows)
>
> Thank you for any hints
> Alex
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To:
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-24 14:45:47
Message-ID: CAADeyWiaUX+M-zf_nTEJvPEBRr2tcHevrgmM0xumNmrKMcWHnA@mail.gmail.com

Hello -

On Thu, Jan 24, 2013 at 3:39 PM, Alban Hertroys <haramrae(at)gmail(dot)com> wrote:
> On 24 January 2013 10:57, Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
> wrote:
>>
>> # explain analyze select count(id) from (
>> select id,
>> row_number() over(partition by yw order by money
>> desc) as ranking
>> from pref_money
>> ) x
>> where x.ranking = 1 and id='OK452217781481';
>> QUERY PLAN
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------
>> Aggregate (cost=63694.22..63694.23 rows=1 width=82) (actual
>> time=4520.719..4520.719 rows=1 loops=1)
>> -> Subquery Scan x (cost=48519.10..63694.19 rows=11 width=82)
>> (actual time=4470.620..4520.710 rows=6 loops=1)
>> Filter: ((x.ranking = 1) AND ((x.id)::text =
>> 'OK452217781481'::text))
>> -> WindowAgg (cost=48519.10..57190.58 rows=433574 width=26)
>> (actual time=4293.315..4491.652 rows=429803 loops=1)
>> -> Sort (cost=48519.10..49603.03 rows=433574
>> width=26) (actual time=4293.306..4352.544 rows=429803 loops=1)
>> Sort Key: pref_money.yw, pref_money.money
>> Sort Method: external sort Disk: 15856kB
>
>
> It's sorting on disk. That's not going to be fast. Indeed, it's taking
> nearly all the time the query takes (4.4s for this step out of 4.5s for the
> query).

I've noticed that too, but what
does "sorting on disk" mean?

I have a lot of RAM (32 GB);
should I increase work_mem even more?
(It is currently 32 MB.)

Regards
Alex


From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-24 16:24:45
Message-ID: CAF-3MvPDDmeQnEC66xu603Bep7bfqGaJJnt+h+4KtBtR9Koi0Q@mail.gmail.com

> > It's sorting on disk. That's not going to be fast. Indeed, it's taking
> > nearly all the time the query takes (4.4s for this step out of 4.5s for
> the
> > query).
>
> I've noticed that too, but what
> does "sorting on disk" mean?
>
> I have a lot of RAM (32 GB);
> should I increase work_mem even more?
> (It is currently 32 MB.)
>

You can try increasing the amount of work_mem in your psql session only and
see what amount helps. That way you don't need to permanently increase it
for all your queries.
I'd start with 48 MB and increase in increments of 16 MB (as that's the
size the sort operation claims to require on disk).
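In psql, that experiment could look like this (the 48MB figure follows
the suggestion above; the sort in the plan spilled about 16 MB to disk):

```sql
-- Raise sort memory for this session only
SET work_mem = '48MB';

EXPLAIN ANALYZE
select count(id) from (
    select id,
           row_number() over(partition by yw order by money desc) as ranking
    from pref_money
) x
where x.ranking = 1 and id='OK452217781481';
-- If 48MB is enough, the plan should report
-- "Sort Method: quicksort" instead of "external sort Disk: ..."

RESET work_mem;  -- back to the server-wide setting
```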

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-24 19:12:58
Message-ID: 1359054778.2055.1.camel@localhost.localdomain

On Thu, 2013-01-24 at 15:45 +0100, Alexander Farber wrote:
> Hello -
>
> On Thu, Jan 24, 2013 at 3:39 PM, Alban Hertroys <haramrae(at)gmail(dot)com> wrote:
> > On 24 January 2013 10:57, Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
> > wrote:
> >>
> >> # explain analyze select count(id) from (
> >> select id,
> >> row_number() over(partition by yw order by money
> >> desc) as ranking
> >> from pref_money
> >> ) x
> >> where x.ranking = 1 and id='OK452217781481';
> >> QUERY PLAN
> >>
> >> -------------------------------------------------------------------------------------------------------------------------------------------
> >> Aggregate (cost=63694.22..63694.23 rows=1 width=82) (actual
> >> time=4520.719..4520.719 rows=1 loops=1)
> >> -> Subquery Scan x (cost=48519.10..63694.19 rows=11 width=82)
> >> (actual time=4470.620..4520.710 rows=6 loops=1)
> >> Filter: ((x.ranking = 1) AND ((x.id)::text =
> >> 'OK452217781481'::text))
> >> -> WindowAgg (cost=48519.10..57190.58 rows=433574 width=26)
> >> (actual time=4293.315..4491.652 rows=429803 loops=1)
> >> -> Sort (cost=48519.10..49603.03 rows=433574
> >> width=26) (actual time=4293.306..4352.544 rows=429803 loops=1)
> >> Sort Key: pref_money.yw, pref_money.money
> >> Sort Method: external sort Disk: 15856kB
> >
> >
> > It's sorting on disk. That's not going to be fast. Indeed, it's taking
> > nearly all the time the query takes (4.4s for this step out of 4.5s for the
> > query).
>
> I've noticed that too, but what
> does "sorting on disk" mean?
>
> I have a lot of RAM (32 GB);
> should I increase work_mem even more?
> (It is currently 32 MB.)
>

You would do better to create an index on pref_money(yw, money). It could
help you get rid of the seqscan and sort operations.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com


From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To:
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-25 15:13:38
Message-ID: CAADeyWjvQCyftMvmHyzMMMRCaRKPGWWY7bRYUDQ3qmGhs7s70A@mail.gmail.com

Hi -

On Thu, Jan 24, 2013 at 8:12 PM, Guillaume Lelarge
<guillaume(at)lelarge(dot)info> wrote:
> You would do better to create an index on pref_money(yw, money). It could
> help you get rid of the seqscan and sort operations.

I've created an index with

# create index pref_money_money_index on pref_money(money desc);

and posted the new EXPLAIN output here:

http://stackoverflow.com/questions/14498974/optimizing-select-count-query-which-often-takes-over-10-seconds

But it doesn't seem to change much, or
am I too inexperienced to see the change?

Thanks
Alex


From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-25 16:00:06
Message-ID: 1359129606.24620.9.camel@localhost.localdomain

On Fri, 2013-01-25 at 16:13 +0100, Alexander Farber wrote:
> Hi -
>
> On Thu, Jan 24, 2013 at 8:12 PM, Guillaume Lelarge
> <guillaume(at)lelarge(dot)info> wrote:
> > You would do better to create an index on pref_money(yw, money). It could
> > help you get rid of the seqscan and sort operations.
>
> I've created an index with
>
> # create index pref_money_money_index on pref_money(money desc);
>
> and posted the new EXPLAIN output here:
>
> http://stackoverflow.com/questions/14498974/optimizing-select-count-query-which-often-takes-over-10-seconds
>
> But it doesn't seem to change much, or
> am I too inexperienced to see the change?
>

There's no change because you created an index on money alone, and that
change sure didn't give PostgreSQL a chance to do anything better. What
I told you before was to create an index on yw and money, like this:

create index pref_money_yw_money_idx on pref_money(yw, money);

This should help you change the plan and, I hope, get better
performance.
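A sketch of the switch-over, for reference (whether the planner actually
uses the new index depends on the data; if the Sort node survives, an
index matching the window's ordering exactly, i.e. on (yw, money desc),
may be needed):

```sql
-- Drop the single-column index, which this query cannot use
DROP INDEX pref_money_money_index;

-- Create the composite index suggested above
create index pref_money_yw_money_idx on pref_money(yw, money);

-- Re-check: ideally the Seq Scan + external Sort is replaced by an
-- Index Scan on pref_money_yw_money_idx feeding the WindowAgg
EXPLAIN ANALYZE
select count(id) from (
    select id,
           row_number() over(partition by yw order by money desc) as ranking
    from pref_money
) x
where x.ranking = 1 and id='OK452217781481';
```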

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-25 18:42:38
Message-ID: CAMkU=1y8_Fkepww6tPVs4gs1EtcCUoWuA14rxkbCz2O4Q1DWTA@mail.gmail.com

On Thu, Jan 24, 2013 at 1:57 AM, Alexander Farber
<alexander(dot)farber(at)gmail(dot)com> wrote:
> Hello,
>
> for a PostgreSQL 8.4.13 database + pgbouncer

Using 8.4 is really going to limit your options.

...

>
> LOG: duration: 12590.394 ms statement:
> select count(id) from (
> select id,
> row_number() over(partition by yw order by money
> desc) as ranking
> from pref_money
> ) x
> where x.ranking = 1 and id='OK471018960997'

Since you only care about ranking=1, it might be better to rewrite
that using something like:

where money = (select max(money....)

But, I doubt it. I don't think even the 9.2 planner has the smarts to
do what you want efficiently. It might be possible to make it do it
efficiently using a recursive query, once you have the index on
(yw,money).
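Spelled out, the max(money) rewrite might look roughly like this (a
hypothetical sketch; note that on a tied week it counts every tied
player as a winner, whereas row_number() = 1 picks exactly one row
per week):

```sql
-- Count the weeks in which this player held the top score.
-- With an index on (yw, money), the correlated max() lookup is cheap.
select count(*)
from pref_money m
where m.id = 'OK471018960997'
  and m.money = (select max(m2.money)
                 from pref_money m2
                 where m2.yw = m.yw);
```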

> This command comes from a PHP-script
> of mine which displays "medals" on
> a player profile page - meaning how many
> times she won a weekly tournament:
...
>
> Does anybody have an idea
> how I could optimize it? Or should
> I introduce an hourly job and a "medals"
> column (which would make my player
> stats less "live")?

This sounds like a good idea. But if the tournament is weekly why
would the job have to be hourly? Why do the results of a weekly
tournament need to be 'live'?

Cheers,

Jeff


From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To:
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-27 17:25:02
Message-ID: CAADeyWitdfbor4sF7wgRzc_VjQGvtGBrwx_PP09i9-o4tLocPw@mail.gmail.com

Hello -

On Fri, Jan 25, 2013 at 7:42 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, Jan 24, 2013 at 1:57 AM, Alexander Farber
> <alexander(dot)farber(at)gmail(dot)com> wrote:
>>
>> LOG: duration: 12590.394 ms statement:
>> select count(id) from (
>> select id,
>> row_number() over(partition by yw order by money
>> desc) as ranking
>> from pref_money
>> ) x
>> where x.ranking = 1 and id='OK471018960997'
>
> This sounds like a good idea. But if the tournament is weekly why
> would the job have to be hourly? Why do the results of a weekly
> tournament need to be 'live'?

because the medals are displayed
for the current week too.

And when a player enters a top spot,
he should get +1 medal, and
the one he pushed off the top -1 medal.

So even hourly isn't really good enough for me...
It should be "live" stats.

Regards
Alex


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-27 19:41:30
Message-ID: CAMkU=1zy05dR7agh+orN9BqL6Otj+iziD4L0AvNruWSEN9PsJg@mail.gmail.com

On Sun, Jan 27, 2013 at 9:25 AM, Alexander Farber
<alexander(dot)farber(at)gmail(dot)com> wrote:
> Hello -
>
> On Fri, Jan 25, 2013 at 7:42 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

>> This sounds like a good idea. But if the tournament is weekly why
>> would the job have to be hourly? Why do the results of a weekly
>> tournament need to be 'live'?
>
> because the medals are displayed
> for the current week too.
>
> And when a player enters a top spot,
> he should get +1 medal, and
> the one he pushed off the top -1 medal.
>
> So even hourly isn't really good enough for me...
> It should be "live" stats.

Once the week is over, materialize the medals for that week. Then the
live part of the query only needs to specify the currently live week,
not the entire history. And in that case, the query should be quite
efficient if you have an index on both week and money.
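A sketch of that split, using a hypothetical pref_past_medals table
that a weekly job would refresh (e.g. truncate and repopulate) once
each tournament closes:

```sql
-- Once a week, freeze the finished weeks (hypothetical table name)
CREATE TABLE pref_past_medals AS
select id, count(*) as medals
from (
    select id,
           row_number() over(partition by yw order by money desc) as ranking
    from pref_money
    where yw <> to_char(CURRENT_TIMESTAMP, 'IYYY-IW')
) x
where ranking = 1
group by id;

-- Live query: stored history plus only the current week's standings
select (select coalesce(sum(medals), 0)
        from pref_past_medals
        where id = 'OK471018960997')
     + (select count(*)
        from (
            select id,
                   row_number() over(partition by yw order by money desc) as ranking
            from pref_money
            where yw = to_char(CURRENT_TIMESTAMP, 'IYYY-IW')
        ) c
        where c.ranking = 1 and c.id = 'OK471018960997') as medals;
```

The window function then only ever scans one week's rows, which an
index on (yw, money) keeps small.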

Cheers,

Jeff


From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To:
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-28 07:57:39
Message-ID: CAADeyWhbf8YLu2CWE4fagE08y54wkcF2nGqYqw71yMin9d+bhw@mail.gmail.com

Thanks -

On Sun, Jan 27, 2013 at 8:41 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> Once the week is over, materialize the medals for that week. Then the
> live part of the query only needs to specify the currently live week,
> not the entire history. And in that case, the query should be quite
> efficient if you have in index on both week and money.

I will try that!


From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To:
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-29 13:03:39
Message-ID: CAADeyWiLz_QsOYyF2RgZPoyB6nFLreopvn57ZyBW24AxcExbqg@mail.gmail.com

Thank you, I've ended up with this cronjob
(yes, I'm unfortunately storing week numbers as strings):

/* reset and then update medals count */

update pref_users set medals = 0;
UPDATE 223456

update pref_users u
set medals = s.medals
from (
select id, count(id) medals
from (
select id,
row_number() over(partition by yw order by money desc) as ranking
from pref_money where yw <> to_char(CURRENT_TIMESTAMP, 'IYYY-IW')
) x
where ranking = 1
group by id
) s
where u.id = s.id;
UPDATE 65


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-29 23:09:26
Message-ID: 1359500966.5795.YahooMailNeo@web162905.mail.bf1.yahoo.com

Alexander Farber <alexander(dot)farber(at)gmail(dot)com> wrote:

> update pref_users set medals = 0;
> UPDATE 223456

You're probably going to like your performance a lot better if you
modify that to:

update pref_users set medals = 0 where medals <> 0;

-Kevin


From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To:
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-30 07:41:40
Message-ID: CAADeyWgROhMSNFDOhSZa+DC4uqLNfA2xL+0XH2D=LzyaC=1r1A@mail.gmail.com

Hello Kevin,

On Wed, Jan 30, 2013 at 12:09 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Alexander Farber <alexander(dot)farber(at)gmail(dot)com> wrote:
>
>> update pref_users set medals = 0;
>> UPDATE 223456
>
> You're probably going to like your performance a lot better if you
> modify that to:
>
> update pref_users set medals = 0 where medals <> 0;

Is it really so?

I only have 65 users (out of 223456) with medals != 0.

When programming other languages, I never do
if (x != 0) { x = 0; } but just set x = 0 straight away.

Regards
Alex


From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To:
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-30 08:00:24
Message-ID: CAADeyWiuF1NTbJHgZHHQmRkJ6nPUYWYYaVqFSbUgm83ykfASkQ@mail.gmail.com

The cronjob now occasionally gives me:

/* reset and then update medals count */
update pref_users set medals = 0;
psql:/home/afarber/bin/clean-database.sql:63: ERROR: deadlock detected

DETAIL: Process 31072 waits for ShareLock on transaction 124735679;
blocked by process 30368.
Process 30368 waits for ShareLock on transaction 124735675; blocked by
process 31072.
HINT: See server log for query details.
update pref_users u
set medals = s.medals
from (
select id, count(id) medals
from (
select id,
row_number() over(partition by yw order by money desc) as ranking
from pref_money where yw <> to_char(CURRENT_TIMESTAMP, 'IYYY-IW')
) x
where ranking = 1
group by id
) s
where u.id = s.id;

Any ideas please how to workaround?

Thank you
Alex


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-30 12:58:41
Message-ID: 1359550721.6253.YahooMailNeo@web162901.mail.bf1.yahoo.com

Alexander Farber <alexander(dot)farber(at)gmail(dot)com> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> Alexander Farber <alexander(dot)farber(at)gmail(dot)com> wrote:
>>
>>> update pref_users set medals = 0;
>>> UPDATE 223456
>>
>> You're probably going to like your performance a lot better if
>> you modify that to:
>>
>> update pref_users set medals = 0 where medals <> 0;
>
> is it really so?

Yes.

> I only have 65 users (out of 223456) with medals != 0.

That's precisely the point.  You don't want to update all 223456
rows when there are only 65 which need to be changed.

> When programming other languages, I never do
> if (x != 0) { x = 0; } but just set x = 0 straight away.

Well, if updating a row were as cheap as assigning zero to x, I
wouldn't suggest a change to your code.  If assigning something to
x involved an expensive function or disk access, you might try to
put an "if" around it.

If you don't want to burden your query with the condition, you
could consider attaching a trigger to any table where you might
assign rows values they already have.  See the
suppress_redundant_updates_trigger() function for details:

http://www.postgresql.org/docs/current/interactive/functions-trigger.html
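For example, attached to pref_users it would look like this (the
trigger name follows the documentation's example; it sorts last
alphabetically, so it runs after any other BEFORE triggers):

```sql
-- Silently skip UPDATEs that would not change the row at all
CREATE TRIGGER z_min_update
BEFORE UPDATE ON pref_users
FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
```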

-Kevin


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-30 13:06:57
Message-ID: 1359551217.83456.YahooMailNeo@web162904.mail.bf1.yahoo.com

Alexander Farber <alexander(dot)farber(at)gmail(dot)com> wrote:

> The cronjob gives me now occasionally:
>
> /* reset and then update medals count */
> update pref_users set medals = 0;
> psql:/home/afarber/bin/clean-database.sql:63: ERROR:  deadlock detected
> DETAIL:  Process 31072 waits for ShareLock on transaction 124735679; blocked by process 30368.
> Process 30368 waits for ShareLock on transaction 124735675; blocked by process 31072.
> HINT:  See server log for query details.

> Any ideas please how to workaround?

Yeah, try this:

update pref_users set medals = 0 where medals <> 0;

:-)

That should significantly reduce the frequency of deadlocks;
however, IMO any application using a relational database should be
prepared to retry database transactions which fail with a
serialization error, and a deadlock is one form of that.  The
standard SQLSTATE to look for is '40001' and in PostgreSQL you
should also check for '40P01'.

-Kevin


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-30 15:18:17
Message-ID: CAMkU=1wFRANBCttRtgxnqx2VgcANEzKZ+JzmcqY85ii6X-Uqiw@mail.gmail.com

On Tue, Jan 29, 2013 at 11:41 PM, Alexander Farber
<alexander(dot)farber(at)gmail(dot)com> wrote:
>
> When programming other languages, I never do
> if (x != 0) { x = 0; } but just set x = 0 straight away.

Most other languages are not transactional and durable. Databases are
different.

Cheers,

Jeff


From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To:
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing select count query which often takes over 10 seconds
Date: 2013-01-31 09:01:14
Message-ID: CAADeyWjtd8P6USe4kY-QFuWwggyw=pceQxeS5Ftguq-L7cuwPw@mail.gmail.com

On Wed, Jan 30, 2013 at 2:06 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> update pref_users set medals = 0 where medals <> 0;

Thank you all for your insightful comments.

This has cured my cronjob.