Re: count(*) using index scan in "query often, update rarely" environment

Lists: pgsql-performance
From: "Cestmir Hybl" <cestmirl(at)freeside(dot)sk>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-07 09:24:05
Message-ID: 935801c5cb20$dcf93a00$131fc39e@stratos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hello all

First of all, I do understand why pgsql with it's MVCC design has to examine tuples to evaluate "count(*)" and "count(*) where (...)" queries in environment with heavy concurrent updates.

This kind of usage IMHO isn't the average one. There are many circumstances with rather "query often, update rarely" character.

Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query?

(sorry for disturbing if this was already discussed)

Regards,

Cestmir Hybl


From: hubert depesz lubaczewski <depesz(at)gmail(dot)com>
To: Cestmir Hybl <cestmirl(at)freeside(dot)sk>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-07 09:54:23
Message-ID: 9e4684ce0510070254v67e08bcfhb725715803a5de25@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/7/05, Cestmir Hybl <cestmirl(at)freeside(dot)sk> wrote:
>
> Isn't it possible (and reasonable) for these environments to keep track of
> whether there is a transaction in progress with update to given table and if
> not, use an index scan (count(*) where) or cached value (count(*)) to
> perform this kind of query?
>

if i understand your problem correctly, then simple usage of triggers will
do the job just fine.

hubert


From: "Cestmir Hybl" <cestmirl(at)freeside(dot)sk>
To: "hubert depesz lubaczewski" <depesz(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-07 10:14:29
Message-ID: 9d5b01c5cb27$e7a1c970$131fc39e@stratos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Yes, I can possibly use triggers to maintanin counts of several fixed groups of records or total recordcount (but it's unpractical).

No, I can't speed-up evaluation of generic "count(*) where ()" queries this way.

My question was rather about general performance of count() queries in environment with infrequent updates.

Cestmir
----- Original Message -----
From: hubert depesz lubaczewski
To: Cestmir Hybl
Cc: pgsql-performance(at)postgresql(dot)org
Sent: Friday, October 07, 2005 11:54 AM
Subject: Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

On 10/7/05, Cestmir Hybl <cestmirl(at)freeside(dot)sk> wrote:
Isn't it possible (and reasonable) for these environments to keep track of whether there is a transaction in progress with update to given table and if not, use an index scan (count(*) where) or cached value (count(*)) to perform this kind of query?

if i understand your problem correctly, then simple usage of triggers will do the job just fine.

hubert


From: "Steinar H(dot) Gunderson" <sgunderson(at)bigfoot(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-07 10:48:16
Message-ID: 20051007104816.GA8371@uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> Isn't it possible (and reasonable) for these environments to keep track of
> whether there is a transaction in progress with update to given table and
> if not, use an index scan (count(*) where) or cached value (count(*)) to
> perform this kind of query?

Even if there is no running update, there might still be dead rows in the
table. In any case, of course, a new update could always be occurring while
your counting query was still running.

/* Steinar */
--
Homepage: http://www.sesse.net/


From: "Cestmir Hybl" <cestmirl(at)freeside(dot)sk>
To: "Steinar H(dot) Gunderson" <sgunderson(at)bigfoot(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-07 11:14:20
Message-ID: 9fbe01c5cb30$4407d580$131fc39e@stratos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

collision: it's possible to either block updating transaction until index
scan ends or discard index scan imediately and finish query using MVCC
compliant scan

dead rows: this sounds like more serious counter-argument, I don't know much
about dead records management and whether it would be possible/worth to
make indexes matching live records when there's no transaction in progress
on that table

----- Original Message -----
From: "Steinar H. Gunderson" <sgunderson(at)bigfoot(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Sent: Friday, October 07, 2005 12:48 PM
Subject: Re: [PERFORM] count(*) using index scan in "query often, update
rarely" environment

> On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
>> Isn't it possible (and reasonable) for these environments to keep track
>> of
>> whether there is a transaction in progress with update to given table and
>> if not, use an index scan (count(*) where) or cached value (count(*)) to
>> perform this kind of query?
>
> Even if there is no running update, there might still be dead rows in the
> table. In any case, of course, a new update could always be occurring
> while
> your counting query was still running.
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Cestmir Hybl <cestmirl(at)freeside(dot)sk>
Cc: "Steinar H(dot) Gunderson" <sgunderson(at)bigfoot(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-07 13:07:15
Message-ID: 20051007130714.GB5175@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, Oct 07, 2005 at 01:14:20PM +0200, Cestmir Hybl wrote:
> collision: it's possible to either block updating transaction until
> index scan ends or discard index scan imediately and finish query using
> MVCC compliant scan

You can't change from one scan method to a different one on the fly.
There's no way to know which tuples have alreaady been returned.

Our index access methods are designed to be very concurrent, and it
works extremely well. One index scan being able to block an update
would destroy that advantage.

> dead rows: this sounds like more serious counter-argument, I don't know
> much about dead records management and whether it would be
> possible/worth to make indexes matching live records when there's no
> transaction in progress on that table

It's not possible, because a finishing transaction would have to clean
up every index it has used, and also any index it hasn't used but has
been modified by another transaction which couldn't clean up by itself
but didn't do the work because the first one was looking at the index.
It's easy to see that it's possible to create an unbounded number of
transactions, each forcing the other to do some index cleanup. This is
not acceptable.

Plus, it would be very hard to implement, and a very wide door to bugs.

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Et put se mouve" (Galileo Galilei)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Cestmir Hybl" <cestmirl(at)freeside(dot)sk>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-07 13:42:32
Message-ID: 3254.1128692552@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Cestmir Hybl" <cestmirl(at)freeside(dot)sk> writes:
> Isn't it possible (and reasonable) for these environments to keep track =
> of whether there is a transaction in progress with update to given table =
> and if not, use an index scan (count(*) where) or cached value =
> (count(*)) to perform this kind of query?

Please read the archives before bringing up such well-discussed issues.

There's a workable-looking design in the archives (pghackers probably)
for maintaining overall table counts in a separate table, with each
transaction adding one row of "delta" information just before it
commits. I haven't seen anything else that looks remotely attractive.

regards, tom lane


From: Richard Huxton <dev(at)archonet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) using index scan in "query often, update rarely"
Date: 2005-10-07 14:11:01
Message-ID: 434681F5.6000906@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:
>
> There's a workable-looking design in the archives (pghackers probably)
> for maintaining overall table counts in a separate table, with each
> transaction adding one row of "delta" information just before it
> commits. I haven't seen anything else that looks remotely attractive.

It might be useful if there was a way to trap certain queries and
rewrite/replace them. That way more complex queries could be
transparently redirected to a summary table etc. I'm guessing that the
overhead to check every query would quickly destroy any gains though.

--
Richard Huxton
Archonet Ltd


From: hubert depesz lubaczewski <depesz(at)gmail(dot)com>
To: Cestmir Hybl <cestmirl(at)freeside(dot)sk>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-08 10:44:09
Message-ID: 9e4684ce0510080344v1dfa36f7nea19ddb1ad38f652@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/7/05, Cestmir Hybl <cestmirl(at)freeside(dot)sk> wrote:
>
> No, I can't speed-up evaluation of generic "count(*) where ()" queries
> this way.
>

no you can't speed up generic where(), *but* you can check what are the most
common "where"'s (like usually i do where on one column like:
select count(*) from table where some_particular_column = 'some value';
where you can simply make the trigger aware of the fact that it should count
based on value in some_particular_column.
works good enough for me not to look for alternatives.

depesz


From: mark(at)mark(dot)mielke(dot)cc
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) using index scan in "query often, update rarely" environment
Date: 2005-10-08 13:34:32
Message-ID: 20051008133432.GC23913@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, Oct 07, 2005 at 12:48:16PM +0200, Steinar H. Gunderson wrote:
> On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> > Isn't it possible (and reasonable) for these environments to keep track of
> > whether there is a transaction in progress with update to given table and
> > if not, use an index scan (count(*) where) or cached value (count(*)) to
> > perform this kind of query?
> Even if there is no running update, there might still be dead rows in the
> table. In any case, of course, a new update could always be occurring while
> your counting query was still running.

I don't see this being different from count(*) as it is today.

Updating a count column is certainly clever. If using a trigger,
perhaps it would allow the equivalent of:

select count(*) from table for update;

:-)

Cheers,
mark

(not that this is necessarily a good thing!)

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/