Re: count(*) performance improvement ideas

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: count(*) performance improvement ideas
Date: 2008-03-12 15:04:53
Message-ID: 2e78013d0803120804i4b945f02l7e944d0a079c32c2@mail.gmail.com
Lists: pgsql-hackers

I am reading the discussion about improving count(*) performance. I have
also seen a TODO for this.

Many people have suggested a TRIGGER-based solution to the slow count(*)
problem. I looked at the following link which presents the solution
neatly.

http://www.varlena.com/GeneralBits/120.php
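
For reference, the approach there is roughly the following (my own sketch,
not the exact code from that page; the table, function and trigger names are
made up):

CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    total_rows bigint NOT NULL
);

-- seed it once per tracked table:
INSERT INTO row_counts VALUES ('my_table', (SELECT count(*) FROM my_table));

CREATE OR REPLACE FUNCTION count_rows() RETURNS TRIGGER AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET total_rows = total_rows + 1
         WHERE table_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE row_counts SET total_rows = total_rows - 1
         WHERE table_name = TG_TABLE_NAME;
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE 'plpgsql';

CREATE TRIGGER my_table_count AFTER INSERT OR DELETE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE count_rows();

Note that every writer funnels into the same single counter row.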

But how does that really work for SERIALIZABLE transactions? If
two concurrent transactions INSERT/DELETE rows from a table,
the trigger execution of one of the transactions is bound to fail
because of concurrent access. Even for READ COMMITTED transactions,
the trigger execution would wait if the other transaction has executed
the trigger on the same table. Well, I think the READ COMMITTED case
can be handled with DEFERRED triggers, but that may require queuing up
too many triggers if there are many inserts/deletes in a transaction.

Running a trigger for every insert/delete seems too expensive. I wonder
if we can have a separate "counter" table (as suggested in the TRIGGER
based solution) and track total number of tuples inserted and deleted
in a transaction (and all the committed subtransactions). We then
execute a single UPDATE at the end of the transaction. With HOT,
updating the "counter" table should not be a big pain since all these
updates can potentially be HOT updates. Also since the update of
the "counter" table happens at the commit time, other transactions
inserting/deleting from the same user table may need to wait for a
very small period on the "counter" table tuple.
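
To be concrete: with a counter table like the one sketched above, the only
statement a transaction would issue against it is a single commit-time update
along the lines of (sketch only; the +42 stands for the transaction's net
inserts minus deletes, and accumulating that number is exactly the part that
needs backend support or deferred triggers):

UPDATE row_counts SET total_rows = total_rows + 42
 WHERE table_name = 'my_table';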

This still doesn't solve the serializable transaction problem
though. But I am sure we can figure out some solution for that case
as well if we agree on the general approach.

I am sure this must have been discussed before. So what are the
objections?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 15:31:01
Message-ID: 19417.1205335861@sss.pgh.pa.us
Lists: pgsql-hackers

"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> I am sure this must have been discussed before.

Indeed. Apparently you didn't find the threads in which the idea of
having transactions enter "delta" entries was discussed? Solves both
the locking and the MVCC problems, at the cost that you need to make
cleanup passes over the counter table fairly often.
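
In outline (a sketch from memory, not code from those threads; names
invented):

CREATE TABLE my_table_count (
    is_base boolean NOT NULL DEFAULT false,   -- exactly one aggregated base row
    delta   bigint  NOT NULL
);
-- seed the base row with the current total:
INSERT INTO my_table_count (is_base, delta)
VALUES (true, (SELECT count(*) FROM my_table));

-- each writing transaction just inserts its own net change, e.g.
INSERT INTO my_table_count (delta) VALUES (42);

-- readers sum the base row plus whatever delta rows their snapshot can see
SELECT sum(delta) FROM my_table_count;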

I don't see this as material for the core DB but it would make a great
contrib module.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 15:34:13
Message-ID: 200803121534.m2CFYDR29157@momjian.us
Lists: pgsql-hackers

Tom Lane wrote:
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > I am sure this must have been discussed before.
>
> Indeed. Apparently you didn't find the threads in which the idea of
> having transactions enter "delta" entries was discussed? Solves both
> the locking and the MVCC problems, at the cost that you need to make
> cleanup passes over the counter table fairly often.
>
> I don't see this as material for the core DB but it would make a great
> contrib module.

The TODO list has lots of details on this.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 15:41:28
Message-ID: 2e78013d0803120841x500f0ddco81ae71bda293e9b7@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 12, 2008 at 9:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > I am sure this must have been discussed before.
>
> Indeed. Apparently you didn't find the threads in which the idea of
> having transactions enter "delta" entries was discussed? Solves both
> the locking and the MVCC problems, at the cost that you need to make
> cleanup passes over the counter table fairly often.
>

OK, I will surely look at those threads. Hopefully HOT will go a long way
toward solving the counter-table cleanup issue.

> I don't see this as material for the core DB but it would make a great
> contrib module.
>

I guess we will need some backend hooks to make it really work, no?
At the minimum we need to track the "deltas" at the transaction level
and have the ability to do some extra processing at commit time.
Maybe I should read those threads first; I will probably find the answers there.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 15:44:10
Message-ID: 47D7FA4A.2060109@mark.mielke.cc
Lists: pgsql-hackers

Pavan Deolasee wrote:
> I am reading the discussion about improving count(*) performance. I have
> also seen a TODO for this.
>
> Many people have suggested a TRIGGER-based solution to the slow count(*)
> problem. I looked at the following link which presents the solution
> neatly.
>
> http://www.varlena.com/GeneralBits/120.php
>
> But how does that really work for SERIALIZABLE transactions? If
> two concurrent transactions INSERT/DELETE rows from a table,
> the trigger execution of one of the transactions is bound to fail
> because of concurrent access. Even for READ COMMITTED transactions,
> the trigger execution would wait if the other transaction has executed
> the trigger on the same table. Well, I think the READ COMMITTED case
> can be handled with DEFERRED triggers, but that may require queuing up
> too many triggers if there are many inserts/deletes in a transaction.
>
> Running a trigger for every insert/delete seems too expensive. I wonder
> if we can have a separate "counter" table (as suggested in the TRIGGER
> based solution) and track total number of tuples inserted and deleted
> in a transaction (and all the committed subtransactions). We then
> execute a single UPDATE at the end of the transaction. With HOT,
> updating the "counter" table should not be a big pain since all these
> updates can potentially be HOT updates. Also since the update of
> the "counter" table happens at the commit time, other transactions
> inserting/deleting from the same user table may need to wait for a
> very small period on the "counter" table tuple.
>
> This still doesn't solve the serializable transaction problem
> though. But I am sure we can figure out some solution for that case
> as well if we agree on the general approach.
>
> I am sure this must have been discussed before. So what are the
> objections

If you are talking about automatically doing this for every table - I
have an objection that the performance impact seems unwarranted against
the gain. We are still talking about every insert or update updating
some counter table, with the only mitigating factor being that the
trigger would be coded deeper into PostgreSQL theoretically making it
cheaper?

You can already today create a trigger on insert that will append to a
summary table of some sort, whose only purpose is to maintain counts. At
the simplest, it is as somebody else suggested where you might have the
other table only store the primary keys with foreign key references back
to the main table for handling deletes and updates. Storing transaction
numbers and such might allow the data to be reduced in terms of size
(and therefore elapsed time to scan), but it seems complex.

If this really is a problem that must be solved, I prefer the
suggestion from the past of keeping track of live rows per block for a
certain transaction range: any observer that falls within this range can take
that block's exact count quickly, and the exceptional blocks
(the ones being changed) can be scanned to be sure. But it's still
pretty complicated to implement correctly and maintain, for what is probably
limited gain. I don't personally buy into the need to do an exact count(*)
on a whole table quickly. I know people ask for it - but I find these
same people either confused, or trying to use this functionality to
accomplish some other end, under the assumption that because they can
get counts faster from other databases, therefore PostgreSQL should do
it as well. I sometimes wonder whether these people would even notice if
PostgreSQL translated count(*) on the whole table to query reltuples. :-)
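
For anyone who really only needs a ballpark figure, the planner's estimate is
already a one-liner away (it's only as fresh as the last VACUUM/ANALYZE):

SELECT reltuples FROM pg_class WHERE oid = 'my_table'::regclass;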

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 15:52:45
Message-ID: 2e78013d0803120852h11a1022fw952900d925405294@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 12, 2008 at 9:14 PM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
>
> If you are talking about automatically doing this for every table - I
> have an objection that the performance impact seems unwarranted against
> the gain. We are still talking about every insert or update updating
> some counter table, with the only mitigating factor being that the
> trigger would be coded deeper into PostgreSQL theoretically making it
> cheaper?
>

No, I am not suggesting that. If you read the proposal carefully, it's one UPDATE
per transaction. With HOT, I am hoping that the counter table may be
completely cached in memory and won't bloat much.

Also, we can always have a GUC (like pgstats) to control the overhead.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 16:23:12
Message-ID: 47D80370.7030503@mark.mielke.cc
Lists: pgsql-hackers

Pavan Deolasee wrote:
> On Wed, Mar 12, 2008 at 9:14 PM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
>
>> If you are talking about automatically doing this for every table - I
>> have an objection that the performance impact seems unwarranted against
>> the gain. We are still talking about every insert or update updating
>> some counter table, with the only mitigating factor being that the
>> trigger would be coded deeper into PostgreSQL theoretically making it
>> cheaper?
>>
>
> No, I am not suggesting that. If you read the proposal carefully, it's one UPDATE
> per transaction. With HOT, I am hoping that the counter table may be
> completely cached in memory and won't bloat much.
>
> Also, we can always have a GUC (like pgstats) to control the overhead.
>

Fine - once per transaction instead of once per insert. Still, if there
is overhead to this (updating a secondary summary table), does it really
make sense to have it for every table? Most of my tables do not require
count(*) on the whole table (actually - none of them do). For the same
reason as I don't want oid, I don't think I would want "fast count"
capabilities to impact my regular queries. Again, I don't think count(*)
on the whole table is a particularly useful case. count(*) on particular
subsets of the data may be, but of the whole table?

If you can make a secondary summary table to be used for count(*)
optional, great. If using HOT makes the secondary table more efficient,
great.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 16:42:44
Message-ID: 20576.1205340164@sss.pgh.pa.us
Lists: pgsql-hackers

Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
> Fine - once per transaction instead of once per insert. Still, if there
> is overhead to this (updating a secondary summary table), does it really
> make sense to have it for every table?

We certainly wouldn't accept a patch that imposed this overhead on every
table.

One of the advantages of framing it as an explicit set of triggers is
that then you have a natural way of indicating which table(s) you want
the feature for (and are willing to pay the overhead to get it).

regards, tom lane


From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 16:59:48
Message-ID: 2e78013d0803120959r515667ecl4edb3bd3afbfbdfe@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 12, 2008 at 9:53 PM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
>
>
>
> Fine - once per transaction instead of once per insert. Still, if there is
> overhead to this (updating a secondary summary table), does it really make
> sense to have it for every table? Most of my tables do not require count(*)
> on the whole table (actually - none of them do). For the same reason as I
> don't want oid, I don't think I would want "fast count" capabilities to
> impact my regular queries. Again, I don't think count(*) on the whole table
> is a particularly useful case. count(*) on particular subsets of the data
> may be, but of the whole table?
>

ISTM that you are complaining because we never had a *fast* count(*)
and adding that now comes at a cost. Had it been there from day one with
the same overhead as we are talking about now, nobody would have
complained :-)

Anyways, your point is taken and it would be great if we can make it configurable,
if not table level then at least globally.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 17:10:50
Message-ID: 47D80E9A.8020401@mark.mielke.cc
Lists: pgsql-hackers

Pavan Deolasee wrote:
> On Wed, Mar 12, 2008 at 9:53 PM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
>
>>
>> Fine - once per transaction instead of once per insert. Still, if there is
>> overhead to this (updating a secondary summary table), does it really make
>> sense to have it for every table? Most of my tables do not require count(*)
>> on the whole table (actually - none of them do). For the same reason as I
>> don't want oid, I don't think I would want "fast count" capabilities to
>> impact my regular queries. Again, I don't think count(*) on the whole table
>> is a particularly useful case. count(*) on particular subsets of the data
>> may be, but of the whole table?
>>
>>
>
> ISTM that you are complaining because we never had a *fast* count(*)
> and adding that now comes at a cost. Had it been there from day one with
> the same overhead as we are talking about now, nobody would have
> complained :-)
>

Quite possibly - but, similarly, if PostgreSQL was generally slower due
to such things, I might not have chosen PostgreSQL as my favoured db.
:-) I think there is a reason PostgreSQL doesn't come with a fast
count(*), and it's that the developers leading up to this point shared
the opinion that this feature was not critical.

I somewhat disagree about my personal reaction, though. I complain about
many things, even configurable things, such as LISTEN/NOTIFY. I
occasionally wish I had such a function, but given its implementation,
I would choose to use a non-PostgreSQL mechanism instead, if one were
available. Luckily, it's also a case of "only suffer the performance cost if you
CHOOSE to use it." That seems like a historically valuable trend.

> Anyways, your point is taken and it would be great if we can make it configurable,
> if not table level then at least globally

If table level, I might even try it out. If global level, I will not be
trying it out. I'm only one person - but perhaps this will help point
you in a direction favourable to many?

If you are REALLY feeling clever, the queries where I would find it to
benefit the MOST would include WHERE conditions. Just like indexes
with WHERE conditions. If you get to this point, I think you are
reaching something that will have far more universal benefit to existing
applications. CREATE COUNT INDEX blah ON blah WHERE ... I would use this
in many places.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 17:29:51
Message-ID: 8763vr7svk.fsf@oxford.xeocode.com
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
>> Fine - once per transaction instead of once per insert. Still, if there
>> is overhead to this (updating a secondary summary table), does it really
>> make sense to have it for every table?
>
> We certainly wouldn't accept a patch that imposed this overhead on every
> table.

If you look at this from the right angle, it's actually a degenerate case of
materialized views. I think it would be more useful to approach it from
that direction even if it only supported a very limited set of expressions.

In an ideal world I would love to be able to do something like:

CREATE MATERIALIZED VIEW foo AS (select count(*) from bar) WITH INCREMENTAL UPDATES;

and have that automatically create both a heap to store the count and another
to store the incremental changes. To do this would require some magic to know
what "incremental changes" means for each aggregate where it's meaningful
though.

Then it would require some magic in the optimizer to recognize when a piece of
the query can be satisfied by a materialized view.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 18:22:11
Message-ID: 47D81F53.1000001@mark.mielke.cc
Lists: pgsql-hackers

Gregory Stark wrote:
> In an ideal world I would love to be able to do something like:
>
> CREATE MATERIALIZED VIEW foo AS (select count(*) from bar) WITH INCREMENTAL UPDATES;
>
> and have that automatically create both a heap to store the count and another
> to store the incremental changes. To do this would require some magic to know
> what "incremental changes" means for each aggregate where it's meaningful
> though.
>
> Then it would require some magic in the optimizer to recognize when a piece of
> the query can be satisfied by a materialized view.
>

Hehe... +5 if you think it can be done in the next 12 - 24 months. :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-18 03:11:01
Message-ID: F0238EBA67824444BC1CB4700960CB4804EAC3A8@dmpeints002.isotach.com
Lists: pgsql-hackers

Tom Lane wrote
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > I am sure this must have been discussed before.
>
> Indeed. Apparently you didn't find the threads in which the idea of
> having transactions enter "delta" entries was discussed? Solves both
> the locking and the MVCC problems, at the cost that you need to make
> cleanup passes over the counter table fairly often.

I can't find any posts that directly address what I was looking for.

In my situation I have a small number of concurrent transactions with each transaction running a large number of single row inserts or deletes.

However I'm not after a fast count(*) from table, but more like a fast
select grouping_id, count(*) from my_table group by grouping_id

I initially thought that what you meant by "having transactions enter "delta" entries" was that I have a trigger that would create a row each time it was called, writing how many records were inserted or deleted. I didn't understand how this would be much of an improvement, as each of my rows would contain either +1 or -1.

But I just realised you might mean to include a txid column in my table of deltas, and in my trigger insert or update the row where txid = txid_current()
(per grouping_id)

Is that what is recommended?

No locking problems as each transaction is only updating its own rows.

Can you clarify the lack of MVCC problems?
Do new versions of rows get created if the original version of the row was created in the current transaction?
Does this idea apply with the same efficiency in pre 8.3, non-HOT implementations?
Any advice on creating or avoiding indexes on the tables in question?

I can think of two options for performing the cleanup passes using current functionality:
1) Using Listen/Notify (a rough sketch of the wiring follows below)
Issue a notify whenever a new row is inserted for a transaction. They get delivered post transaction commit, don't they? And not at all on rollback? Have an application listening for them, performing the aggregation & cleanup work.
2) Use a schedule, based on local knowledge of expected number of transactions over time.
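
For option 1 the wiring I have in mind is roughly this (sketch only; the
channel name is invented):

-- in the delta-maintaining trigger, after the delta row is written:
NOTIFY document_count_deltas;

-- in the external cleanup application, once per connection:
LISTEN document_count_deltas;
-- then run the aggregation/cleanup pass each time a notification arrives.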

So I'd rephrase Pavan's suggestion as a request to have post-transaction commit triggers that have access to (at least) the txid of the transaction that was committed.

Suggested syntax is to add the option "TRANSACTION" (or perhaps "COMMIT") to the CREATE TRIGGER statement:

CREATE TRIGGER name AFTER INSERT OR UPDATE OR DELETE ON table FOR EACH TRANSACTION EXECUTE PROCEDURE funcname ( arguments );

Any of the triggering actions on the specified table ensures that the function is called once if the transaction commits.
Requires a new TG_LEVEL.
TG_OP could be the first action triggered.

Would implementing this be extremely difficult due to transferring information from within the transaction to outside the transaction?
If so, perhaps I'd get the same benefit from having a trigger set up to fire pre-commit (or pre-prepare), and be a part of the transaction.
Would the locking difficulties be reduced as the lock would not be required till late in the game, and not held for long?

Regards,
Stephen Denne.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-18 04:18:10
Message-ID: 12841.1205813890@sss.pgh.pa.us
Lists: pgsql-hackers

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> I initially thought that what you meant by "having transactions enter "delta" entries" was that I have a trigger that would create a row each time it was called, writing how many records were inserted or deleted. I didn't understand how this would be much of an improvement, as each of my rows would contain either +1 or -1.

Well, ideally you'd aggregate all the deltas caused by a particular
transaction into one entry in the counting table. Whether or not that
happens, though, the point of the concept is that some background task
aggregates all the deltas from long-gone transactions into just one base
row, and then deletes the old delta entries. To get a valid value of
COUNT(*), what onlookers must do is SUM() the base row and delta records
from all transactions that they can "see" under MVCC rules. The amount
of work involved is proportional to the number of recent updates, not
the total size of the underlying table.
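
The cleanup pass itself doesn't have to be anything fancy. Using the layout I
sketched upthread, and assuming only one cleanup runs at a time, something
like this does it:

BEGIN ISOLATION LEVEL SERIALIZABLE;   -- one snapshot for both statements
UPDATE my_table_count
   SET delta = delta + (SELECT coalesce(sum(delta), 0)
                          FROM my_table_count WHERE NOT is_base)
 WHERE is_base;
DELETE FROM my_table_count WHERE NOT is_base;
COMMIT;
-- deltas committed while this runs aren't visible to that snapshot,
-- so they're simply left for the next pass.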

> However I'm not after a fast count(*) from table, but more like a fast
> select grouping_id, count(*) from my_table group by grouping_id

You could apply the same technique across each group id, though this
certainly is getting beyond what any built-in feature might offer.

> Can you clarify the lack of MVCC problems?

The point there is that the "right answer" actually depends on the
observer, since each observer might have a different snapshot and
therefore be able to "see" a different set of committed rows in the
underlying table. The multiple-delta table handles this automatically,
because you can "see" a delta entry if and only if you could "see"
the underlying-table changes it represents.

> Does this idea apply with the same efficiency in pre 8.3, non-HOT implementations?

I didn't claim it was amazingly efficient in any implementation ;-).
HOT in particular is nearly useless since most rows in the count
table will never be updated, only inserted and eventually deleted.
You might get some mileage on the base row, but that'd be about it.
The count table will need frequent vacuums as well as frequent
aggregation scans.

It should beat scanning a large underlying table, but it's hardly
gonna be free.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-18 09:00:27
Message-ID: 87iqzk9zkk.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
>
>> However I'm not after a fast count(*) from table, but more like a fast
>> select grouping_id, count(*) from my_table group by grouping_id
>
> You could apply the same technique across each group id, though this
> certainly is getting beyond what any built-in feature might offer.

At that point you're talking about materialized views. Which makes it a whole
lot more interesting imho.

>> Does this idea apply with the same efficiency in pre 8.3, non-HOT implementations?
>
> I didn't claim it was amazingly efficient in any implementation ;-).
> HOT in particular is nearly useless since most rows in the count
> table will never be updated, only inserted and eventually deleted.
> You might get some mileage on the base row, but that'd be about it.
> The count table will need frequent vacuums as well as frequent
> aggregation scans.

It might be better not to update this delta table in normal transactional
updates. After all, the tuples you're deleting are precisely the ones that
nobody should be interested in any more. If you locked the table and magically
deleted those tuples and updated the master tuple using the global xmin
instead of your real xid, people would get the same result and you could
reclaim the space much, much sooner. Locking the table kind of sucks though.
And crash recovery would be a problem.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-18 23:24:16
Message-ID: F0238EBA67824444BC1CB4700960CB4804EAC60E@dmpeints002.isotach.com
Lists: pgsql-hackers

(Sorry for the repeat email Tom, I forgot the cc's)

Tom Lane wrote:
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > I initially thought that what you meant by "having
> > transactions enter "delta" entries" was that I have a trigger
> > that would create a row each time it was called, writing how
> many records were inserted or deleted. I didn't understand
> > how this would be much of an improvement, as each of my rows
> > would contain either +1 or -1.
>
> Well, ideally you'd aggregate all the deltas caused by a particular
> transaction into one entry in the counting table.

Yes, that's what I was attempting to do, but without changing the application code.

Using txid_current() can do that, so each of a large number of individual inserts or deletes within a transaction updates the same delta row for that transaction. I haven't found any references to this being a solution, and thought it was worth recording.

> > However I'm not after a fast count(*) from table, but more
> like a fast
> > select grouping_id, count(*) from my_table group by grouping_id
>
> You could apply the same technique across each group id, though this
> certainly is getting beyond what any built-in feature might offer.

Agreed. I've tried it out now, and am fairly happy with what I've got.

> > Can you clarify the lack of MVCC problems?
>
> The point there is that the "right answer" actually depends on the
> observer, since each observer might have a different snapshot and
> therefore be able to "see" a different set of committed rows in the
> underlying table. The multiple-delta table handles this
> automatically,
> because you can "see" a delta entry if and only if you could "see"
> the underlying-table changes it represents.
>
> > Does this idea apply with the same efficiency in pre 8.3,
> non-HOT implementations?
>
> I didn't claim it was amazingly efficient in any implementation ;-).
> HOT in particular is nearly useless since most rows in the count
> table will never be updated, only inserted and eventually deleted.
> You might get some mileage on the base row, but that'd be about it.
> The count table will need frequent vacuums as well as frequent
> aggregation scans.
>
> It should beat scanning a large underlying table, but it's hardly
> gonna be free.

What I was asking about with those questions, is if a single row is inserted in transaction A, and updated 1000 times still within transaction A, then transaction A commits... does a single row version get written, or 1001, 1000 of which are removable?

I tested this with a small summary table freshly vacuum full'ed.

10 transactions, one after the other, each transaction inserting three delta rows, then updating one of them three times and the other two five times each.
So 3 inserts and 13 updates per transaction.
The updates only affect non-indexed fields in rows created in the same transaction.

The next vacuum full found 52 removable row versions.

I repeated the test, and got 13 removable row versions.

I repeated the test again, and got 13 removable row versions.

I repeated just one of the ten transactions, 13 removable row versions.

All inserts and updates are probably in the one page that has a fair bit of free space.

Is it possible to update the HOT code to re-use row versions on the same page if they were created in the same transaction?

Conclusion: making use of txid_current(), I can get single delta rows with deltas of 10000, but doing so creates 10000 dead row versions.

Regards,
Stephen Denne.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-19 03:03:55
Message-ID: 20927.1205895835@sss.pgh.pa.us
Lists: pgsql-hackers

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> What I was asking about with those questions, is if a single row is inserted in transaction A, and updated 1000 times still within transaction A, then transaction A commits... does a single row version get written, or 1001, 1000 of which are removable?

Umm ... AFAICS there is no need for an UPDATE to touch the count table
at all. You'd only need ON INSERT and ON DELETE triggers.

regards, tom lane


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-19 03:22:43
Message-ID: F0238EBA67824444BC1CB4700960CB4804EAC74C@dmpeints002.isotach.com
Lists: pgsql-hackers

Tom Lane wrote
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > What I was asking about with those questions, is if a
> > single row is inserted in transaction A, and updated 1000
> > times still within transaction A, then transaction A
> > commits... does a single row version get written, or 1001,
> > 1000 of which are removable?
>
> Umm ... AFAICS there is no need for an UPDATE to touch the count table
> at all. You'd only need ON INSERT and ON DELETE triggers.

I'm not referring to updates of my base table... the single row inserted was referring to the delta row...

I'm trying to group together a large number of +1 or -1 deltas into a single delta per transaction.

A cut down example:

CREATE TABLE document_type_summary_detail
(
    document_type_id integer NOT NULL,
    txid             bigint  NOT NULL DEFAULT 0,
    documents        bigint  NOT NULL DEFAULT 0,
    CONSTRAINT pk_document_type_summary PRIMARY KEY (document_type_id, txid)
);

CREATE OR REPLACE FUNCTION document_count_trig()
RETURNS TRIGGER AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE document_type_summary_detail SET documents = documents + 1
         WHERE document_type_id = NEW.document_type_id AND txid = txid_current();
        IF NOT FOUND THEN
            INSERT INTO document_type_summary_detail (document_type_id, documents, txid)
            VALUES (NEW.document_type_id, 1, txid_current());
        END IF;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE document_type_summary_detail SET documents = documents - 1
         WHERE document_type_id = OLD.document_type_id AND txid = txid_current();
        IF NOT FOUND THEN
            INSERT INTO document_type_summary_detail (document_type_id, documents, txid)
            VALUES (OLD.document_type_id, -1, txid_current());
        END IF;
        RETURN OLD;
    END IF;
END;
$$
LANGUAGE 'plpgsql';

CREATE TRIGGER document_count_trig BEFORE INSERT OR DELETE ON document
    FOR EACH ROW EXECUTE PROCEDURE document_count_trig();

--one off setup:
insert into document_type_summary_detail (document_type_id,txid,documents)
select dt.id, 0, count(d.id) from document_type dt left join document d on d.document_type_id = dt.id
group by 1,2;

--useful view:
CREATE OR REPLACE VIEW document_type_summary AS
    SELECT document_type_id, sum(documents) AS documents
      FROM document_type_summary_detail
     GROUP BY document_type_id;

--scheduled cleanup:
CREATE OR REPLACE FUNCTION document_type_summary_aggregate() RETURNS void AS
$$
BEGIN
    -- create a base (txid = 0) row for any group that has delta rows but no base row yet
    INSERT INTO document_type_summary_detail (document_type_id)
        SELECT DISTINCT document_type_id FROM document_type_summary_detail
        EXCEPT
        SELECT document_type_id FROM document_type_summary_detail WHERE txid = 0;
    -- fold the current totals into the base rows of groups that have delta rows
    UPDATE document_type_summary_detail SET documents = v.documents
      FROM document_type_summary AS v
     WHERE document_type_summary_detail.document_type_id = v.document_type_id
       AND document_type_summary_detail.txid = 0
       AND EXISTS (SELECT 1 FROM document_type_summary_detail ss
                    WHERE ss.document_type_id = document_type_summary_detail.document_type_id
                      AND ss.txid <> 0);
    -- remove the delta rows
    DELETE FROM document_type_summary_detail WHERE txid <> 0;
END;
$$
LANGUAGE 'plpgsql';

My assumption is that this solves the "locking causes serialisation of transactions" problem, as the only rows updated are those inserted by the same transaction.

Regards,
Stephen Denne.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-19 03:26:46
Message-ID: 21266.1205897206@sss.pgh.pa.us
Lists: pgsql-hackers

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> Tom Lane wrote
>> Umm ... AFAICS there is no need for an UPDATE to touch the count table
>> at all. You'd only need ON INSERT and ON DELETE triggers.

> I'm not referring to updates of my base table... the single row inserted was referring to the delta row...

> I'm trying to group together a large number of +1 or -1 deltas into a single delta per transaction.

Oh. You can't realistically do that from the level of individual ON
INSERT/DELETE triggers --- as you've found out, you end up with about
the same number of dead rows anyway. Collapsing a transaction's changes
into a single row would require keeping some intra-transaction state,
which is do-able at the C-code level but not from plpgsql.

regards, tom lane


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-19 05:31:16
Message-ID: 47E0A524.3010406@mark.mielke.cc
Lists: pgsql-hackers

Tom Lane wrote:
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
>
>> What I was asking about with those questions, is if a single row is inserted in transaction A, and updated 1000 times still within transaction A, then transaction A commits... does a single row version get written, or 1001, 1000 of which are removable?
>>
>
> Umm ... AFAICS there is no need for an UPDATE to touch the count table
> at all. You'd only need ON INSERT and ON DELETE triggers.
>

This returns to the question of whether count of the whole table is
useful, or whether count of a GROUP BY or WHERE is useful. If GROUP BY
or WHERE is useful, then a trigger on UPDATE becomes necessary.
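
For instance, with the delta-table layout Stephen posted earlier in the
thread, an UPDATE that moves a row from one group to another has to touch two
delta rows, roughly (sketch only, untested):

CREATE OR REPLACE FUNCTION document_count_update_trig()
RETURNS TRIGGER AS
$$
BEGIN
    IF NEW.document_type_id <> OLD.document_type_id THEN
        -- -1 for the group the row is leaving
        UPDATE document_type_summary_detail SET documents = documents - 1
         WHERE document_type_id = OLD.document_type_id AND txid = txid_current();
        IF NOT FOUND THEN
            INSERT INTO document_type_summary_detail (document_type_id, documents, txid)
            VALUES (OLD.document_type_id, -1, txid_current());
        END IF;
        -- +1 for the group the row is joining
        UPDATE document_type_summary_detail SET documents = documents + 1
         WHERE document_type_id = NEW.document_type_id AND txid = txid_current();
        IF NOT FOUND THEN
            INSERT INTO document_type_summary_detail (document_type_id, documents, txid)
            VALUES (NEW.document_type_id, 1, txid_current());
        END IF;
    END IF;
    RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';

CREATE TRIGGER document_count_update_trig BEFORE UPDATE ON document
    FOR EACH ROW EXECUTE PROCEDURE document_count_update_trig();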

What is the direction here? Is it count of the whole table only? (<--
not interesting to me)

Or a count over more practical real-life examples? There I completely agree
with Greg that this gets into the materialized view realm and becomes
very interesting.

In my current db project, I never count all of the rows in a table.
However, I do use count(*) with GROUP BY and WHERE.

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-19 21:54:31
Message-ID: F0238EBA67824444BC1CB4700960CB4804EAC8F5@dmpeints002.isotach.com
Lists: pgsql-hackers

Mark Mielke wrote
> This returns to the question of whether count of the whole table is useful, or whether
> count of a GROUP BY or WHERE is useful.
> If GROUP BY or WHERE is useful, then a trigger on UPDATE becomes necessary.

True... for the example I gave I should have had an update trigger on my table.
I neglected it as I based the example on a more complex multi-table example from my application, where I know that the value I'm grouping by doesn't change.

> What is the direction here? Is it count of the whole table only? (<-- not interesting to me)
> Or a count over more practical real-life examples? There I completely agree with Greg
> that this gets into the materialized view realm and becomes very interesting.
> In my current db project, I never count all of the rows in a table. However, I do use count(*) with GROUP BY and WHERE.

I'm trying to figure out how close I can come to a useful, efficient materialized view with current production builds of postgresql,
and to identify areas where changes to postgresql could make it easier.

Currently I can see three areas of concern:

1) Turning it on
My query to initially populate the materialized view table takes 6 hours on my (limited hardware) development system, which highlights the problem of when to turn on the triggers.
An outage is one way to guarantee that there are neither missing details nor double counted details.
Would turning on the triggers and then running my initial population query in the same transaction work?

2) Table bloat
I'm trying to group together a large number of +1 or -1 deltas into a single delta per transaction.
This creates as many dead rows as there are updates.
This is also a problem with Pavan's suggestion of maintaining a counter table.
They can all be HOT updates in 8.3, but they still all create new row versions.

Tom says "Collapsing a transaction's changes
into a single row would require keeping some intra-transaction state,
which is do-able at the C-code level but not from plpgsql."

So can I hope that this problem is solvable as a contrib module that will work with at least 8.3?
I'd still want to write the plpgsql trigger function myself, as I know the aggregation rules, and call contributed functions to integrate with the collapsing of the transaction's changes into a single row. (I expect I'd need a small number of rows per transaction, as I need to create delta rows for each value of my grouping field involved in the transaction.)

3) How to incorporate the deltas.
With my technique, if the transaction rolls back the delta record becomes a dead row; if it commits, the delta is then visible to transactions started after that time.
I need to have a regular process run to sum and remove the deltas, rewriting the summary rows.
I'd like to be able to write an after-commit trigger that fires after changes in dependent tables that I identify, which can add the transaction's deltas to the summary table. I would want it to effectively be a new, small transaction.

I think these three areas need to be addressed before even considering writing code to automatically convert any given view into a materialized view.

2) and 3) could perhaps be implemented with a per transaction map from my_custom_key to a prepared statement and a list of parameter values.
Provide access to the values, making them updateable within the transaction. Have the statement automatically executed on commit.

Pavan also refers to deferred triggers, which has got me thinking about another possible solution:

Instead of inserting a delta row, that will be updated a lot of times, create an on commit drop temp table named after the txid and the grouping value (which is an integer in my case, perhaps hash it if you're grouping by something that doesn't easily convert to part of a table name),
create an after insert initially deferred constraint to call a function which will move the 'at commit' values of the rows in the temp table to the summary table.
The temp table will only have one row inserted, updated many times, then on commit the trigger is fired once, and the temp table is dropped.

Does anyone think this will or won't work for some reason?

Regards,
Stephen Denne.



From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-20 07:00:42
Message-ID: 2e78013d0803200000l34e306a3j515447017b58a58d@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne
<Stephen(dot)Denne(at)datamail(dot)co(dot)nz> wrote:

>
> Pavan also refers to deferred triggers, which has got me thinking about another possible solution:
>
> Instead of inserting a delta row, that will be updated a lot of times, create an on commit drop temp table named after the txid and the grouping value (which is an integer in my case, perhaps hash it if you're grouping by something that doesn't easily convert to part of a table name),
> create an after insert initially deferred constraint to call a function which will move the 'at commit' values of the rows in the temp table to the summary table.
> The temp table will only have one row inserted, updated many times, then on commit the trigger is fired once, and the temp table is dropped.
>
> Does anyone think this will or won't work for some reason?

I think this should work, although you may need to take some extra steps
to manage the summary table. Also, I think a single temp table per transaction
should suffice. The temp table would have one row per "group by" or "where"
condition on which you want to track the count. The corresponding row will
be updated as and when the corresponding count changes. You would need
INSERT/DELETE/UPDATE triggers to do that. If there are any subtransaction
aborts, that will be taken care of by MVCC.

As you said, a single deferred trigger would then merge the temp table with
the summary table. Here we need to be extra careful because serializable
transactions may fail to update the same row in the summary table. One solution
I can think of is (and I haven't checked the archives, so somebody might have
already suggested this before):

Each summary table will have one summary row per "group by" or "where" condition
(same as temp table). In addition to that, it can have zero or more temporary
rows for the conditions.

- select the summary rows from the summary table FOR UPDATE NOWAIT;
- if the lock is not available, insert our deltas into the summary table as a
  new row. These rows will be deleted as soon as some other transaction gets a
  lock on the summary rows and merges our deltas with them
- if the lock is available:
  - merge our deltas with the summary rows
  - check for other temporary deltas, merge them with the summary rows,
    and delete those temporary rows
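
Roughly, the merge step of that deferred trigger could look like this (only a
sketch; the table layout, the is_temp flag and the function are invented for
illustration, and it assumes the permanent summary row for each group is
created up front):

CREATE TABLE count_summary (
    id       serial  PRIMARY KEY,
    group_id integer NOT NULL,
    cnt      bigint  NOT NULL DEFAULT 0,
    is_temp  boolean NOT NULL DEFAULT false
);

-- called once per group from the deferred trigger, with the transaction's net delta
CREATE OR REPLACE FUNCTION merge_count_delta(p_group integer, p_delta bigint)
RETURNS void AS
$$
DECLARE
    v_parked bigint := 0;
    r        record;
BEGIN
    -- try to lock the permanent summary row without waiting
    -- (assumes that row already exists for this group)
    PERFORM 1 FROM count_summary
     WHERE group_id = p_group AND NOT is_temp
       FOR UPDATE NOWAIT;

    -- got the lock: gather any parked temporary deltas, deleting exactly
    -- the rows we sum so nothing is lost or double counted
    FOR r IN SELECT id, cnt FROM count_summary
              WHERE group_id = p_group AND is_temp
    LOOP
        v_parked := v_parked + r.cnt;
        DELETE FROM count_summary WHERE id = r.id;
    END LOOP;

    -- fold our own delta plus the parked ones into the permanent row
    UPDATE count_summary
       SET cnt = cnt + p_delta + v_parked
     WHERE group_id = p_group AND NOT is_temp;
EXCEPTION
    WHEN lock_not_available OR serialization_failure THEN
        -- couldn't touch the permanent row right now: park our delta as a
        -- temporary row for whoever next gets the lock to merge and delete
        INSERT INTO count_summary (group_id, cnt, is_temp)
        VALUES (p_group, p_delta, true);
END;
$$
LANGUAGE 'plpgsql';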

I guess we can write the generic triggers as a contrib module. What needs to be done
is to let the user specify the tables and the conditions on which they want to track
count(*) and then apply those conditions in the generic triggers.

Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-25 00:42:17
Message-ID: F0238EBA67824444BC1CB4700960CB4804EACCFC@dmpeints002.isotach.com
Lists: pgsql-hackers

Pavan Deolasee wrote
> On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne
> <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> wrote:
>
> >
> > Pavan also refers to deferred triggers, which has got me
> thinking about another possible solution:
> >
> > Instead of inserting a delta row, that will be updated a
> lot of times, create an on commit drop temp table named after
> the txid and the grouping value (which is an integer in my
> case, perhaps hash it if you're grouping by something that
> doesn't easily convert to part of a table name),
> > create an after insert initially deferred constraint to
> call a function which will move the 'at commit' values of the
> rows in the temp table to the summary table.
> > The temp table will only have one row inserted, updated
> many times, then on commit the trigger is fired once, and the
> temp table is dropped.
> >
> > Does anyone think this will or won't work for some reason?
>
>
> I think this should work, although you may need to take some
> extra steps
> to manage the summary table. Also, I think a single temp
> table per transaction
> should suffice. The temp table would have one row per "group
> by" or "where"
> condition on which you want to track the count. The
> corresponding row will
> be updated as and when the corresponding count changes. You would need
> INSERT/DELETE/UPDATE triggers to do that. If there are any
> subtransaction
> aborts, that will be taken care of by MVCC.

Thanks for that. I had gone ahead and tried out the idea, and it was working 'ok'.

Using one table per transaction has the benefit of fewer temp tables (but the same number of triggers waiting to run). It also removes the grouping key from the table name.

I was using a single table per grouping key, with a single updated row in it.
The benefit was simpler queries, and I could create an ON INSERT trigger that would be triggered only once when the temp table was created, and a 'zero' row was inserted, thereby separating the setup of the trigger from the maintenance of the delta.

I haven't explored the transactional implications of updating vs inserting delta rows in the summary table at the time of transaction commit. The code below updates the summary table, which I think could lead to a large delay or deadlocks if there are other styles of updates on that table (other than on-commit triggers)?

I also hadn't considered sub-transactions.

Below is a cut-down version of what I tried out.

I was summarizing more than just the number of documents shown below; I was also storing a sum and two maximums of timestamps (using the 'greatest' function for aggregating each record). These were extra fields in both the summary table and the temp tables.
This could be made more generic by changing get_temp_table_name() to take an additional couple of parameters specifying the name of the function to run at commit and an identifying key (eg 'doc_count' vs 'doc_size_sum'), or perhaps by including the delta value too (which looks like it would simplify the triggers on the tables whose changes we wish to summarize, except that it doesn't cater for storing greatest or least aggregates).

I took a wild guess at a way of finding out whether the temp table already exists:
not exists(select tablename from pg_catalog.pg_tables where tablename=temp_delta_txid_group)
Is there a better/safer way?
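
One variant that at least restricts the check to the session's own temporary
schema might be (untested; pg_my_temp_schema() is, I believe, available from
8.3 on):

not exists(select 1 from pg_catalog.pg_class
            where relname = temp_delta_txid_group
              and relnamespace = pg_my_temp_schema())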

Here's part of the code I've got at the moment (edited here to cut it down to the core example, so it may contain errors):

CREATE TABLE doc_type_summary (
    document_type_id integer NOT NULL,
    documents bigint NOT NULL DEFAULT 0,
    CONSTRAINT pk_doc_type_summary PRIMARY KEY (document_type_id)
);

-- Deferred constraint trigger function: at commit, folds the delta held in the
-- temp table (TG_ARGV[0]) into the summary row for grouping key TG_ARGV[1].
CREATE OR REPLACE FUNCTION process_delta() RETURNS TRIGGER AS
$$
BEGIN
    EXECUTE 'UPDATE doc_type_summary SET documents = doc_type_summary.documents + d.documents FROM '
        || TG_ARGV[0] || ' AS d WHERE document_type_id = ' || TG_ARGV[1];
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

-- Returns the name of this transaction's temp table for the given grouping key,
-- creating the table, its deferred trigger and its 'zero' row on first use.
CREATE OR REPLACE FUNCTION get_temp_table_name(document_type_id integer)
RETURNS text AS
$$
DECLARE
    temp_delta_txid_group text;
BEGIN
    temp_delta_txid_group := 'temp_delta_' || txid_current() || '_' || document_type_id;
    IF NOT EXISTS (SELECT tablename FROM pg_catalog.pg_tables WHERE tablename = temp_delta_txid_group) THEN
        EXECUTE 'CREATE TEMP TABLE ' || temp_delta_txid_group
            || ' (documents bigint NOT NULL DEFAULT 0) ON COMMIT DROP';
        EXECUTE 'CREATE CONSTRAINT TRIGGER ' || temp_delta_txid_group || '_trig'
            || ' AFTER INSERT ON ' || temp_delta_txid_group
            || ' DEFERRABLE INITIALLY DEFERRED FOR EACH ROW'
            || ' EXECUTE PROCEDURE process_delta("' || temp_delta_txid_group || '", ' || document_type_id || ')';
        EXECUTE 'INSERT INTO ' || temp_delta_txid_group || ' DEFAULT VALUES';
    END IF;
    RETURN temp_delta_txid_group;
END;
$$
LANGUAGE plpgsql;

-- Row-level trigger on the table being counted: adjusts the delta row in the
-- per-transaction temp table for the affected grouping key.
CREATE OR REPLACE FUNCTION document_count_trig()
RETURNS TRIGGER AS
$$
DECLARE
    temp_delta_txid_group text;
BEGIN
    IF TG_OP = 'INSERT' THEN
        temp_delta_txid_group := get_temp_table_name(NEW.document_type_id);
        EXECUTE 'UPDATE ' || temp_delta_txid_group || ' SET documents = documents + 1';
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        temp_delta_txid_group := get_temp_table_name(OLD.document_type_id);
        EXECUTE 'UPDATE ' || temp_delta_txid_group || ' SET documents = documents - 1';
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
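
For completeness, here is a rough sketch of how these pieces could be wired up (the documents table, trigger name, and sample data below are illustrative, not my real schema):

CREATE TABLE documents (
    id serial PRIMARY KEY,
    document_type_id integer NOT NULL
);

CREATE TRIGGER documents_count_trig
    AFTER INSERT OR DELETE ON documents
    FOR EACH ROW EXECUTE PROCEDURE document_count_trig();

-- The summary row must already exist, since process_delta() only UPDATEs it.
INSERT INTO doc_type_summary (document_type_id) VALUES (1);

BEGIN;
INSERT INTO documents (document_type_id) VALUES (1), (1), (1);
DELETE FROM documents WHERE id = 3;
COMMIT;  -- the deferred trigger folds the net delta (+2) into doc_type_summary

SELECT * FROM doc_type_summary;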

--
Regards,
Stephen Denne.



From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 01:50:53
Message-ID: F0238EBA67824444BC1CB4700960CB48051D5D45@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(There is a possible performance bug mentioned at the end of this email; the rest is further discussion regarding materialised views.)

I wrote
> Pavan Deolasee wrote
> > On Thu, Mar 20, 2008 at 3:24 AM, Stephen Denne
> > <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> wrote:
> >
> > >
> > > Pavan also refers to deferred triggers, which has got me
> > thinking about another possible solution:
> > >
> > > Instead of inserting a delta row, that will be updated a
> > lot of times, create an on commit drop temp table named after
> > the txid and the grouping value (which is an integer in my
> > case, perhaps hash it if you're grouping by something that
> > doesn't easily convert to part of a table name),
> > > create an after insert initially deferred constraint to
> > call a function which will move the 'at commit' values of the
> > rows in the temp table to the summary table.
> > > The temp table will only have one row inserted, updated
> > many times, then on commit the trigger is fired once, and the
> > temp table is dropped.
> > >
> > > Does anyone think this will or won't work for some reason?
> >
> >
> > I think this should work, although you may need to take some
> > extra steps
> > to manage the summary table. Also, I think a single temp
> > table per transaction
> > should suffice. The temp table would have one row per "group
> > by" or "where"
> > condition on which you want to track the count. The
> > corresponding row will
> > be updated as and when the corresponding count changes. You
> would need
> > INSERT/DELETE/UPDATE triggers to do that. If there are any
> > subtransaction
> > aborts, that will be taken care by MVCC.
>
> Thanks for that. I had gone ahead and tried out the idea, and
> it was working 'ok'.
>
> Using one table per transaction has the benefit of less temp
> tables (but the same number of triggers waiting to run). It
> also removes the grouping key from the table name.
>
> I was using a single table per grouping key, with a single
> updated row in it.
> The benefit was simpler queries, and I could create an ON
> INSERT trigger that would be triggered only once when the
> temp table was created, and a 'zero' row was inserted,
> thereby separating the setup of the trigger from the
> maintenance of the delta.

One temp table per grouping key would presumably allocate at least one disk page per grouping key, which might result in pretty poor performance. On the other hand, if the number of updates per grouping key is large, HOT would have plenty of room on the page to write new row versions.

Does creation & dropping of temp tables result in system catalog bloat?

> I took a wild guess at a way of finding out whether the temp
> table already exists:
> not exists(select tablename from pg_catalog.pg_tables
> where tablename=temp_delta_txid_group)
> Is there a better/safer way?

Answering my own question:
There is a better way to do what I was doing (though I'm not sure about a better way to check for the existence of a temp table)...

A custom variable class can be set up and used to record whether the current transaction has already been initialized. (Thanks to Andreas Kretschmer for pointing those out in another thread on -general.)

Alter this setting within postgresql.conf to add 'mv':
custom_variable_classes = 'mv'

Add this setting to postgresql.conf:
mv.initialized = 'false'

Then only set it to true local to the transaction.

Usage within a trigger:

IF NOT (current_setting('mv.initialized')::boolean) THEN
    -- trigger a deferred constraint function:
    INSERT INTO mv_txid_doc_type_summary VALUES (txid_current());
    PERFORM set_config('mv.initialized', 'true', true);
END IF;

The custom variable class can also be used to aggregate the deltas within a transaction, though there are some minor difficulties:

1) They only store text
2) You can only find out that a value has not been set by catching an exception
3) There is no way to list the settings.
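
For point 2, a small wrapper can hide the exception handling; this is just a sketch, assuming the 'unrecognized configuration parameter' error maps to the undefined_object condition:

CREATE OR REPLACE FUNCTION get_setting_or_default(setting_name text, fallback text)
RETURNS text AS
$$
BEGIN
    RETURN current_setting(setting_name);
EXCEPTION
    WHEN undefined_object THEN
        -- the variable has never been set in this session
        RETURN fallback;
END;
$$
LANGUAGE plpgsql;

-- e.g. get_setting_or_default('mv.documents.42', '0')::bigint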

The main benefit is that changing a variable's setting does not write a new row version.
Creating new transactionally scoped variables seems to take around 150 to 200 bytes of process RAM per variable (depending on the size of the value stored).
The time to create them for the first time in a connection varies a bit, between 200 and 1000 per minute. No IO is being performed, and CPU is at 100%.
These statistics were gathered when trying variations of this query:

select count(*) from (select set_config('mv.test11.' || s1.s1, s2.s2::text, true) from (select s1 from generate_series(1,1000) as s1) as s1, (select s2 from generate_series(1,1000) as s2) as s2) as s3;

This sets each of 1000 variables to 1000 different values in turn. It can take a few minutes, but strangely, a second connection starting the same query just after the first one can finish in seconds.

Regards,
Stephen Denne.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 01:55:58
Message-ID: 20299.1208310958@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> The custom variable class can also be used to aggregate the deltas within a transaction, though there are some minor difficulties:

> 1) They only store text
> 2) You can only find out that a value has not been set by catching an exception
> 3) There is no way to list the settings.

As for 2) and 3), can't you look into the pg_settings view?

> The time to create them for the first time for a connection seems to
> vary a bit, between 200 and 1000 per minute.

Yeah, that path is not optimized at all because it was never considered
performance-critical. Updating an existing variable should be cheaper.

regards, tom lane


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 02:40:29
Message-ID: F0238EBA67824444BC1CB4700960CB48036E82A4@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(apologies for the email format & top posting, I've had to temporarily switch to using a web interface from home that doesn't seem to know what plain text is)

The pg_settings view doesn't contain custom variables created on the fly (nor, from memory, ones defined in postgresql.conf; I'm not able to check and confirm that at the moment).
Fixing that would satisfy 2 & 3 nicely.
The docs say the ones in postgresql.conf are to be used by modules when they initialize, as the values to apply when setting up actual server variables (which can store more types than just text), which I presume would then appear in pg_settings.

Updating existing variables is much quicker: over 100,000 updates per second.
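
For example, repeatedly updating a single already-created variable can be timed with a variation along these lines (the variable name here is just the first one created by the earlier query):

select count(*) from (select set_config('mv.test11.1', s2.s2::text, true) from (select s2 from generate_series(1,100000) as s2) as s2) as s3;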

Regards,
Stephen Denne.

________________________________

From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Wed 16/04/2008 1:55 p.m.

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> The custom variable class can also be used to aggregate the deltas within a transaction, though there are some minor difficulties:

> 1) They only store text
> 2) You can only find out that a value has not been set by catching an exception
> 3) There is no way to list the settings.

As for 2) and 3), can't you look into the pg_settings view?

> The time to create them for the first time for a connection seems to
> vary a bit, between 200 and 1000 per minute.

Yeah, that path is not optimized at all because it was never considered
performance-critical. Updating an existing variable should be cheaper.

regards, tom lane



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 02:55:50
Message-ID: 21051.1208314550@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>> As for 2) and 3), can't you look into the pg_settings view?

> pg_settings view doesn't contain custom variables created on the fly,

Really? [ pokes around ... ] Hm, you're right, because
add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this
usage it'll never be cleared. I wonder if we should change that.

The whole thing is a bit of an abuse of what the mechanism was intended
for, and so I'm not sure we should rejigger GUC's behavior to make it
more pleasant, but on the other hand if we're not ready to provide a
better substitute ...

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 07:38:38
Message-ID: 20080416093838.55834d92@mha-laptop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> >> As for 2) and 3), can't you look into the pg_settings view?
>
> > pg_settings view doesn't contain custom variables created on the
> > fly,
>
> Really? [ pokes around ... ] Hm, you're right, because
> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this
> usage it'll never be cleared. I wonder if we should change that.
>
> The whole thing is a bit of an abuse of what the mechanism was
> intended for, and so I'm not sure we should rejigger GUC's behavior
> to make it more pleasant, but on the other hand if we're not ready to
> provide a better substitute ...

While I agree with that part, is there any actual *reason* why we
shouldn't have the custom variables included in pg_settings? ISTM that
it would be usable in cases that aren't an abuse as well...

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 14:27:43
Message-ID: 7401.1208356063@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Tom Lane wrote:
>> Really? [ pokes around ... ] Hm, you're right, because
>> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this
>> usage it'll never be cleared. I wonder if we should change that.
>>
>> The whole thing is a bit of an abuse of what the mechanism was
>> intended for, and so I'm not sure we should rejigger GUC's behavior
>> to make it more pleasant, but on the other hand if we're not ready to
>> provide a better substitute ...

> While I agree with that part, is there any actual *reason* why we
> shouldn't have the custom variables included in pg_settings?

IIRC, the motivation for doing that was to not expose a completely bogus
set of attributes for a variable whose defining C-module hadn't been
loaded yet.

I thought about this in the shower just now, and ISTM that if we want to
turn this into an actual feature rather than a kluge, there needs to be
some sort of "define variable" command that sets up a custom variable
and specifies its type (and whatever other properties seem worth
setting). IOW expose the DefineCustomFooVariable functions to SQL users.

I'd be a bit inclined to restrict the namespace that can be set up that
way, eg allow only "local." or "session." as the prefix. Maybe
that's just being too anal, but we could guarantee not to introduce
colliding built-in GUCs in future releases, whereas people trying to
define variables with any random name would definitely be at risk.

Comments?

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 14:51:22
Message-ID: 4806126A.60302@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Tom Lane wrote:
>>> Really? [ pokes around ... ] Hm, you're right, because
>>> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this
>>> usage it'll never be cleared. I wonder if we should change that.
>>>
>>> The whole thing is a bit of an abuse of what the mechanism was
>>> intended for, and so I'm not sure we should rejigger GUC's behavior
>>> to make it more pleasant, but on the other hand if we're not ready to
>>> provide a better substitute ...
>
>> While I agree with that part, is there any actual *reason* why we
>> shouldn't have the custom variables included in pg_settings?

I've needed it myself before -- I think it is a good idea.

> IIRC, the motivation for doing that was to not expose a completely bogus
> set of attributes for a variable whose defining C-module hadn't been
> loaded yet.
>
> I thought about this in the shower just now, and ISTM that if we want to
> turn this into an actual feature rather than a kluge, there needs to be
> some sort of "define variable" command that sets up a custom variable
> and specifies its type (and whatever other properties seem worth
> setting). IOW expose the DefineCustomFooVariable functions to SQL users.
>
> I'd be a bit inclined to restrict the namespace that can be set up that
> way, eg allow only "local." or "session." as the prefix. Maybe
> that's just being too anal, but we could guarantee not to introduce
> colliding built-in GUCs in future releases, whereas people trying to
> define variables with any random name would definitely be at risk.
>
> Comments?

Would it make sense to have built-in GUCs belong to "pg_catalog." and
user defined GUCs default to "public."?

Joe


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-16 23:19:49
Message-ID: F0238EBA67824444BC1CB4700960CB48051D5FCC@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> >> As for 2) and 3), can't you look into the pg_settings view?
>
> > pg_settings view doesn't contain custom variables created
> on the fly,
>
> Really? [ pokes around ... ] Hm, you're right, because
> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this
> usage it'll never be cleared. I wonder if we should change that.
>
> The whole thing is a bit of an abuse of what the mechanism
> was intended
> for, and so I'm not sure we should rejigger GUC's behavior to make it
> more pleasant, but on the other hand if we're not ready to provide a
> better substitute ...

In my experiments with materialized views, I identified these problems as "minor" difficulties. Resolving them would allow further abuse ;)

Aside: it is currently rather cumbersome to get a function to run, if needed, at commit. An ideal solution would be something like "EXECUTE ON COMMIT my_function()" or maybe "SAVEPOINT my_name ON COMMIT my_function()", but these suggestions are made without investigating what provision the SQL standard has made to address this need.

My use of mv.initialized means I can create variables when initializing a transaction and afterwards know that they have values, but what I can't easily do is use those variables to identify which grouping keys have been updated. To do that, I select from (and conditionally insert into) a table kept for that explicit purpose; if the select doesn't find the key, I create variables named after that key, with zero values.

Performance and efficiency-wise, which would be the better way of keeping track of grouping keys used in a transaction?
1) Create a temp table, on commit drop, for the transaction, storing the grouping keys affected.
2) Use a persistent table, storing txid and grouping keys affected, deleting the txid's rows at commit.
3) Use custom variables, storing transaction-local grouping keys affected, with existence checked by catching an exception, and listing done by checking existence for all possible values (a possibility in my scenario).

Speed is my priority; low disk IO is a probable means to that end, which is why I investigated using variables.

Basically, (3) isn't a viable option, so what are the trade-offs between creating a temporary table per transaction and using rows in a permanent table with a txid column?
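
A minimal sketch of option (2), with names invented purely for illustration:

CREATE TABLE mv_txn_keys (
    txid bigint NOT NULL,
    group_id integer NOT NULL,
    PRIMARY KEY (txid, group_id)
);

-- In the row-level trigger (fragment): remember, once per transaction,
-- that this grouping key was touched.
INSERT INTO mv_txn_keys (txid, group_id)
SELECT txid_current(), NEW.document_type_id
WHERE NOT EXISTS (
    SELECT 1 FROM mv_txn_keys
    WHERE txid = txid_current() AND group_id = NEW.document_type_id);

-- In the deferred commit-time trigger (fragment): after applying the deltas
-- for this transaction's keys, discard them.
DELETE FROM mv_txn_keys WHERE txid = txid_current();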

Here are some more plpgsql code fragments:

mv := 'mv.' || view_name || '.' || key_value || '.';

When recording a grouping key as being affected by the transaction, create the variables with zeroes:

PERFORM set_config(mv||'documents', '0', true);
PERFORM set_config(mv||'last_addition', 'null', true);

In an insert trigger:

PERFORM set_config(mv||'documents', (current_setting(mv||'documents')::bigint + 1)::text, true);
PERFORM set_config(mv||'last_addition', now()::text, true);

In the deferred-until-commit trigger:

UPDATE materialized_view SET
    documents = documents + current_setting(mv||'documents')::bigint,
    last_addition = greatest(last_addition, nullif(current_setting(mv||'last_addition'),'null')::timestamp)
WHERE group_id = key_values.key_value;

Regards,
Stephen Denne.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 00:20:59
Message-ID: 5522.1208391659@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> Aside: It is currently more cumbersome to get a function to run, if needed, at commit. Ideal solution would be something like "EXECUTE ON COMMIT my_function()" or maybe "SAVEPOINT my_name ON COMMIT my_function()", but these suggestions are made without investigating what provision the SQL standard has made to address this need.

There is none, and the reason seems pretty obvious to me. What if your
"on commit" function fails? Or if you have two, and the second one
fails? Or even more to the point, the second one does something that
the first one expected to see the effects of?

Transaction commit is an exceedingly subtle and carefully structured
thing. Throwing random user-defined code into it ain't gonna happen.

regards, tom lane


From: PFC <lists(at)peufeu(dot)com>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 00:30:44
Message-ID: op.t9qbpigrcigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> The whole thing is a bit of an abuse of what the mechanism
>> was intended
>> for, and so I'm not sure we should rejigger GUC's behavior to make it
>> more pleasant, but on the other hand if we're not ready to provide a
>> better substitute ...
>
> In my experiments with materialized views, I identified these problems
> as "minor" difficulties. Resolving them would allow further abuse ;)

Let's try this quick & dirty implementation of a local count-delta cache using a local in-memory hashtable (ie. {}).
Writing the results to stable storage in an ON COMMIT trigger is left as an exercise to the reader ;)
Performance isn't that bad: calling the trigger takes about 50 us.
The oldskool implementation with a table is at the end; it's about 10x slower.

Example :

INSERT INTO victim1 (key) VALUES ('one'),('two'),('two');
INSERT 0 3
Temps : 1,320 ms
test=# SELECT * FROM get_count();
key | cnt
-----+-----
two | 2
one | 1

CREATE OR REPLACE FUNCTION clear_count( )
RETURNS VOID
AS $$
GD.clear()
$$ LANGUAGE plpythonu;

CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER )
RETURNS INTEGER
AS $$
    if key in GD:
        GD[key] += delta
    else:
        GD[key] = delta
    return GD[key]
$$ LANGUAGE plpythonu;

CREATE TYPE count_data AS ( key TEXT, cnt INTEGER );

CREATE OR REPLACE FUNCTION get_count( )
RETURNS SETOF count_data
AS $$
return GD.iteritems()
$$ LANGUAGE plpythonu;

CREATE TABLE victim( id SERIAL PRIMARY KEY, key TEXT NOT NULL );
INSERT INTO victim (key) SELECT (random() * 300)::INTEGER::TEXT FROM generate_series( 1,100000 );

CREATE TABLE victim1( id SERIAL PRIMARY KEY, key TEXT NOT NULL );

\timing
INSERT INTO victim1 SELECT * FROM victim;
TRUNCATE TABLE victim1;

SELECT clear_count();
INSERT INTO victim1 SELECT * FROM victim RETURNING update_count( key, 1 );
SELECT * FROM get_count();
TRUNCATE TABLE victim1;

CREATE OR REPLACE FUNCTION counter_trigger_f()
RETURNS TRIGGER
LANGUAGE plpgsql
AS
$$
DECLARE
BEGIN
    IF TG_OP = 'INSERT' THEN
        PERFORM update_count( NEW.key, 1 );
        RETURN NEW;
    ELSEIF TG_OP = 'UPDATE' THEN
        -- key changed: move the count from the old key to the new one
        IF NEW.key != OLD.key THEN
            PERFORM update_count( OLD.key, -1 ), update_count( NEW.key, 1 );
        END IF;
        RETURN NEW;
    ELSE -- DELETE
        PERFORM update_count( OLD.key, -1 );
        RETURN OLD;
    END IF;
END;
$$;

CREATE TRIGGER count_trigger BEFORE INSERT OR UPDATE OR DELETE ON victim1
FOR EACH ROW EXECUTE PROCEDURE counter_trigger_f();

SELECT clear_count();
INSERT INTO victim1 SELECT * FROM victim;
SELECT * FROM get_count();

SELECT clear_count();
TRUNCATE TABLE victim1;
INSERT INTO victim1 (key) VALUES ('one'),('two'),('two');
SELECT * FROM get_count();
DELETE FROM victim1 WHERE key='two';
SELECT * FROM get_count();
UPDATE victim1 SET key='three' WHERE key='one';
SELECT * FROM get_count();
DELETE FROM victim1;
SELECT * FROM get_count();

CREATE TABLE counts( key TEXT PRIMARY KEY, total INTEGER NOT NULL DEFAULT 0 );

CREATE OR REPLACE FUNCTION table_counter_trigger_f()
RETURNS TRIGGER
LANGUAGE plpgsql
AS
$$
DECLARE
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE counts SET total=total+1 WHERE key=NEW.key;
        IF NOT FOUND THEN INSERT INTO counts (key,total) VALUES (NEW.key,1); END IF;
        RETURN NEW;
    ELSEIF TG_OP = 'UPDATE' THEN
        -- key changed: move the count from the old key to the new one
        IF NEW.key != OLD.key THEN
            UPDATE counts SET total=total-1 WHERE key=OLD.key;
            UPDATE counts SET total=total+1 WHERE key=NEW.key;
            IF NOT FOUND THEN INSERT INTO counts (key,total) VALUES (NEW.key,1); END IF;
        END IF;
        RETURN NEW;
    ELSE -- DELETE
        UPDATE counts SET total=total-1 WHERE key=OLD.key;
        RETURN OLD;
    END IF;
END;
$$;

CREATE TABLE victim2( id SERIAL PRIMARY KEY, key TEXT NOT NULL );

CREATE TRIGGER table_count_trigger BEFORE INSERT OR UPDATE OR DELETE ON victim2 FOR EACH ROW EXECUTE PROCEDURE table_counter_trigger_f();
SELECT * FROM counts;
TRUNCATE TABLE victim2;
INSERT INTO victim2 SELECT * FROM victim;


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 00:33:41
Message-ID: F0238EBA67824444BC1CB4700960CB48051D6057@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote
> Transaction commit is an exceedingly subtle and carefully structured
> thing. Throwing random user-defined code into it ain't gonna happen.

Deferred constraint triggers currently run random user-defined code. This'll do me.

Regards,
Stephen Denne.



From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "PFC" <lists(at)peufeu(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 00:48:37
Message-ID: F0238EBA67824444BC1CB4700960CB48051D6069@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

PFC wrote:
> Let's try this quick & dirty implementation of a local
> count-delta cache
> using a local in-memory hashtable (ie. {}).

> CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER )
> RETURNS INTEGER
> AS $$
> if key in GD:
> GD[key] += delta
> else:
> GD[key] = delta
> return GD[key]
> $$ LANGUAGE plpythonu;

Thanks for the code; this seems to be very much what I was looking for.

I don't know plpythonu (nor python), just read a few docs now:
"The global dictionary SD is available to store data between function calls. This variable is private static data. The global dictionary GD is public data, available to all Python functions within a session. Use with care."

Does session == transaction or connection?
I don't understand the difference between SD and GD, private and public. Where are the context boundaries?

Regards,
Stephen Denne.



From: PFC <lists(at)peufeu(dot)com>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 09:20:11
Message-ID: op.t9qz7x2ncigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 17 Apr 2008 02:48:37 +0200, Stephen Denne
<Stephen(dot)Denne(at)datamail(dot)co(dot)nz> wrote:

> PFC wrote:
>> Let's try this quick & dirty implementation of a local
>> count-delta cache
>> using a local in-memory hashtable (ie. {}).
>
>> CREATE OR REPLACE FUNCTION update_count( key TEXT, delta INTEGER )
>> RETURNS INTEGER
>> AS $$
>> if key in GD:
>> GD[key] += delta
>> else:
>> GD[key] = delta
>> return GD[key]
>> $$ LANGUAGE plpythonu;
>
> Thanks for the code, this seems to be very much what I was looking for.
>
> I don't know plpythonu (nor python), just read a few docs now:

Learn Python, it is a really useful language ;)

> "The global dictionary SD is available to store data between function
> calls. This variable is private static data. The global dictionary GD is
> public data, available to all Python functions within a session. Use
> with care."
>
> Does session == transaction or connection?
> I don't understand the difference between SD and GD, private and public.
> Where are the context boundaries?

There is no sharing between processes, so
- both SD and GD are limited to the current session (connection, postgres process); no shared memory is involved
- GD is global between all python functions (global)
- SD is specific to each python function (static)

The big gotcha is that these are all non-transactional: if you rollback, GD and SD stay the same, and when you issue a query, you can assume the state of SD and GD is random (due to previous queries) unless you initialize them to a known value.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 16:00:52
Message-ID: 20728.1208448052@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> Tom Lane wrote:
>> I thought about this in the shower just now, and ISTM that if we want to
>> turn this into an actual feature rather than a kluge, there needs to be
>> some sort of "define variable" command that sets up a custom variable
>> and specifies its type (and whatever other properties seem worth
>> setting). IOW expose the DefineCustomFooVariable functions to SQL users.
>>
>> I'd be a bit inclined to restrict the namespace that can be set up that
>> way, eg allow only "local." or "session." as the prefix. Maybe
>> that's just being too anal, but we could guarantee not to introduce
>> colliding built-in GUCs in future releases, whereas people trying to
>> define variables with any random name would definitely be at risk.

> Would it make sense to have built-in GUCs belong to "pg_catalog." and
> user defined GUCs default to "public."?

[ after a bit of reflection... ] I don't think that we want to tie
GUC names to schemas, especially not schemas that might not be there
(remember public is droppable). The existing scheme for qualified
GUC names considers that the prefix is the name of a loadable module,
which isn't typically tied to any particular schema.

I kinda like "session" as the prefix since it helps remind people that
these things will have session lifespan.

OTOH, there's a possibility for confusion with the SET SESSION syntax
("hm, did you mean SET SESSION foo = ... or SET session.foo = ...?").
"local" has got the same issue. Maybe "temp"?

regards, tom lane


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "PFC" <lists(at)peufeu(dot)com>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 20:32:01
Message-ID: F0238EBA67824444BC1CB4700960CB48051D6243@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

PFC wrote
> <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> wrote:
> > I don't know plpythonu (nor python), just read a few docs now:
>
> Learn Python, it is a really useful language ;)

My wife has a snake phobia; besides, I've just started learning Scala.

> There is no sharing between processes, so
> - both SD and GD are limited to the current session
> (connection, postgres
> process), no shared memory is involved
> - GD is global between all python functions (global)
> - SD is specific to each python function (static)

Thanks.

> The big gotcha is that these are all non-transactional
> : if you rollback,
> GD and SD stay the same, and when you issue a query, you can
> assume the
> state of SD and GD is random (due to previous queries) unless you
> initialize them to a known value.

Using txid_current() as a key should alleviate that.

Regards,
Stephen Denne.



From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "PFC" <lists(at)peufeu(dot)com>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 20:43:13
Message-ID: F0238EBA67824444BC1CB4700960CB48051D625A@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > The big gotcha is that these are all non-transactional
> > : if you rollback,
> > GD and SD stay the same, and when you issue a query, you can
> > assume the
> > state of SD and GD is random (due to previous queries) unless you
> > initialize them to a known value.
>
> Using txid_current() as a key should alleviate that.

No... hold on, it is per session, and a session can't have two or more transactions active at once, can it?

(Though I hear of things called sub-transactions)

So the problem is that other functions may be using GD themselves, and your own code is at the mercy of the other functions. Conversely you shouldn't clear GD, as some other function may be using it.

So you're better off using a single function for everything, and using SD within it?

There isn't any way of telling whether the function is being called for the first time in a transaction. You don't know when to clear it.

Regards,
Stephen Denne.



From: PFC <lists(at)peufeu(dot)com>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: count(*) performance improvement ideas
Date: 2008-04-17 23:32:53
Message-ID: op.t9r3o3egcigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> My wife has a snake phobia, besides, I've just started learning Scala.

Just had a look at Scala, it looks nice. Slightly Lispish (like all good
languages)...

> txid_current()
> No... hold on, it is per session, and a session can't have two or more
> transactions active at once can it?

It could be used to detect rollback.

> So the problem is that other functions may be using GD themselves, and
> your own code is at the mercy of the other functions. Conversely you
> shouldn't clear GD, as some other function may be using it.

Exactly.

> So you're better off using a single function for everything, and using
> SD within it?

Since the purpose is to store counts for rows matching certain criteria in a set of tables, you could build a hashtable of hashtables, like:

GD[table name][criteria name][criteria value] = count

This would add complexity (about half a line of code), but you'd have to create lots of plpgsql trigger functions to wrap it.

> There isn't any way of telling whether the function is being called for
> the first time in a transaction. You don't know when to clear it.

The first time in a session, GD will be empty.
Clearing it at the start of a transaction would not be useful (clearing it at ROLLBACK would).
Updating the "real" summary table with the contents of this hash is also part of the problem.
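
A sketch of that flush step, reusing the get_count()/clear_count() functions above (it assumes every cached key already has a row in counts, and it is not tied to commit in any way):

UPDATE counts
   SET total = counts.total + d.cnt
  FROM get_count() AS d
 WHERE counts.key = d.key;
SELECT clear_count();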

So, basically, if you connect, do one insert, and disconnect, this would be useless.
But if you do a zillion inserts, caching the count deltas in RAM would be faster.
And if you use persistent connections, you could update the counts in the real table only every N minutes, for instance, but this would need some cooperation from the backend.



From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-06-23 22:12:20
Message-ID: 200806232212.m5NMCKc03022@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Added to TODO:

* Allow custom variables to appear in pg_settings()

---------------------------------------------------------------------------

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > Tom Lane wrote:
> >> Really? [ pokes around ... ] Hm, you're right, because
> >> add_placeholder_variable() sets the GUC_NO_SHOW_ALL flag, and in this
> >> usage it'll never be cleared. I wonder if we should change that.
> >>
> >> The whole thing is a bit of an abuse of what the mechanism was
> >> intended for, and so I'm not sure we should rejigger GUC's behavior
> >> to make it more pleasant, but on the other hand if we're not ready to
> >> provide a better substitute ...
>
> > While I agree with that part, is there any actual *reason* why we
> > shouldn't have the custom variables included in pg_settings?
>
> IIRC, the motivation for doing that was to not expose a completely bogus
> set of attributes for a variable whose defining C-module hadn't been
> loaded yet.
>
> I thought about this in the shower just now, and ISTM that if we want to
> turn this into an actual feature rather than a kluge, there needs to be
> some sort of "define variable" command that sets up a custom variable
> and specifies its type (and whatever other properties seem worth
> setting). IOW expose the DefineCustomFooVariable functions to SQL users.
>
> I'd be a bit inclined to restrict the namespace that can be set up that
> way, eg allow only "local." or "session." as the prefix. Maybe
> that's just being too anal, but we could guarantee not to introduce
> colliding built-in GUCs in future releases, whereas people trying to
> define variables with any random name would definitely be at risk.
>
> Comments?
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +