Re: why does count take so long?

From: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: why does count take so long?
Date: 2003-09-10 15:16:59
Message-ID: 3F5F406B.9E8B5138@nsd.ca
Lists: pgsql-general

Christopher,

You did not quite understand.

The counts in question are the actual counts (deltas) for the
transactions.

The tuple should be something like:
xid, reltype, insert_count, update_count, delete_count
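
Something along these lines, as a rough sketch (names and types are
only illustrative; in particular, a real xid is not a plain integer):

create table pg_count_delta (
    xid          integer,   -- committing transaction (illustrative type)
    reltype      oid,       -- table the deltas apply to
    insert_count integer,
    update_count integer,
    delete_count integer
);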

When a COUNT(*) is issued, the committed tuples are totaled up and
stored as xid 0 (or whatever), and those committed tuples are then
deleted (all this with appropriate locks).

I am not sure if there is a need for the update count.
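
Leaving the update count aside, the consolidation on a COUNT(*) could
be sketched roughly as follows (purely illustrative: it assumes the
pg_count_delta table above, a baseline row kept under xid 0 for each
table, and 12345 standing in for the table's oid):

begin;
-- "appropriate locks": block concurrent delta writes while we total and delete
lock table pg_count_delta in exclusive mode;

-- fold the committed per-transaction deltas into the xid-0 baseline row
update pg_count_delta
   set insert_count = insert_count +
         (select coalesce(sum(insert_count), 0)
            from pg_count_delta where reltype = 12345 and xid <> 0),
       delete_count = delete_count +
         (select coalesce(sum(delete_count), 0)
            from pg_count_delta where reltype = 12345 and xid <> 0)
 where reltype = 12345 and xid = 0;

-- throw the folded deltas away
delete from pg_count_delta where reltype = 12345 and xid <> 0;

-- the live row count for the table is then simply
select insert_count - delete_count
  from pg_count_delta
 where reltype = 12345 and xid = 0;

commit;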

JLL

Christopher Browne wrote:
>
> In the last exciting episode, jllachan(at)nsd(dot)ca (Jean-Luc Lachance) wrote:
> > How about keeping counts of inserts, deletes and updates per table per
> > transaction as part of the live statistics?
>
> Aye, there's the rub.
>
> That's exactly what _won't_ work, and that's exactly the case that is
> somewhat pathological under MVCC.
>
> With MVCC, data "appears as if by magic" when its transaction COMMITs,
> thereby revealing the rows to the world.
>
> Unfortunately, there's no simple way of making updates to counts
> "simply appear" when they take effect, not without turning the updates
> into a concurrency bottleneck.
>
> Here's a bit of a wild thought...
>
> Assume a table with schema as follows:
> create table pg_count (
>     xid integer,   --- Actually, it's not an integer, right, Andrew? :-(
>     reltype oid,
>     count integer
> );
>
> Every transaction, "xid," affects some number of tuples. So for a
> transaction #2134151 that adds 5 rows to the table with oid 12345 and
> deletes 4 rows from the table with oid 45678, part of the transaction
> would include inserting these rows:
>
> insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
> insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);
>
> In order to get the count for table 12345, you could then go to
> pg_count and request:
> select sum(count) from pg_count where reltype = 12345;
>
> The size of this table would increase every time a transaction gets
> committed, so presumably part of VACUUM TABLE would be a
> collection/summarization, thus...
>
> -- Collect existing stats into 1 row
> insert into pg_count (xid, reltype, count)
>     values (currxid, currtable,
>             (select sum(count) from pg_count where reltype = currtable));
>
> -- Delete the old stats
> delete from pg_count where reltype = currtable and xid <> currxid;
>
> This will cope with concurrency reasonably well (modulo having to make
> sure that the "collect/delete" transaction is a serialized one).
>
> Unfortunately, if a table is updated frequently, the summary
> select sum(count) from pg_count where reltype = 12345;
> will have to collect together quite a large number of entries, which
> makes this "less cheap."
>
> That suggests an optimization; any time the COUNT is selected, the old
> stats can and should be collected into 1 row and the old data deleted.
> --
> wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
> http://www3.sympatico.ca/cbbrowne/nonrdbms.html
> Sturgeon's Law: 90% of *EVERYTHING* is crud.
>
