Re: Materialized views (Was Re: Improving count(*))

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Improving count(*)
Date: 2005-11-17 19:28:10
Message-ID: 1132255690.4959.207.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

One of the major complaints is always "Select count(*) is slow".

I have a somewhat broadbrush idea as to how we might do this (for larger
tables).

Previously, we've said that maintaining a running count of the tuples in
a table is hard, or at least costly. However, maintaining a partial
count may be somewhat easier and offer the performance required.

Bearing in mind other RDBMS' approach is to count the number of rows in
an index, their cost is probably about the same as scanning table
blocks/10 very roughly - so the cost is far from zero for them. In order
to get similar performance we would need a technique that involved
scanning a small percentage of the data blocks in a table, typically
less than 10%.

Prelims: As tuples are inserted they go into blocks in a varied
sequence. However, since only VACUUM ever reduces the size of the data
in a block, we notice that blocks eventually fill up. If they are on the
FSM, they are removed. At that point, we could begin keeping track of
the number of live tuples in the block.

So we could imagine an on-disk data structure that is an array of this
struct:
blockid - the block in question
count - the number of live tuples in that block
txnid - the txnid of the last tuple written

When we run a SELECT COUNT(*) we would read this data structure and use
it to build a bitmap of blocks that still need to be scanned to get the
count(*) result. So this would give us a partially cached result for
count(*), which then would require scanning only the last few blocks of
a table to get at the correct answer.

This is beginning to sound very much like the same sort of solution as
the one recently proposed for VACUUM: we keep track of the status of
each block, then use it as a way of speeding things up.

For VACUUM we want to keep a data structure that notes which blocks
require vacuums. For count(*) we want to keep track of which blocks do
not require vacuums, nearly. There are some blocks in the middle that
wouldn't go on either list, true.

So, why not combine the two purposes into a single solution?

We would be able to manage at least 800 data blocks for each block of
this structure, which would mean 1.25 MB per GB of data. For tables with
a reasonably non-random block write pattern the key parts of that could
reside in memory with reasonable ease even for busy tables. In those
cases, it also seems like this technique might lead to the need to scan
only a small percentage of heap blocks - so this would give roughly
equivalent performance to other RDBMS for SELECT count(*). If its a data
structure that we were thinking of maintaining for VACUUM anyway, this
improves the value of the cost we would have to pay to maintain the a
cache data structure.

This is thinking only, I'm not proposing this as a fully worked proposal
nor am I pursuing this myself. I can see immediately that there are some
major obstacles in the way, like MVCC, autovacuum, overheads etc but it
seems worth pointing out a possible angle of attack, since this looks
like it might hit two birds with one stone.

I'm not aiming to start "open season" on crazy ideas for this; this idea
may take months to ruminate into a solution.

Best Regards, Simon Riggs


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 19:38:16
Message-ID: 20051117193816.GH22933@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 17, 2005 at 07:28:10PM +0000, Simon Riggs wrote:
> One of the major complaints is always "Select count(*) is slow".
>
> I have a somewhat broadbrush idea as to how we might do this (for larger
> tables).

It's an interesting idea, but you still run into the issue of
visibility. If two people start a transaction, one of them inserts a
row and then both run a select count(*), they should get different
answers. I just don't see a way that your suggestion could possibly
lead to that result...

There is no unique answer to count(*), it all depends on who is looking
(sounds like relativity :) ). If you can sort that, you're well over
90% of the way.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 19:46:53
Message-ID: 36e682920511171146y2db6fc4fh629a76c9e10b696f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon,
Nice suggestion, I think it's workable but (like all other methods) has
some technical/pseudo-political challenges.
I'm still voting for my old, "Much Ado About COUNT(*)" topic; adding
visibiility to the indexes and counting them like the other RDBMS vendors.
True, it would add storage overhead that several people don't want, but such
as the life of the COUNT(*) discussion for PostgreSQL...
-Jonah

On 11/17/05, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
>
> On Thu, Nov 17, 2005 at 07:28:10PM +0000, Simon Riggs wrote:
> > One of the major complaints is always "Select count(*) is slow".
> >
> > I have a somewhat broadbrush idea as to how we might do this (for larger
> > tables).
>
> It's an interesting idea, but you still run into the issue of
> visibility. If two people start a transaction, one of them inserts a
> row and then both run a select count(*), they should get different
> answers. I just don't see a way that your suggestion could possibly
> lead to that result...
>
> There is no unique answer to count(*), it all depends on who is looking
> (sounds like relativity :) ). If you can sort that, you're well over
> 90% of the way.
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > tool for doing 5% of the work and then sitting around waiting for
> someone
> > else to do the other 95% so you can sue them.
>
>
>


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 19:55:09
Message-ID: 1132257309.50945.78.camel@home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-11-17 at 20:38 +0100, Martijn van Oosterhout wrote:
> On Thu, Nov 17, 2005 at 07:28:10PM +0000, Simon Riggs wrote:
> > One of the major complaints is always "Select count(*) is slow".
> >
> > I have a somewhat broadbrush idea as to how we might do this (for larger
> > tables).
>
> It's an interesting idea, but you still run into the issue of
> visibility. If two people start a transaction, one of them inserts a
> row and then both run a select count(*), they should get different
> answers. I just don't see a way that your suggestion could possibly
> lead to that result...

The instant someone touches a block it would no longer be marked as
frozen (vacuum or analyze or other is not required) and count(*) would
visit the tuples in the block making the correct decision at that time.

--


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 21:09:09
Message-ID: 20051117210904.GK22933@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 17, 2005 at 02:55:09PM -0500, Rod Taylor wrote:
> On Thu, 2005-11-17 at 20:38 +0100, Martijn van Oosterhout wrote:
> > It's an interesting idea, but you still run into the issue of
> > visibility. If two people start a transaction, one of them inserts a
> > row and then both run a select count(*), they should get different
> > answers. I just don't see a way that your suggestion could possibly
> > lead to that result...
>
> The instant someone touches a block it would no longer be marked as
> frozen (vacuum or analyze or other is not required) and count(*) would
> visit the tuples in the block making the correct decision at that time.

Hmm, so the idea would be that if a block no longer contained any
tuples hidden from any active transaction, you could store the count
and skip reading that page. Ofcourse, as soon as someone UPDATEs a
tuple, that block comes into play again because it would be visible
from some but not other transactions. Then again, for count(*) UPDATEs
are irrelevent.

The other way, storing visibility in the index seems awfully expensive,
since any changes to the tuple would require updating the index. Still,
people have thought about this already, I'm sure the issues are
known...

Have a niceday,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 21:34:01
Message-ID: 8720.1132263241@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Bearing in mind other RDBMS' approach is to count the number of rows in
> an index, their cost is probably about the same as scanning table
> blocks/10 very roughly - so the cost is far from zero for them.

Really? The impression I get is that people who ask for this expect the
answer to be instantaneous, ie they think the system will maintain a
running net total for each table. (In a non-MVCC system that isn't
necessarily an unreasonable thing to do.)

I really can't get excited about adding this level of complexity and
overhead to the system just to support COUNT(*)-with-no-WHERE slightly
better than we do now.

The triggers-and-deltas approach previously proposed seems considerably
more attractive to me, because (1) it's not invasive and (2) you only
have to pay the overhead on tables where you want it.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 21:34:08
Message-ID: 1132263248.4959.267.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-11-17 at 14:46 -0500, Jonah H. Harris wrote:

> Nice suggestion, I think it's workable but (like all other methods)
> has some technical/pseudo-political challenges.
>
> I'm still voting for my old, "Much Ado About COUNT(*)" topic; adding
> visibiility to the indexes and counting them like the other RDBMS
> vendors. True, it would add storage overhead that several people
> don't want, but such as the life of the COUNT(*) discussion for
> PostgreSQL...

[When no idea is good, we take the least-worst path. When we only have
one bad idea, nobody does anything. So we need a couple of ugly but
workable alternatives to flush out the one we will pick to resolve
things.]

As Martijn points out *any* solution must take account of visibility
rules. So abstracting from both these ideas gives shape to the solution,
which must be:
- a data structure smaller than the table itself
- including visibility data, explicitly/implicitly, exactly/lossily
- must serve multiple purposes to ensure the overhead of maintaining the
structure is amortised across many potential benefits, since various
needs share similar solution requirements

Would having visibility on an index be of use to a VACUUM?
Yes, I guess it could be. If we knew when the table was last vacuumed,
we could build a bitmap of changed blocks by scanning the index. We
would only need visibility on *one* of the indexes on a table, so
perhaps index visibility could be an option rather than a
one-size-fits-all.

Adding visibility to an index would add substantial bulk to any index.
If we could do this at the same time as adding leading key, full field
compression (*not* prefix compression), then it might be worth doing.

I would also note that DELETE would need to touch all visible index
rows, which currently is not required for btree indexes. (But as we
already noted, any solution must include visibility data and so any
solution must update some data structure on delete).

Index-only plans could help with various GROUP BY and join queries also,
so it certainly is attractive, though costly.

Best Regards, Simon Riggs


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 21:47:01
Message-ID: 1132264021.4959.281.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-11-17 at 16:34 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Bearing in mind other RDBMS' approach is to count the number of rows in
> > an index, their cost is probably about the same as scanning table
> > blocks/10 very roughly - so the cost is far from zero for them.
>
> Really? The impression I get is that people who ask for this expect the
> answer to be instantaneous, ie they think the system will maintain a
> running net total for each table. (In a non-MVCC system that isn't
> necessarily an unreasonable thing to do.)

Yeh. I think Informix keeps a running total, IIRC, but certainly Oracle
and various others do not.

People probably have given the impression that count(*) is
instantaneous, but that doesn't mean it actually is - they're just
talking up the problems of pg.

> I really can't get excited about adding this level of complexity and
> overhead to the system just to support COUNT(*)-with-no-WHERE slightly
> better than we do now.

Well, I was pointing out the cross-over with the requirements for a
faster VACUUM also. Taken together, it might be a winner.

> The triggers-and-deltas approach previously proposed seems considerably
> more attractive to me, because (1) it's not invasive and (2) you only
> have to pay the overhead on tables where you want it.

This would need to either be optional whichever way we did it, just as
with the creation of an index. I also think that taking the Oracle path
of adding new features in functions/packages would be a good thing,
rather than over-burdening the parser constantly with new syntax to cope
with.

Did the triggers-and-deltas approach cope with MVCC correctly?

Best Regards, Simon Riggs


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 22:11:09
Message-ID: 20051117221109.GL22933@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 17, 2005 at 09:34:08PM +0000, Simon Riggs wrote:
> Adding visibility to an index would add substantial bulk to any index.
> If we could do this at the same time as adding leading key, full field
> compression (*not* prefix compression), then it might be worth doing.

I think the single biggest problem with visibility-in-index is that
there is no link from the tuple to the index. So if you update a tuple,
the only way to update the index is the start from the top and go down
until you find it. If your table/index is of any size, you can imagine
the overhead will kill you.

Now, lets say you add a field to the tuple which you the position of
the index entry. You can only reasonably do this for one index, say the
primary key. Now you have a two-way link the updating becomes much
quicker, at the cost of even more overhead.

Doing it only for one index per table may be sensible anyway since you
don't really want to store visibility any more times than necessary.

> I would also note that DELETE would need to touch all visible index
> rows, which currently is not required for btree indexes. (But as we
> already noted, any solution must include visibility data and so any
> solution must update some data structure on delete).

Remember, UPDATE = DELETE + UPDATE, so you have to handle all updates
too. Inserts are the only easy case (well, except the fact that they
have to point to eachother. locking nastyness).

> Index-only plans could help with various GROUP BY and join queries also,
> so it certainly is attractive, though costly.

Only in cases where you don't need the data (ie EXISTS), otherwise you
still need the tuple.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-17 22:37:19
Message-ID: 20234.1132267039@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> Now, lets say you add a field to the tuple which you the position of
> the index entry. You can only reasonably do this for one index, say the
> primary key. Now you have a two-way link the updating becomes much
> quicker, at the cost of even more overhead.

I think this is fairly infeasible --- consider what it does to the cost
and (lack of) atomicity of an index page split, for instance.

regards, tom lane


From: mark(at)mark(dot)mielke(dot)cc
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-18 00:48:13
Message-ID: 20051118004813.GA531@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Probably obvious, and already mentioned, count(*) isn't the only query
that would benefit from visibility information in the index. It's
rather unfortunate that MVCC requires table lookups, when all values
queried or matched are found in the index key itself. The idea of an
all index table is appealing to me for some applications (Oracle
supports this, I believe?). In effect, a sorted, and searchable table,
that doesn't double in size, just because it is indexed.

So what's the real cost here? Larger index size to include the
visibility information (optional?) and UPDATE/DELETE need to
set expirations on the index rows as well as the table rows,
for only the indexes that have visibility information? A flag
in the table structure in memory to know whether the table has
any indexes with visibility information that require updating?

It doesn't sound that bad to me. Perhaps I just don't know better? :-)

The per-block counts idea, seem useful to me. A database that
frequently modifies every page of an index, would seem
inefficient. What if the per-block counts were kept, but associated
with index blocks instead of table blocks, for indexes that maintain
visibility information? The per-block counts only need to be able to
provide enough information for the reader to know whether the count is
valid, or invalid, perhaps updated at vacuum time?

The idea of a partial index, that keeps this information (visibility
information + per-block live row count cache) seems fascinating to
me. Many new optimization opportunities to hang myself with... :-)

Maybe PostgreSQL could be FASTER than other databases?

Or are we just dreaming?

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-18 04:13:39
Message-ID: 878xvmr3t8.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I think the important thing to keep track of is a single bit:

Which the following apply?

a) all of the tuples in this block are visible

b) at least one tuple in this block is in-doubt and may need to be vacuumed

That isn't enough to calculate count(*) on its own but it means you could scan
an index like any other database and avoid checking every single tuple. If the
tuple lies in a block that is known not to contain any in-doubt records then
the tuple can be counted immediately.

This has the advantage that it helps with a lot more cases than a simple
"select count(*) from tab". As Tom pointed out that case can be tackled more
directly with a O(1) solution anyways. More complex cases are where fast index
scans are really important.

So you could do "SELECT count(*) FROM tab WHERE a > ?" and have it scan an
index on <a> without having to check the visibility of every single tuple. It
only has to check the visibility of tuples that lie on blocks that contain at
least one in-doubt tuple.

You could even imagine using the new bitmap index scan machinery to combine
these bits of information too.

And this is exactly the same information that vacuum needs. Once vacuum has
run and cleaned out a block it knows whether there are any records that are
still in-doubt or whether every record it left is universally visible. It can
note that and allow future vacuums to skip that block if no deletes or inserts
have changed that bit since.

--
greg


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Rod Taylor <pg(at)rbt(dot)ca>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-18 05:28:01
Message-ID: 437D6661.2040003@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mark(at)mark(dot)mielke(dot)cc wrote:
> Probably obvious, and already mentioned, count(*) isn't the only query
> that would benefit from visibility information in the index. It's
> rather unfortunate that MVCC requires table lookups, when all values
> queried or matched are found in the index key itself. The idea of an
> all index table is appealing to me for some applications (Oracle
> supports this, I believe?). In effect, a sorted, and searchable table,
> that doesn't double in size, just because it is indexed.
>

I've been thinking about that lately also. It seems like it would be
useful to have the entire table in a Btree in some situations, but there
are some drawbacks:
(1) probably hard to implement
(2) only works with one key
(3) since tuples would not be at a fixed location on disk, you can't
just use a noraml secondary index. The secondary index would have to
point to the key of the tuple in the Btree table, and then do another
lookup in the actual table.
(4) of course, insert performance goes down due to btree maintenence

Range queries (or queries on equality when there are many duplicates)
might benefit a lot. But I would think that in many situations, the fact
that you could only have one key indexed on the table would counteract
those benefits.

I haven't noticed any recent comments by the hackers on this subject.
Maybe they have some more details? I think MS SQL has something similar
to that also.

Regards,
Jeff Davis


From: Richard Huxton <dev(at)archonet(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-18 15:46:42
Message-ID: 437DF762.7080108@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> One of the major complaints is always "Select count(*) is slow".

Although there seem to have been plenty of ideas on this they all seem
to just provide a solution for the "whole table" case. It might be that
the solution provides other benefits, but for this one case it does seem
like a lot of work.

Might it be possible to apply rule-style rewriting to a clause of an
ordinary select query? That is, is it prohibitively expensive to get PG
to recognise
SELECT count(*) FROM big_table
and replace it with
SELECT sum(summary_count) FROM my_materialised_view

This should allow you to have where-clauses and apply to a range of
cases. What I fear is that checking to see if the rule applies will cost
too much on all those queries where it doesn't apply.

--
Richard Huxton
Archonet Ltd


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-18 18:35:04
Message-ID: 7711.1132338904@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Richard Huxton <dev(at)archonet(dot)com> writes:
> Might it be possible to apply rule-style rewriting to a clause of an
> ordinary select query? That is, is it prohibitively expensive to get PG
> to recognise
> SELECT count(*) FROM big_table
> and replace it with
> SELECT sum(summary_count) FROM my_materialised_view

> This should allow you to have where-clauses and apply to a range of
> cases. What I fear is that checking to see if the rule applies will cost
> too much on all those queries where it doesn't apply.

There is already code in the optimizer that does similar rewriting
for min/max queries. However, that's a hard-wired transformation.
I don't see any very simple way to provide a user-configurable
equivalent.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-18 20:03:36
Message-ID: 20051118200335.GD26861@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Richard Huxton <dev(at)archonet(dot)com> writes:
> > Might it be possible to apply rule-style rewriting to a clause of an
> > ordinary select query? That is, is it prohibitively expensive to get PG
> > to recognise
> > SELECT count(*) FROM big_table
> > and replace it with
> > SELECT sum(summary_count) FROM my_materialised_view
>
> > This should allow you to have where-clauses and apply to a range of
> > cases. What I fear is that checking to see if the rule applies will cost
> > too much on all those queries where it doesn't apply.
>
> There is already code in the optimizer that does similar rewriting
> for min/max queries. However, that's a hard-wired transformation.
> I don't see any very simple way to provide a user-configurable
> equivalent.

I guess there must be a query-rewriting mechanism for implementing
materialized views. With that in place we may be able to implement this
other thing ... Is anybody working on materialized views?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: mark(at)mark(dot)mielke(dot)cc
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-19 00:53:38
Message-ID: 20051119005338.GA22398@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 18, 2005 at 03:46:42PM +0000, Richard Huxton wrote:
> Simon Riggs wrote:
> >One of the major complaints is always "Select count(*) is slow".
> Although there seem to have been plenty of ideas on this they all seem
> to just provide a solution for the "whole table" case. It might be that
> the solution provides other benefits, but for this one case it does seem
> like a lot of work.

Or, it wasn't explained properly as to how the WHERE clause would
function.

The solution involving an index that has visibility information should
work fine with a WHERE clause. Only index rows that match the clause
are counted.

A solution enhancing the above mentioned indexes, to maintain a count
for whole index blocks, would allow whole index blocks that satisfy
the WHERE clause to be counted, assuming the whole index block is
visible in the current transaction.

Not to say these are the best ideas, or the only ideas - but it isn't
true that most of the solution presented only apply to the 'whole table'
case. The *simplest* solutions, apply only to the 'whole table' case. :-)

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Improving count(*)
Date: 2005-11-19 01:22:34
Message-ID: 200511181722.34809.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro,

> I guess there must be a query-rewriting mechanism for implementing
> materialized views.  With that in place we may be able to implement this
> other thing ...  Is anybody working on materialized views?

I have a bundle of academic code designed to do exactly this, if any hacker
wants to take on the task of getting it into production shape.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Materialized views (Was Re: Improving count(*))
Date: 2005-11-19 09:49:53
Message-ID: Pine.OSF.4.61.0511191131110.143250@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 18 Nov 2005, Josh Berkus wrote:

> Alvaro,
>
>> I guess there must be a query-rewriting mechanism for implementing
>> materialized views.  With that in place we may be able to implement this
>> other thing ...  Is anybody working on materialized views?
>
> I have a bundle of academic code designed to do exactly this, if any hacker
> wants to take on the task of getting it into production shape.

Could you post it to the list? I'd be interested to take a look, though
I'm afraid don't have the time to work on it.

I've been reading some papers on materialized views lately. Here's some
interesting ones:

Blakeley, Larson, Tompa: Efficiently Updating Materialized View
http://tinyurl.com/8hqeo
Describes a fairly simple algorithm for keeping select-project-join views
up to date.

Vista: View Maintenance in Relational and Deductive Databases by
Incremental Query Evaluation
http://tinyurl.com/exb8o
A survey of various algorithms.

Gupta, Mumick, Subrahmanian: Maintaining Views Incrementally
http://portal.acm.org/citation.cfm?id=170066
Extended abstract of a paper that presents two algorithms: one similar to
the Blakeley paper, and another one that can also handle recursion.

Ross, Srivastava, Sudarshan: Materialized View Maintenance and Integrity
Constraint Checking: Trading Space for Time
http://citeseer.ist.psu.edu/ross96materialized.html
Describes how materialized views can be used for implementing database
assertions.

- Heikki


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org, matview-devel(at)gborg(dot)postgresql(dot)org
Subject: Re: Materialized views (Was Re: Improving count(*))
Date: 2005-11-19 13:04:08
Message-ID: b0f3f5a10511190504o726ea00eube3fbeaad094349d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(CCed to the matview-devel mailing list)

On 11/19/05, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> I've been reading some papers on materialized views lately. Here's some
> interesting ones:

(snip)

You might want to take a look at the pages that I set up to track the
progress on my master's thesis:

<url:http://www.nicolas.barbier.easynet.be/itsme/thesis/>

especially the literature page:

<url:http://www.nicolas.barbier.easynet.be/itsme/thesis/literature/>

IMO, GL95, Qua97 and GM99 are the ones that are most applicable to
view maintenance with bag-semantics (thus, SQL). You should be able to
find all these papers with Google (Scholar) in case my computer is
shut down, otherwise you can download them directly from me.

Greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Materialized views (Was Re: Improving count(*))
Date: 2005-11-19 20:36:54
Message-ID: 200511191236.54703.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki,

> Could you post it to the list? I'd be interested to take a look, though
> I'm afraid don't have the time to work on it.

Yeah, I should put it up on pgFoundry. I'm not sure exactly where, though --
I don't want to launch a new project if it's not going to take off. Maybe
Bizgres.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Materialized views (Was Re: Improving count(*))
Date: 2005-11-20 20:31:05
Message-ID: Pine.OSF.4.61.0511202230250.262818@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 19 Nov 2005, Josh Berkus wrote:

>> Could you post it to the list? I'd be interested to take a look, though
>> I'm afraid don't have the time to work on it.
>
> Yeah, I should put it up on pgFoundry. I'm not sure exactly where, though --
> I don't want to launch a new project if it's not going to take off. Maybe
> Bizgres.

There's also a "matview" project on gborg.

- Heikki


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, matview-devel(at)gborg(dot)postgresql(dot)org
Subject: Re: Materialized views (Was Re: Improving count(*))
Date: 2005-11-20 20:34:29
Message-ID: Pine.OSF.4.61.0511202231210.262818@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 19 Nov 2005, Nicolas Barbier wrote:

> You might want to take a look at the pages that I set up to track the
> progress on my master's thesis:
>
> <url:http://www.nicolas.barbier.easynet.be/itsme/thesis/>
>
> especially the literature page:
>
> <url:http://www.nicolas.barbier.easynet.be/itsme/thesis/literature/>
>
> IMO, GL95, Qua97 and GM99 are the ones that are most applicable to
> view maintenance with bag-semantics (thus, SQL). You should be able to
> find all these papers with Google (Scholar) in case my computer is
> shut down, otherwise you can download them directly from me.

Thanks, interesting stuff.

BTW: Does the GL95 algorithm handle outer joins?

- Heikki


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org, matview-devel(at)gborg(dot)postgresql(dot)org
Subject: Re: Materialized views (Was Re: Improving count(*))
Date: 2005-11-21 14:19:08
Message-ID: b0f3f5a10511210619p5f37ca00h9a7f044e3aab61ef@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/20/05, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On Sat, 19 Nov 2005, Nicolas Barbier wrote:
>
> > You might want to take a look at the pages that I set up to track the
> > progress on my master's thesis:
> >
> > <url:http://www.nicolas.barbier.easynet.be/itsme/thesis/>
> >
> > especially the literature page:
> >
> > <url:http://www.nicolas.barbier.easynet.be/itsme/thesis/literature/>
> >
> > IMO, GL95, Qua97 and GM99 are the ones that are most applicable to
> > view maintenance with bag-semantics (thus, SQL). You should be able to
> > find all these papers with Google (Scholar) in case my computer is
> > shut down, otherwise you can download them directly from me.
>
> Thanks, interesting stuff.
>
> BTW: Does the GL95 algorithm handle outer joins?

No, but GM99 does (although only in the cases where it can be
applied). I guess that a slightly adapted version of the technique
from Qua97 can also be used. Investigating :-).

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Richard Huxton <dev(at)archonet(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-22 23:11:01
Message-ID: 200511222311.jAMNB1Y00922@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mark(at)mark(dot)mielke(dot)cc wrote:
> On Fri, Nov 18, 2005 at 03:46:42PM +0000, Richard Huxton wrote:
> > Simon Riggs wrote:
> > >One of the major complaints is always "Select count(*) is slow".
> > Although there seem to have been plenty of ideas on this they all seem
> > to just provide a solution for the "whole table" case. It might be that
> > the solution provides other benefits, but for this one case it does seem
> > like a lot of work.
>
> Or, it wasn't explained properly as to how the WHERE clause would
> function.
>
> The solution involving an index that has visibility information should
> work fine with a WHERE clause. Only index rows that match the clause
> are counted.
>
> A solution enhancing the above mentioned indexes, to maintain a count
> for whole index blocks, would allow whole index blocks that satisfy
> the WHERE clause to be counted, assuming the whole index block is
> visible in the current transaction.

I think it would be very difficult to generate a per-index-page
visibility bit because I can't think of a normal database operation that
would allow us to update it. It requires that an index scan visit every
heap page to check the visibility of the tuples. However, we almost
never do a full-index scan because it is so much slower than a heap
scan. It would be easy to keep a heap-visible bit up-to-date (because
we do full-heap scans occasionally), but that would require the index
to load the heap page to find the bit. (The bit isn't on the index, it
is on the heap).

Jan has been talking about have a bitmap to track pages that need
vacuuming, and I am wondering if the same system could be used to track
the heap-dirty bits. Putting one bit on every 8k disk page means we have
to load the 8k page to read the bit, while a centralized bitmap would
load 64k page bits in a single 8k page. That one page would cover 500MB
of a table. Seems vacuum could use the same bitmap values.

Assuming we track 100 tables regularly, that would require 800k of shared
memory. We would have to pagein/out the bitmaps as needed, but we could
throw them away on a crash and rebuild as part of normal operation.

FSM has not been a concurrency bottleneck, so I am thinking this would
not be either.

I suppose it would require a new filesystem file for each table.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: mark(at)mark(dot)mielke(dot)cc, Richard Huxton <dev(at)archonet(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-23 00:07:03
Message-ID: 20051123000703.GG7086@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 22, 2005 at 06:11:01PM -0500, Bruce Momjian wrote:
> mark(at)mark(dot)mielke(dot)cc wrote:
> Jan has been talking about have a bitmap to track pages that need
> vacuuming, and I am wondering if the same system could be used to track
> the heap-dirty bits. Putting one bit on every 8k disk page means we have
> to load the 8k page to read the bit, while a centralized bitmap would
> load 64k page bits in a single 8k page. That one page would cover 500MB
> of a table. Seems vacuum could use the same bitmap values.
>
> Assuming we track 100 tables regularly, that would require 800k of shared
> memory. We would have to pagein/out the bitmaps as needed, but we could
> throw them away on a crash and rebuild as part of normal operation.
>
> FSM has not been a concurrency bottleneck, so I am thinking this would
> not be either.
>
> I suppose it would require a new filesystem file for each table.

ISTM that the requirements here are very similar to the requirements for
the FSM, at least from a high-level: Track all pages where condition X
is true. Is there value to using the same framework for both cases?
Maybe it makes more sense to store free space info for a relation using
a bitmap, rather than storing individual page numbers. Or conversely,
storing 'dirty page info' should maybe be done in a similar fasion to
FSM instead of a bitmap.

If we wanted to provide the ultimate in tunability we'd offer both
solutions; some tables will have a large number of pages with free space
on them (which would favor storing that info in a bitmap); likewise some
tables will have a small number of pages that are 'dirty', which would
favor storing a list of page numbers instead of a bitmap.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: mark(at)mark(dot)mielke(dot)cc
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-11-23 00:51:04
Message-ID: 20051123005104.GA450@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 22, 2005 at 06:11:01PM -0500, Bruce Momjian wrote:
> mark(at)mark(dot)mielke(dot)cc wrote:
> > A solution enhancing the above mentioned indexes, to maintain a count
> > for whole index blocks, would allow whole index blocks that satisfy
> > the WHERE clause to be counted, assuming the whole index block is
> > visible in the current transaction.
> I think it would be very difficult to generate a per-index-page
> visibility bit because I can't think of a normal database operation that
> would allow us to update it. It requires that an index scan visit every
> heap page to check the visibility of the tuples. However, we almost
> never do a full-index scan because it is so much slower than a heap
> scan. It would be easy to keep a heap-visible bit up-to-date (because
> we do full-heap scans occasionally), but that would require the index
> to load the heap page to find the bit. (The bit isn't on the index, it
> is on the heap).

Vacuum time?

> Jan has been talking about have a bitmap to track pages that need
> vacuuming, and I am wondering if the same system could be used to track
> the heap-dirty bits. Putting one bit on every 8k disk page means we have
> to load the 8k page to read the bit, while a centralized bitmap would
> load 64k page bits in a single 8k page. That one page would cover 500MB
> of a table. Seems vacuum could use the same bitmap values.

Sounds correct.

> Assuming we track 100 tables regularly, that would require 800k of shared
> memory. We would have to pagein/out the bitmaps as needed, but we could
> throw them away on a crash and rebuild as part of normal operation.
> FSM has not been a concurrency bottleneck, so I am thinking this would
> not be either.
> I suppose it would require a new filesystem file for each table.

*nod*

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, matview-devel(at)gborg(dot)postgresql(dot)org
Subject: Re: Materialized views (Was Re: Improving count(*))
Date: 2005-11-23 08:34:57
Message-ID: 200511230034.57819.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Guys,

I'm not going to get the university research up before American Thanksgiving.
Sorry. Look for it next week.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: mark(at)mark(dot)mielke(dot)cc, Richard Huxton <dev(at)archonet(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving count(*)
Date: 2005-12-01 22:33:32
Message-ID: 200512012233.jB1MXWf07787@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Add idea to TODO:

* Allow data to be pulled directly from indexes

Currently indexes do not have enough tuple visibility information
to allow data to be pulled from the index without also accessing
the heap. One way to allow this is to set a bit on index tuples
to indicate if a tuple is currently visible to all transactions
when the first valid heap lookup happens. This bit would have to
be cleared when a heap tuple is expired. Another idea is to maintain
a bitmap of heap pages where all rows are visible to all backends,
and allow index lookups to reference that bitmap to avoid heap
lookups, perhaps the same bitmap we might add someday to determine
which heap pages need vacuuming.

I am thinking we are at the point where between this usage and vacuum
that a bitmap for each table is something we should work on for 8.2.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> mark(at)mark(dot)mielke(dot)cc wrote:
> > On Fri, Nov 18, 2005 at 03:46:42PM +0000, Richard Huxton wrote:
> > > Simon Riggs wrote:
> > > >One of the major complaints is always "Select count(*) is slow".
> > > Although there seem to have been plenty of ideas on this they all seem
> > > to just provide a solution for the "whole table" case. It might be that
> > > the solution provides other benefits, but for this one case it does seem
> > > like a lot of work.
> >
> > Or, it wasn't explained properly as to how the WHERE clause would
> > function.
> >
> > The solution involving an index that has visibility information should
> > work fine with a WHERE clause. Only index rows that match the clause
> > are counted.
> >
> > A solution enhancing the above mentioned indexes, to maintain a count
> > for whole index blocks, would allow whole index blocks that satisfy
> > the WHERE clause to be counted, assuming the whole index block is
> > visible in the current transaction.
>
> I think it would be very difficult to generate a per-index-page
> visibility bit because I can't think of a normal database operation that
> would allow us to update it. It requires that an index scan visit every
> heap page to check the visibility of the tuples. However, we almost
> never do a full-index scan because it is so much slower than a heap
> scan. It would be easy to keep a heap-visible bit up-to-date (because
> we do full-heap scans occasionally), but that would require the index
> to load the heap page to find the bit. (The bit isn't on the index, it
> is on the heap).
>
> Jan has been talking about have a bitmap to track pages that need
> vacuuming, and I am wondering if the same system could be used to track
> the heap-dirty bits. Putting one bit on every 8k disk page means we have
> to load the 8k page to read the bit, while a centralized bitmap would
> load 64k page bits in a single 8k page. That one page would cover 500MB
> of a table. Seems vacuum could use the same bitmap values.
>
> Assuming we track 100 tables regularly, that would require 800k of shared
> memory. We would have to pagein/out the bitmaps as needed, but we could
> throw them away on a crash and rebuild as part of normal operation.
>
> FSM has not been a concurrency bottleneck, so I am thinking this would
> not be either.
>
> I suppose it would require a new filesystem file for each table.
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073