Re: index-only scans

From: Kääriäinen Anssi <anssi(dot)kaariainen(at)thl(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "postgres(at)cybertec(dot)at" <postgres(at)cybertec(dot)at>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: index-only scans
Date: 2011-08-13 20:35:14
Message-ID: BC19EF15D84DC143A22D6A8F2590F0A78864132F79@EXMAIL.stakes.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"""
Now, suppose that we know that 50% of the heap pages have their
visibility map bits set. What's the chance that this query won't need
a heap fetch? Well, the chance is 50% *if* you assume that a row
which has been quiescent for a long time is just as likely to be
queried as one that has been recently inserted or updated. However,
in many real-world use cases, nothing could be farther from the truth.

What do we do about that?
"""

The example is much more realistic if the query is a fetch of N latest rows from a table. Very common use case, and the whole relation's visibility statistics are completely wrong for that query. Wouldn't it be great if there was something like pg_stat_statements that would know the statistics per query, derived from usage...

Even if the statistics are not available per query, the statistics could be calculated from the relation usage: the weighted visibility of the pages would be pages_visible_when_read / total_pages_read for the relation. That percentage would minimize the average cost of the plans much better than just the non-weighted visibility percentage.

For the above example, if the usage is 90% read the N latest rows and we assume they are never visible, the weighted visibility percentage would be 10% while the non-weighted visibility percentage could be 90%. Even if the visibility percentage would be incorrect for the queries reading old rows, by definition of the weighted visibility percentage there would not be too many of them.

The same idea could of course be used to calculate the effective cache hit ratio for each table. Cache hit ratio would have the problem of feedback loops, though.

Of course, keeping such statistic could be more expensive than the benefit it gives. On the other hand, page hit percentage is already available...

- Anssi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2011-08-13 20:40:15 Re: our buffer replacement strategy is kind of lame
Previous Message Tom Lane 2011-08-13 20:10:47 Re: psql: bogus descriptions displayed by \d+