Re: posix_fadvise v22

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: posix_fadvise v22
Date: 2009-01-28 12:56:38
Message-ID: 87eiynlhbt.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> What I intend to do over the next day or so is commit the prefetch
> infrastructure and the bitmap scan prefetch logic, but I'm bouncing the
> indexscan part back for rework. I think that it should be implemented
> in or near index_getnext() and pay attention to
> effective_io_concurrency.

The biggest question I have here is about doing it at the index_* abstraction
level. I've looked pretty hard at how to do this and run into some pretty
major problems.

I don't see any way to do it without changing the access method api. The
index_* methods cannot start fiddling with the current scan position without
messing up things like CURRENT OF and mark/restore irrecoverably.

But if we're going to change the index am api then we lose all of the
advantages of putting the logic in indexam.c in the first place. It won't
help any other index am without special code in each one

The best plan I came up with at this level is to add an am method

<am>getprefetchbitmap(IndexScanDesc scan,
ScanDirection direction,
TIDBitmap *bitmap,
int nblocks)

Which returns up to nblocks worth of bitmap starting from the current scan
position in the specified direction based on whatever's convenient to the
internal representation. I think it would be best if it stopped at the end
of the page at least if the next index page isn't in shared buffers.

Then nodeIndexscan.c would keep track of how the value of target_prefetch
just like nodeBitmapHeapScan, incrementing it whenever the caller continues
the scan and resetting it to zero if the direction changes.

However the getprefetchbitmap() call would only remove duplicates from the
upcoming blocks. It wouldn't know which blocks have already been prefetched
from previous calls. So nodeIndexscan would also have to remove duplicates
itself.

This splitting the work between three layers of abstraction is pretty messy
and creates a lot of duplicated work and doesn't seem to buy us anything. It
*still* wouldn't help any non-btree index types until they add the new method
-- and if they add the new method they might as well just add the USE_PREFETCH
code anyways. I don't see how the new method is useful for anything else.

I have another plan which would be a lot simpler but is a much more
brute-force approach. If we add a new scan pointer in addition to the current
position and the mark and add a slew of new methods like getnext_prefetch()
and reset_prefetch() to reset it to the current position. Also, add a hash
table internal to PrefetchBuffer and have it return a boolean indicating
whether it actually did a prefetch. Then index_prefetch would reset the
prefetch pointer and scan forward using it calling PrefetchBuffer on *every*
pointer counting how many trues returns it sees from PrefetchBuffer.*

*[Hm, not quite, we want to count recent prefetches that probably are still
queued separately from old ones that are probably in cache. And we also have
to think about how long to treat prefetches as probably still being in cache.
But with some additional thought I think this could be made to work.]

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2009-01-28 13:22:39 Re: [COMMITTERS] pgsql: Silence compiler warning on win32.
Previous Message Simon Riggs 2009-01-28 12:42:34 Re: Hot standby, recovery infra