Re: WIP: Fast GiST index build

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-08-10 19:45:18
Message-ID: 4E42DFCE.3050904@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10.08.2011 13:19, Alexander Korotkov wrote:
> Hi!
>
> Here is last verion of the patch.
> List of changes:
> 1) Neighbor relocation and prefetch were removed. They will be supplied as
> separate patches.

unloadNodeBuffers() is now dead code.

> 2) Final emptying now using standart lists of all buffers by levels.
> 3) Automatic switching again use simple comparison of index size and
> effective_cache_size.

LEAF_PAGES_STATS_* are unused now. Should avoid calling smgrnblocks() on
every tuple, the overhead of that could add up.

> 4) Some renames. In particular GISTLoadedPartItem
> to GISTBufferingInsertStack.
> 5) Some comments were corrected and some were added.
> 6) pgindent
> 7) rebased with head
>
> Readme update and user documentation coming soon.

I wonder, how hard would it be to merge gistBufferingBuildPlaceToPage()
with the gistplacetopage() function used in the main codepath? There's
very little difference between them, and it would be nice to maintain
just one function. At the very least I think there should be a comment
in both along the lines of "NOTE: if you change this function, make sure
you update XXXX (the other function) as well!"

In gistbuild(), in the final emptying stage, there's this special-case
handling for the root block before looping through the buffers in the
buffersOnLevels lists:

> for (;;)
> {
> nodeBuffer = getNodeBuffer(gfbb, &buildstate.giststate, GIST_ROOT_BLKNO,
> InvalidOffsetNumber, NULL, false);
> if (!nodeBuffer || nodeBuffer->blocksCount <= 0)
> break;
> MemoryContextSwitchTo(gfbb->context);
> gfbb->bufferEmptyingStack = lcons(nodeBuffer, gfbb->bufferEmptyingStack);
> MemoryContextSwitchTo(buildstate.tmpCtx);
> processEmptyingStack(&buildstate.giststate, &insertstate);
> }

What's the purpose of that? Wouldn't the loop through buffersOnLevels
lists take care of the root node too?

The calculations in initBuffering() desperately need comments. As does
the rest of the code too, but the heuristics in that function are
particularly hard to understand without some explanation.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-08-10 21:23:43 wal_sender_delay (WalSndDelay) has served its purpose
Previous Message Alexander Korotkov 2011-08-10 19:44:20 Re: WIP: Fast GiST index build