Re: Handling GIN incomplete splits

Lists: pgsql-hackers
From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Handling GIN incomplete splits
Date: 2013-11-13 16:49:55
Message-ID: 5283ADB3.9070306@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here's another part of my crusade against xlog cleanup routines. This
series of patches gets rid of the gin_cleanup() function, which is
currently used to finish splits of GIN b-tree pages, if the system
crashes (or an error occurs) between splitting a page and inserting its
downlink to the parent.

The first three patches just move code around. IMHO they make the code
more readable, so they should be committed in any case. The meat is in
the fourth patch.

Thoughts, objections?

Alexander, I'm sorry if this conflicts with your GIN patches. Feel free
to post the latest versions of your patches against the current master,
ignoring patches. I can fix the bitrot. That said, I think these
refactorings will make your code look a little bit nicer too, so you
might want to rebase because of that anyway.

- Heikki

Attachment Content-Type Size
0001-Further-GIN-refactoring.patch text/x-diff 15.6 KB
0002-Refactor-the-internal-GIN-B-tree-interface-for-formi.patch text/x-diff 7.5 KB
0003-More-GIN-refactoring.patch text/x-diff 9.9 KB
0004-Get-rid-of-the-post-recovery-cleanup-step-of-GIN-pag.patch text/x-diff 73.2 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-11-19 12:48:24
Message-ID: CAB7nPqRQ0y8U4Kkx5DVsNZfz91CBkfAhZcrV66tjyvP=VkVYvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Here is a review of the first three patches:
1) Further gin refactoring:
make check passes (core tests and contrib tests).
Code compiles without warnings.
Then... About the patch... Even if I got little experience with code of
gin, moving the flag for search mode out of btree, as well as removing the
logic of PostingTreeScan really makes the code lighter and easier to follow.
Just wondering, why not simplifying as well ginTraverseLock:ginbtree.c at
the same time to something similar to that?
if (!GinPageIsLeaf(page) || searchMode == TRUE)
return access;

/* we should relock our page */
LockBuffer(buffer, GIN_UNLOCK);
LockBuffer(buffer, GIN_EXCLUSIVE);

/* But root can become non-leaf during relock */
if (!GinPageIsLeaf(page))
{
/* restore old lock type (very rare) */
LockBuffer(buffer, GIN_UNLOCK);
LockBuffer(buffer, GIN_SHARE);
}
else
access = GIN_EXCLUSIVE;
return access;
Feel free to discard as I can imagine that changing such code would make
back-branch maintenance more difficult and it would increase conflicts with
patches currently in development.
2) Refactoring of internal gin btree (needs patch 1 applied first):
make check passes (core tests and contrib tests).
Code compiles without warnings.
Yep, removing ginPageGetLinkItup makes sense. Just to be picky, I would
have put the arguments of GinFormInteriorTuple replacing ginPageGetLinkItup
in 3 separate lines just for lisibility.
In dataPrepareDownlink:gindatapage.c, depending on if lpage is a leaf page
or not, isn't it inconsistent with the older code not to use
GinDataPageGetItemPointer and GinDataPageGetPostingItem to set
btree->pitem.key.
In ginContinueSplit:ginxlog.c, could it be possible to remove this code? It
looks that its deletion has been forgotten:
/*

* elog(NOTICE,"ginContinueSplit root:%u l:%u r:%u", split->rootBlkno,
* split->leftBlkno, split->rightBlkno);
*/
Except the doubt about dataPrepareDownlink (related to my lack of knowledge
of the code), patch looks good.
3) More refactoring (needs patches 1 and 2):
make check passes (core tests and contrib tests).
Code compiles without warnings.
Perhaps this patch would have been easier to read with context diffs :) It
just moves code around so nothing to say.

Then, I have done a small test with all 3 patches applied. Test is done
with pg_trgm by uploading the book "Les Miserables":
=# CREATE TABLE les_miserables (num serial, line text);
CREATE TABLE
=# \copy les_miserables (line) FROM '~/Desktop/pg135.txt';
=# select count(*) from les_miserables;
count
-------
68116
(1 row)
=# CREATE INDEX les_miserables_idx ON les_miserables USING gin (line
gin_trgm_ops);
CREATE INDEX

And here is the result of this query (average of a couple of 5 runs):
=# explain analyse SELECT * FROM les_miserables where line ~~ '%Cosette%';
Vanilla server: 5.289 ms
With patch 1 only: 5.283 ms
With patches 1+2: 5.232 ms
With patches 1+2+3: 5.232 ms
Based on that there is no performance degradation.

I just began reading the 4th patch. As it is a bit more complex and needs
more testing, I'll provide feedback later.
Regards,

On Thu, Nov 14, 2013 at 1:49 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> Here's another part of my crusade against xlog cleanup routines. This
> series of patches gets rid of the gin_cleanup() function, which is
> currently used to finish splits of GIN b-tree pages, if the system crashes
> (or an error occurs) between splitting a page and inserting its downlink to
> the parent.
>
> The first three patches just move code around. IMHO they make the code
> more readable, so they should be committed in any case. The meat is in the
> fourth patch.
>
> Thoughts, objections?
>

--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-11-20 13:27:50
Message-ID: CAB7nPqQfokZjUbh-B4PWC3Rz8BiZaP+k=L4d9RcJnn667NZQXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here are some comments about the 4th patch.
1) Compiles without warnings, passes make check.
2) s/ginFinshSplit/ginFinishSplit
3) Server crashes when trying to create a gin index index creation (see
example of previous email with pg_trgm). Here is the backtrace of the crash:
* thread #1: tid = 0x15221, 0x00007fff8c59f866
libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread,
stop reason = signal SIGABRT
frame #0: 0x00007fff8c59f866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff8e60735c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x00007fff91586bba libsystem_c.dylib`abort + 125
frame #3: 0x000000010e953ed9
postgres`ExceptionalCondition(conditionName=0x000000010ea31055,
errorType=0x000000010e9b5973, fileName=0x000000010ea2fd3d, lineNumber=1175)
+ 137 at assert.c:54
frame #4: 0x000000010e79073a
postgres`UnpinBuffer(buf=0x000000010f37f9c0, fixOwner='\0') + 282 at
bufmgr.c:1175
frame #5: 0x000000010e793465 postgres`ReleaseBuffer(buffer=3169) + 565
at bufmgr.c:2540
frame #6: 0x000000010e414e43
postgres`freeGinBtreeStack(stack=0x00007fa2138adcf8) + 67 at ginbtree.c:196
frame #7: 0x000000010e415330
postgres`ginInsertValue(btree=0x00007fff51807e80, stack=0x00007fa2138a6dd8,
insertdata=0x00007fff51807e70, buildStats=0x00007fff51809fa0) + 1216 at
ginbtree.c:728
frame #8: 0x000000010e404ebf
postgres`ginEntryInsert(ginstate=0x00007fff51807fe0, attnum=1, key=7828073,
category='\0', items=0x0000000117d0ab28, nitem=76,
buildStats=0x00007fff51809fa0) + 1055 at gininsert.c:218
frame #9: 0x000000010e405ad6
postgres`ginbuild(fcinfo=0x00007fff5180a050) + 1590 at gininsert.c:392
frame #10: 0x000000010e9609ba
postgres`OidFunctionCall3Coll(functionId=2738, collation=0,
arg1=4693605424, arg2=4693760992, arg3=140334089145912) + 186 at fmgr.c:1649
frame #11: 0x000000010e4f4b30
postgres`index_build(heapRelation=0x0000000117c2bc30,
indexRelation=0x0000000117c51be0, indexInfo=0x00007fa213888e38,
isprimary='\0', isreindex='\0') + 464 at index.c:1963
frame #12: 0x000000010e4f2f07
postgres`index_create(heapRelation=0x0000000117c2bc30,
indexRelationName=0x00007fa213888b30, indexRelationId=16445, relFileNode=0,
indexInfo=0x00007fa213888e38, indexColNames=0x00007fa2138892d8,
accessMethodObjectId=2742, tableSpaceId=0,
collationObjectId=0x00007fa213889330, classObjectId=0x00007fa213889350,
coloptions=0x00007fa213889370, reloptions=0, isprimary='\0',
isconstraint='\0', deferrable='\0', initdeferred='\0',
allow_system_table_mods='\0', skip_build='\0', concurrent='\0',
is_internal='\0') + 3591 at index.c:1082
frame #13: 0x000000010e5da885
postgres`DefineIndex(stmt=0x00007fa213888b90, indexRelationId=0,
is_alter_table='\0', check_rights='\x01', skip_build='\0', quiet='\0') +
4181 at indexcmds.c:595
frame #14: 0x000000010e7dd4a3
postgres`ProcessUtilitySlow(parsetree=0x00007fa21384b530,
queryString=0x00007fa21384a838, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0000000000000000, dest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 2931 at utility.c:1163
frame #15: 0x000000010e7dc4d7
postgres`standard_ProcessUtility(parsetree=0x00007fa21384b530,
queryString=0x00007fa21384a838, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0000000000000000, dest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 3511 at utility.c:873
frame #16: 0x000000010e7db719
postgres`ProcessUtility(parsetree=0x00007fa21384b530,
queryString=0x00007fa21384a838, context=PROCESS_UTILITY_TOPLEVEL,
params=0x0000000000000000, dest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 185 at utility.c:352
frame #17: 0x000000010e7db0e5
postgres`PortalRunUtility(portal=0x00007fa213889638,
utilityStmt=0x00007fa21384b530, isTopLevel='\x01', dest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 325 at pquery.c:1187
frame #18: 0x000000010e7da002
postgres`PortalRunMulti(portal=0x00007fa213889638, isTopLevel='\x01',
dest=0x00007fa21384b918, altdest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 514 at pquery.c:1318
frame #19: 0x000000010e7d95c4
postgres`PortalRun(portal=0x00007fa213889638, count=9223372036854775807,
isTopLevel='\x01', dest=0x00007fa21384b918, altdest=0x00007fa21384b918,
completionTag=0x00007fff5180b020) + 964 at pquery.c:816
frame #20: 0x000000010e7d4d77
postgres`exec_simple_query(query_string=0x00007fa21384a838) + 1207 at
postgres.c:1048
frame #21: 0x000000010e7d3fc1 postgres`PostgresMain(argc=1,
argv=0x00007fa21301a3c0, dbname=0x00007fa21301a228,
username=0x00007fa21301a208) + 2753 at postgres.c:3992
frame #22: 0x000000010e75868c
postgres`BackendRun(port=0x00007fa212e00240) + 700 at postmaster.c:4085
frame #23: 0x000000010e757c81
postgres`BackendStartup(port=0x00007fa212e00240) + 433 at postmaster.c:3774
frame #24: 0x000000010e7544fe postgres`ServerLoop + 606 at
postmaster.c:1585
frame #25: 0x000000010e751d74 postgres`PostmasterMain(argc=3,
argv=0x00007fa212c041f0) + 5380 at postmaster.c:1240
frame #26: 0x000000010e6930fd postgres`main(argc=3,
argv=0x00007fa212c041f0) + 653 at main.c:196
Test has been done on OSX 10.9.

This is all I have for now... I will have a look at the code later, as this
patch changes quite a bit the internals of gin, it is going to take a
little bit of time.
Thanks,
--
Michael


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-11-20 16:12:21
Message-ID: 528CDF65.8070004@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 19.11.2013 14:48, Michael Paquier wrote:
> Here is a review of the first three patches:
> 1) Further gin refactoring:
> make check passes (core tests and contrib tests).
> Code compiles without warnings.

Committed.

> Then... About the patch... Even if I got little experience with code of
> gin, moving the flag for search mode out of btree, as well as removing the
> logic of PostingTreeScan really makes the code lighter and easier to follow.
> Just wondering, why not simplifying as well ginTraverseLock:ginbtree.c at
> the same time to something similar to that?
> if (!GinPageIsLeaf(page) || searchMode == TRUE)
> return access;
>
> /* we should relock our page */
> LockBuffer(buffer, GIN_UNLOCK);
> LockBuffer(buffer, GIN_EXCLUSIVE);
>
> /* But root can become non-leaf during relock */
> if (!GinPageIsLeaf(page))
> {
> /* restore old lock type (very rare) */
> LockBuffer(buffer, GIN_UNLOCK);
> LockBuffer(buffer, GIN_SHARE);
> }
> else
> access = GIN_EXCLUSIVE;
> return access;
> Feel free to discard as I can imagine that changing such code would make
> back-branch maintenance more difficult and it would increase conflicts with
> patches currently in development.

Yeah, might be more readable to write it that way. There's a lot of
cleanup that could be done to the gin code, these patches are by no
means the end of it.

> 2) Refactoring of internal gin btree (needs patch 1 applied first):
> make check passes (core tests and contrib tests).
> Code compiles without warnings.

Committed.

> Yep, removing ginPageGetLinkItup makes sense. Just to be picky, I would
> have put the arguments of GinFormInteriorTuple replacing ginPageGetLinkItup
> in 3 separate lines just for lisibility.

Ok, did that.

> In dataPrepareDownlink:gindatapage.c, depending on if lpage is a leaf page
> or not, isn't it inconsistent with the older code not to use
> GinDataPageGetItemPointer and GinDataPageGetPostingItem to set
> btree->pitem.key.

Hmm. The old code in dataSplitPage() didn't use
GinDataPageGetItemPointer/PostingItem either.

The corresponding code in ginContinueSplit did, though. There was
actually an inconsistency there: the ginContinueSplit function took the
downlink's key from the last item on the page (using maxoff), while
dataSplitPage took it from the right bound using
GinDataPageGetRightBound(). Both are the same, dataSplitPage copies the
value from the last item to the right bound, so it doesn't make a
difference. They would diverge if the last item on the page is deleted,
though, so the old coding in ginContinueSplit was actually a bit suspicious.

> In ginContinueSplit:ginxlog.c, could it be possible to remove this code? It
> looks that its deletion has been forgotten:
> /*
>
> * elog(NOTICE,"ginContinueSplit root:%u l:%u r:%u", split->rootBlkno,
> * split->leftBlkno, split->rightBlkno);
> */

Yeah, that's just leftover debug code. But again, I'll leave that for
another patch (in fact, the whole function will go away with the fourth
patch, anyway).

> 3) More refactoring (needs patches 1 and 2):
> make check passes (core tests and contrib tests).
> Code compiles without warnings.
> Perhaps this patch would have been easier to read with context diffs :) It
> just moves code around so nothing to say.

Committed.

Thanks for the review! I'll let you finish the review of the fourth
patch. Meanwhile, I'll take another look at Alexander's gin packed
posting items patch, and see how badly these commits bitrotted it.
- Heikki


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-11-21 12:44:29
Message-ID: 528E002D.4050303@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 20.11.2013 15:27, Michael Paquier wrote:
> Here are some comments about the 4th patch.
> 1) Compiles without warnings, passes make check.
> 2) s/ginFinshSplit/ginFinishSplit
> 3) Server crashes when trying to create a gin index index creation (see
> example of previous email with pg_trgm).

Thanks, fixed!

Here's a new version. To ease the review, I split the remaining patch
again into two, where the first patch is just yet more refactoring.

I also fixed some bugs in WAL logging and replay that I bumped into
while testing.

- Heikki

Attachment Content-Type Size
0001-More-GIN-refactoring.patch text/x-diff 33.8 KB
0002-Get-rid-of-the-post-recovery-cleanup-step-of-GIN-pag.patch text/x-diff 60.7 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-11-22 13:04:37
Message-ID: CAB7nPqShw2nHRkM+P41UWfMp3o8j2=3s2bteoiDx8ZEa30po2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 21, 2013 at 9:44 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> Here's a new version. To ease the review, I split the remaining patch again
> into two, where the first patch is just yet more refactoring.
>
> I also fixed some bugs in WAL logging and replay that I bumped into while
> testing.
Cool. Here is the review of the two remaining patches.
1) More refactoring, general remarks:
- Code compiles without warnings
- Passes make check
- If I got it correctly... this patch separates the part managing data
to be inserted from ginbtree. I can understand the meaning behind that
if we consider that GinBtree is used only to define methods and search
conditions (flags and keys). However I am wondering if this does not
make the code more complicated... Particularly the flag isDelete that
is only passed to ginxlogInsert meritates its comment IMO. Note that I
haven't read the 2nd patch when writing that :)
- With this patch, previous SELECT example takes 5.236 ms in average,
runtime does not change.
1-1) This block of code is repeated several times and should be
refactored into a single function:
/* During index build, count the new page */
if (buildStats)
{
if (btree->isData)
buildStats->nDataPages++;
else
buildStats->nEntryPages++;
}
Something with a function like that perhaps:
static void ginCountNewPage(GinBtree btree, GinStatsData *buildStats);
1-2) Could it be possible to change the variable name of
"GinBtreeEntryInsertData *entry" in entryIsEnoughSpace? entry->entry
is kind of hard to apprehend... Renaming it to either insertEntry.
Another idea would be to rename entry in GinBtreeEntryInsertData to
entryData or entryTuple.
1-3) This block of code is present two times:
+ if (!isleaf)
+ {
+ PostingItem *pitem = GinDataPageGetPostingItem(lpage, off);
+ PostingItemSetBlockNumber(pitem, updateblkno);
+ }
Should the update of a downlink to point to the next page be a
separate function?
2) post recovery cleanup:
- OK, so roughly the soul of this patch is to change the update
mechanism for a left child gin page so as the parent split is always
done first before any new data is inserted in this child. And this
ensures that we can remove the xlog cleanup mechanism for gin page
splits in the same fashion as gist... xlog redo mechanism is then
adapted according to that.
- Compilation fails becausze the flags GIN_SPLIT_ISLEAF and
GIN_SPLIT_ISDATA are missing in gin_private.h. The patch attached
fixes that though.
- With my additional patch, it passes make check, compilation shows no warnings.
- I did some tests with the patch:
-- Index creation time
vanilla: 3266.834
with the two patches: 3412.473 ms
-- Tried the new redo mechanism by simulating some server failures a
couple of times and saw no failures
- I am seeing similar run times for queries like the example used in
previous emails of this thread. So no problem on this side.
- And... Here are some comments about the code:
2-1) In ginFindParents, is the case where the stack has no parent
possible (aka the stack is the root itself)? Shouldn't this code path
check if root is NULL or not?
2-2) Not sure that this structure is in-line with the project policy:
struct
{
BlockNumber left;
BlockNumber right;
} children;
Why not adding a complementary structure in gin_private.h doing that?
It could be used as well in ginxlogSplit to specify a left/right
family of block numbers.
2-3) s/kepy/kept (comments of ginFinishSplit)
Other than that, the patch looks great. I particularly like the new
redo logic in ginxlog.c, the code is more easily understandable with
the split redo removed. Even if I am just a noob for this code, it is
nicely built and structured.

Regards,
--
Michael

Attachment Content-Type Size
0003-gin-flag-fix.patch text/plain 704 bytes

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-11-27 13:47:10
Message-ID: 5295F7DE.70501@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/22/13 15:04, Michael Paquier wrote:
> On Thu, Nov 21, 2013 at 9:44 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> Here's a new version. To ease the review, I split the remaining patch again
>> into two, where the first patch is just yet more refactoring.
>>
>> I also fixed some bugs in WAL logging and replay that I bumped into while
>> testing.
> Cool. Here is the review of the two remaining patches.
> 1) More refactoring, general remarks:
> - Code compiles without warnings
> - Passes make check
> - If I got it correctly... this patch separates the part managing data
> to be inserted from ginbtree. I can understand the meaning behind that
> if we consider that GinBtree is used only to define methods and search
> conditions (flags and keys). However I am wondering if this does not
> make the code more complicated...

Well, I think it's an improvement in readability to separate the
insertion payload from the search information. With the second patch it
becomes more important, because if an incompletely split page is
encountered while you're inserting a value, you needs to insert the
downlink for the old incomplete split first, and then continue the
insertion of the original vaule where you left. That "context switching"
becomes a lot easier when value you're inserting is passed around
separately.

> Particularly the flag isDelete that
> is only passed to ginxlogInsert meritates its comment IMO. Note that I
> haven't read the 2nd patch when writing that :)

Yeah, that gets removed in the second patch, which changes the WAL
record format. :-)

> 1-2) Could it be possible to change the variable name of
> "GinBtreeEntryInsertData *entry" in entryIsEnoughSpace? entry->entry
> is kind of hard to apprehend... Renaming it to either insertEntry.
> Another idea would be to rename entry in GinBtreeEntryInsertData to
> entryData or entryTuple.

ok, renamed the variable to insertData.

> 1-3) This block of code is present two times:
> + if (!isleaf)
> + {
> + PostingItem *pitem = GinDataPageGetPostingItem(lpage, off);
> + PostingItemSetBlockNumber(pitem, updateblkno);
> + }
> Should the update of a downlink to point to the next page be a
> separate function?

Doesn't seem any better to me.

Thanks, committed this refactoring patch now. Will continue on the main
patch.

- Heikki


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-11-27 17:40:39
Message-ID: CAMkU=1yBi=5d1jCgR_MzoOxMZo7xQZrDw7XN9H-+VmCFSDsM6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 13, 2013 at 8:49 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> Here's another part of my crusade against xlog cleanup routines. This
> series of patches gets rid of the gin_cleanup() function, which is
> currently used to finish splits of GIN b-tree pages, if the system crashes
> (or an error occurs) between splitting a page and inserting its downlink to
> the parent.
>
> The first three patches just move code around. IMHO they make the code
> more readable, so they should be committed in any case. The meat is in the
> fourth patch.
>
> Thoughts, objections?
>
> Alexander, I'm sorry if this conflicts with your GIN patches. Feel free to
> post the latest versions of your patches against the current master,
> ignoring patches. I can fix the bitrot. That said, I think these
> refactorings will make your code look a little bit nicer too, so you might
> want to rebase because of that anyway.

Hi Heikki,

The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in the
LWLock code during the operation below, with it trying to take
an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the
same lockid.

CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes)
WITH (FASTUPDATE=OFF);

It happens pretty reliably using osm2pgsql.

I will try to come up with a simple reproducible demonstration, and stack
trace, over the weekend.

Cheers,

Jeff


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-11-27 17:43:59
Message-ID: 52962F5F.5030208@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/22/13 15:04, Michael Paquier wrote:
> 2) post recovery cleanup:
> - OK, so roughly the soul of this patch is to change the update
> mechanism for a left child gin page so as the parent split is always
> done first before any new data is inserted in this child. And this
> ensures that we can remove the xlog cleanup mechanism for gin page
> splits in the same fashion as gist... xlog redo mechanism is then
> adapted according to that.

> - I did some tests with the patch:
> -- Index creation time
> vanilla: 3266.834
> with the two patches: 3412.473 ms

Hmm. I didn't expect any change in index creation time. Is that
repeatable, or within the margin of error?

> 2-1) In ginFindParents, is the case where the stack has no parent
> possible (aka the stack is the root itself)? Shouldn't this code path
> check if root is NULL or not?

No. A root split doesn't need to find the parent (because there isn't
any), so we never call ginFindParents with a stack containing just the
root. If you look at the only caller of ginFindParents, it's quite clear
that stack->parent always exists.

> 2-2) Not sure that this structure is in-line with the project policy:
> struct
> {
> BlockNumber left;
> BlockNumber right;
> } children;
> Why not adding a complementary structure in gin_private.h doing that?
> It could be used as well in ginxlogSplit to specify a left/right
> family of block numbers.

I don't think we have a project policy against in-line structs. Might be
better style to not do it, anyway, though.

Come to think of it, that children variable mustn't be declared inside
the if-statement; it's no longer in scope when the XLogInsert was done,
so the &children was pointing to potentially uninitialized piece of
stack. Valgrind would've complained.

I ended up using BlockIdData[2] for that.

Committed, thanks for the review!

- Heikki


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-12-01 20:40:14
Message-ID: CAMkU=1xJ-j=YX19-MRba5eUaQik04zU62cAQcbiRZJZE9-BTZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 27, 2013 at 9:40 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Nov 13, 2013 at 8:49 AM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> Here's another part of my crusade against xlog cleanup routines. This
>> series of patches gets rid of the gin_cleanup() function, which is
>> currently used to finish splits of GIN b-tree pages, if the system crashes
>> (or an error occurs) between splitting a page and inserting its downlink to
>> the parent.
>>
>> The first three patches just move code around. IMHO they make the code
>> more readable, so they should be committed in any case. The meat is in the
>> fourth patch.
>>
>> Thoughts, objections?
>>
>> Alexander, I'm sorry if this conflicts with your GIN patches. Feel free
>> to post the latest versions of your patches against the current master,
>> ignoring patches. I can fix the bitrot. That said, I think these
>> refactorings will make your code look a little bit nicer too, so you might
>> want to rebase because of that anyway.
>
>
> Hi Heikki,
>
> The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in the
> LWLock code during the operation below, with it trying to take
> an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the
> same lockid.
>
> CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes)
> WITH (FASTUPDATE=OFF);
>
> It happens pretty reliably using osm2pgsql.
>
> I will try to come up with a simple reproducible demonstration, and stack
> trace, over the weekend.
>

Whatever the problem, it seems to have been fixed in ce5326eed386959aa,
"More GIN refactoring".

Cheers,

Jeff


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-12-02 09:26:22
Message-ID: 529C523E.5010403@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/01/2013 10:40 PM, Jeff Janes wrote:
> On Wed, Nov 27, 2013 at 9:40 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in the
>> LWLock code during the operation below, with it trying to take
>> an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the
>> same lockid.
>>
>> CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes)
>> WITH (FASTUPDATE=OFF);
>>
>> It happens pretty reliably using osm2pgsql.
>>
>> I will try to come up with a simple reproducible demonstration, and stack
>> trace, over the weekend.
>
> Whatever the problem, it seems to have been fixed in ce5326eed386959aa,
> "More GIN refactoring".

That's good, I guess :-). Thanks for the testing. Did you import the
full planet.osm? I tried with a subset containing just Finland, but
didn't see any problems.

- Heikki


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Handling GIN incomplete splits
Date: 2013-12-02 17:41:07
Message-ID: CAMkU=1zdjY=O239VeBfUuQEyGNXZGnW-e9zGdnz+v2CRsOe7uQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 2, 2013 at 1:26 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com>wrote:

> On 12/01/2013 10:40 PM, Jeff Janes wrote:
>
>> On Wed, Nov 27, 2013 at 9:40 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>> The commit 04eee1fa9ee80dabf7 of this series causes a self-deadlock in
>>> the
>>> LWLock code during the operation below, with it trying to take
>>> an LW_EXCLUSIVE on a high, even-numbered lockid when it already holds the
>>> same lockid.
>>>
>>> CREATE INDEX planet_osm_ways_nodes ON planet_osm_ways USING gin (nodes)
>>> WITH (FASTUPDATE=OFF);
>>>
>>> It happens pretty reliably using osm2pgsql.
>>>
>>> I will try to come up with a simple reproducible demonstration, and stack
>>> trace, over the weekend.
>>>
>>
>> Whatever the problem, it seems to have been fixed in ce5326eed386959aa,
>> "More GIN refactoring".
>>
>
> That's good, I guess :-). Thanks for the testing. Did you import the full
> planet.osm? I tried with a subset containing just Finland, but didn't see
> any problems.
>
>
I used Antarctica. I don't have the RAM to process the full planet, or the
bandwidth to download it very easily.

Do you think it is worth chasing down where the problem was, to make sure
it was truly fixed rather than simply changed in a way that happens not to
trigger any more in this situation?

Cheers,

Jeff