Re: GIN improvements part 1: additional information

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part 1: additional information
Date: 2013-10-21 19:12:05
Message-ID: CAPpHfduXkYv4uUHHFrCntNQD=JyB=xQXtutXeY_KYs6CBLJN0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 10, 2013 at 3:57 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 09.10.2013 02:04, Tomas Vondra wrote:
>
>> On 8.10.2013 21:59, Heikki Linnakangas wrote:
>>
>>> On 08.10.2013 17:47, Alexander Korotkov wrote:
>>>
>>>> Hi, Tomas!
>>>>
>>>> On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondra<tv(at)fuzzy(dot)cz> wrote:
>>>>
>>>> I've attempted to rerun the benchmarks tests I did a few weeks ago, but
>>>>> I got repeated crashes when loading the data (into a table with
>>>>> tsvector+gin index).
>>>>>
>>>>> Right before a crash, theres this message in the log:
>>>>>
>>>>> PANIC: not enough space in leaf page!
>>>>>
>>>>>
>>>> Thanks for testing. Heikki's version of patch don't works for me too on
>>>> even much more simplier examples. I can try to get it working if he
>>>> answer
>>>> my question about GinDataLeafPageGetPostingList* macros.
>>>>
>>>
>>> The new macros in that patch version were quite botched. Here's a new
>>> attempt.
>>>
>>
>> Nope, still the same errors :-(
>>
>> PANIC: not enough space in leaf page!
>> LOG: server process (PID 29722) was terminated by signal 6: Aborted
>> DETAIL: Failed process was running: autovacuum: ANALYZE public.messages
>>
>
> I've continued hacking away at the patch, here's yet another version. I've
> done a lot of cleanup and refactoring to make the code more readable (I
> hope). I'm not sure what caused the panic you saw, but it's probably fixed
> now. Let me know if not.
>
> Some notable changes since my last patch version:
>
> * I changed the ginbtree interface so that isEnoughSpace method is no
> more. Instead, placeToPage returns false without doing anything if the item
> doesn't fit. It was slightly simpler this way when working with the
> compressed posting lists, as placeToPage had to calculate tuple sizes
> anyway to decide how many items fit on the page.
>
> * I rewrote incrUpdateItemIndexes(). It now operates in two stages: 1.
> adjust the pageOffsets and 2. split the bin. I found it more readable this
> way.
>
> * I changed the WAL format of insert records so that there is a separate
> struct for data-leaf, data-internal, and entry pages. The information
> recorded for each of those was so different that it was confusing to cram
> them all into the same struct.
>
>
> I'm going to step back a bit from the details of the patch now. I think
> there's enough work for you, Alexander, until the next commitfest:
>
> * Try to make the code also work with the old page format, so that you
> don't need to REINDEX after pg_upgrade.
>
> * Documentation. The new compressed posting list format needs to be
> explained somewhere. At the top of ginpostinglist.c would be a good place.
> The README should be improved too.
>
> * Fix whatever I broke (sorry)
>
> Are we committed to go ahead with this patch in 9.4 timeframe, in one form
> or another? If we are, then I'd like to start committing refactoring
> patches that just move code around in preparation for the Patch, to make
> the review of the core of the patch easier. For example, merging the
> isEnoughSpace and placeToPage methods in the ginbtree interface. Without
> the patch, it's unnecessary code churn - it won't do any harm but it won't
> do any good either. I'm pretty confident that this patch will land in the
> 9.4 timeframe, so barring objections I'll start committing such
> refactorings.

Attached version of patch is debugged. I would like to note that number of
bugs was low and it wasn't very hard to debug. I've rerun tests on it. You
can see that numbers are improved as the result of your refactoring.

event | period
-----------------------+-----------------
index_build | 00:01:45.416822
index_build_recovery | 00:00:06
index_update | 00:05:17.263606
index_update_recovery | 00:01:22
search_new | 00:24:07.706482
search_updated | 00:26:25.364494
(6 rows)

label | blocks_mark
----------------+-------------
search_new | 847587636
search_updated | 881210486
(2 rows)

label | size
---------------+-----------
new | 419299328
after_updates | 715915264
(2 rows)

Beside simple bugs, there was a subtle bug in incremental item indexes
update. I've added some more comments including ASCII picture about how
item indexes are shifted.

Now, I'm trying to implement support of old page format. Then we can decide
which approach to use.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin-packed-postinglists-11.patch.gz application/x-gzip 26.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2013-10-21 19:18:40 Re: [HACKERS] Cube extension point support // GSoC'13
Previous Message Heikki Linnakangas 2013-10-21 19:06:04 Re: [HACKERS] Cube extension point support // GSoC'13