Re: GIN improvements part 1: additional information

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GIN improvements part 1: additional information
Date: 2013-10-10 11:57:41
Message-ID: 52569635.5090808@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09.10.2013 02:04, Tomas Vondra wrote:
> On 8.10.2013 21:59, Heikki Linnakangas wrote:
>> On 08.10.2013 17:47, Alexander Korotkov wrote:
>>> Hi, Tomas!
>>>
>>> On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondra<tv(at)fuzzy(dot)cz> wrote:
>>>
>>>> I've attempted to rerun the benchmarks tests I did a few weeks ago, but
>>>> I got repeated crashes when loading the data (into a table with
>>>> tsvector+gin index).
>>>>
>>>> Right before a crash, theres this message in the log:
>>>>
>>>> PANIC: not enough space in leaf page!
>>>>
>>>
>>> Thanks for testing. Heikki's version of patch don't works for me too on
>>> even much more simplier examples. I can try to get it working if he
>>> answer
>>> my question about GinDataLeafPageGetPostingList* macros.
>>
>> The new macros in that patch version were quite botched. Here's a new
>> attempt.
>
> Nope, still the same errors :-(
>
> PANIC: not enough space in leaf page!
> LOG: server process (PID 29722) was terminated by signal 6: Aborted
> DETAIL: Failed process was running: autovacuum: ANALYZE public.messages

I've continued hacking away at the patch, here's yet another version.
I've done a lot of cleanup and refactoring to make the code more
readable (I hope). I'm not sure what caused the panic you saw, but it's
probably fixed now. Let me know if not.

Some notable changes since my last patch version:

* I changed the ginbtree interface so that isEnoughSpace method is no
more. Instead, placeToPage returns false without doing anything if the
item doesn't fit. It was slightly simpler this way when working with the
compressed posting lists, as placeToPage had to calculate tuple sizes
anyway to decide how many items fit on the page.

* I rewrote incrUpdateItemIndexes(). It now operates in two stages: 1.
adjust the pageOffsets and 2. split the bin. I found it more readable
this way.

* I changed the WAL format of insert records so that there is a separate
struct for data-leaf, data-internal, and entry pages. The information
recorded for each of those was so different that it was confusing to
cram them all into the same struct.

I'm going to step back a bit from the details of the patch now. I think
there's enough work for you, Alexander, until the next commitfest:

* Try to make the code also work with the old page format, so that you
don't need to REINDEX after pg_upgrade.

* Documentation. The new compressed posting list format needs to be
explained somewhere. At the top of ginpostinglist.c would be a good
place. The README should be improved too.

* Fix whatever I broke (sorry)

Are we committed to go ahead with this patch in 9.4 timeframe, in one
form or another? If we are, then I'd like to start committing
refactoring patches that just move code around in preparation for the
Patch, to make the review of the core of the patch easier. For example,
merging the isEnoughSpace and placeToPage methods in the ginbtree
interface. Without the patch, it's unnecessary code churn - it won't do
any harm but it won't do any good either. I'm pretty confident that this
patch will land in the 9.4 timeframe, so barring objections I'll start
committing such refactorings.

- Heikki

Attachment Content-Type Size
gin-packed-postinglists-10-heikki.patch.gz application/x-gzip 25.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2013-10-10 12:03:26 strange behavior of pg_trgm's similarity function
Previous Message pilum.70 2013-10-10 11:50:23 Re: pg_stat_statements: calls under-estimation propagation