Re: Suspicious check (src/backend/access/gin/gindatapage.c)

Lists: pgsql-hackers
From: Gaetano Mendola <mendola(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Suspicious check (src/backend/access/gin/gindatapage.c)
Date: 2014-09-11 22:35:58
Message-ID: CAJycT5ozTB81Erk+T7vGA1DkmTfuSitt6AjGwfDONLykukO9ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At line 650 I can read:

if ((leaf->lsize - segsize) - (leaf->lsize - segsize) < BLCKSZ / 4)
break;

I believe one of the two should be leaf->rsize

--
cpp-today.blogspot.com


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Gaetano Mendola <mendola(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suspicious check (src/backend/access/gin/gindatapage.c)
Date: 2014-09-12 00:48:53
Message-ID: CAB7nPqRbLahnsPE7fXwaR14Rqa0H87WAJPH70cGWRwg5msLiTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 12, 2014 at 7:35 AM, Gaetano Mendola <mendola(at)gmail(dot)com> wrote:
> At line 650 I can read:
>
> if ((leaf->lsize - segsize) - (leaf->lsize - segsize) < BLCKSZ / 4)
> break;
>
> I believe one of the two should be leaf->rsize
Yes this condition is broken. Shouldn't it be that instead when
appending items at the end of a page?
if ((leaf->lsize - segsize) - (leaf->rsize + segsize) < BLCKSZ / 4)
This has been introduced by 36a35c5 and should be backpatched to 9.4.
Regards,
--
Michael

Attachment Content-Type Size
20140912_gin_compress_cond.patch text/x-diff 562 bytes

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Gaetano Mendola <mendola(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suspicious check (src/backend/access/gin/gindatapage.c)
Date: 2014-09-12 08:38:40
Message-ID: 5412B110.6020502@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/12/2014 03:48 AM, Michael Paquier wrote:
> On Fri, Sep 12, 2014 at 7:35 AM, Gaetano Mendola <mendola(at)gmail(dot)com> wrote:
>> At line 650 I can read:
>>
>> if ((leaf->lsize - segsize) - (leaf->lsize - segsize) < BLCKSZ / 4)
>> break;
>>
>> I believe one of the two should be leaf->rsize
> Yes this condition is broken. Shouldn't it be that instead when
> appending items at the end of a page?
> if ((leaf->lsize - segsize) - (leaf->rsize + segsize) < BLCKSZ / 4)

Yep, that's what it was supposed to be. The consequence was that when
appending to the index, the left page was always packed as tight as
possible. Which is not actually that bad a behavior, in fact if you're
just appending values and never do any other updates, it's optimal.
That's probably why no-one has noticed. But it's not what was intended.

But now that I look at it more closely, fixing this as you suggested
doesn't actually yield the intended behavior either. The intention was
to fill the left page 75% full. However, after fixing that typo, the
code would move items to the left page so that the difference between
the halves is BLCKSZ / 4, which does not give a 75% ratio. For example,
if there are 8200 bytes of data in total, the "fixed" code would
actually put 5124 bytes on the left page, and 3076 bytes on the right,
which makes the left page only 62% full.

Fixed, per the attached patch. Now that the logic is fixed, I hope we
won't get complaints that the indexes are bigger, if you fill a table by
appending to the end. I wonder if we should aim at an even more uneven
split; the default fillfactor for B-trees is 90%, for example. I didn't
go that high when I wrote that, because the code in previous versions
always did a 50/50 split. But it could be argued that a higher
fillfactor makes sense for a GIN index - they typically don't get as
much random updates as a B-tree.

- Heikki

Attachment Content-Type Size
gin-split-ratio-fix-1.patch text/x-diff 2.4 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Gaetano Mendola <mendola(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suspicious check (src/backend/access/gin/gindatapage.c)
Date: 2014-09-12 08:42:39
Message-ID: 5412B1FF.4000306@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/12/2014 11:38 AM, Heikki Linnakangas wrote:
> Now that the logic is fixed, I hope we
> won't get complaints that the indexes are bigger, if you fill a table by
> appending to the end. I wonder if we should aim at an even more uneven
> split; the default fillfactor for B-trees is 90%, for example. I didn't
> go that high when I wrote that, because the code in previous versions
> always did a 50/50 split. But it could be argued that a higher
> fillfactor makes sense for a GIN index - they typically don't get as
> much random updates as a B-tree.

Actually, we should add a fillfactor reloption to GIN. But that's 9.5
material.

- Heikki


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Gaetano Mendola <mendola(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suspicious check (src/backend/access/gin/gindatapage.c)
Date: 2014-09-15 23:19:32
Message-ID: CAB7nPqSsc-piWmU_wY1TFXZ2CL-Cw-xuPZc7e5jPp9O5g+HxFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 12, 2014 at 5:42 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 09/12/2014 11:38 AM, Heikki Linnakangas wrote:
>>
>> Now that the logic is fixed, I hope we
>> won't get complaints that the indexes are bigger, if you fill a table by
>> appending to the end. I wonder if we should aim at an even more uneven
>> split; the default fillfactor for B-trees is 90%, for example. I didn't
>> go that high when I wrote that, because the code in previous versions
>> always did a 50/50 split. But it could be argued that a higher
>> fillfactor makes sense for a GIN index - they typically don't get as
>> much random updates as a B-tree.
>
>
> Actually, we should add a fillfactor reloption to GIN. But that's 9.5
> material.

Actually gin is the only method that does not have this parameter, no?
Then the following extra-processing would be enough I imagine:
1) Tune freespace using fillfactor when placing keys to leaf data page
(dataPlaceToPageLeaf)
2) On split, instead of (BLCKSZ * 3) / 4, have the left leaf full at
(BLCKSZ * fillfactor) / 100
Regards,
--
Michael