Re: GIN improvements part 1: additional information

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part 1: additional information
Date: 2013-11-14 11:00:22
Message-ID: CAPpHfdvEQ-JhE_2z9pvw22Bp6h_o8XOaXCbjAGf87gs4p4Jmuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 14, 2013 at 2:17 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:

> On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> On 04.11.2013 23:44, Alexander Korotkov wrote:
>>
>>> On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
>>> <aekorotkov(at)gmail(dot)com>wrote:
>>>
>>> Attached version of patch is debugged. I would like to note that number
>>>> of
>>>> bugs was low and it wasn't very hard to debug. I've rerun tests on it.
>>>> You
>>>> can see that numbers are improved as the result of your refactoring.
>>>>
>>>> event | period
>>>> -----------------------+-----------------
>>>> index_build | 00:01:45.416822
>>>> index_build_recovery | 00:00:06
>>>> index_update | 00:05:17.263606
>>>> index_update_recovery | 00:01:22
>>>> search_new | 00:24:07.706482
>>>> search_updated | 00:26:25.364494
>>>> (6 rows)
>>>>
>>>> label | blocks_mark
>>>> ----------------+-------------
>>>> search_new | 847587636
>>>> search_updated | 881210486
>>>> (2 rows)
>>>>
>>>> label | size
>>>> ---------------+-----------
>>>> new | 419299328
>>>> after_updates | 715915264
>>>> (2 rows)
>>>>
>>>> Beside simple bugs, there was a subtle bug in incremental item indexes
>>>> update. I've added some more comments including ASCII picture about how
>>>> item indexes are shifted.
>>>>
>>>> Now, I'm trying to implement support of old page format. Then we can
>>>> decide which approach to use.
>>>>
>>>>
>>> Attached version of patch has support of old page format. Patch still
>>> needs
>>> more documentation and probably refactoring, but I believe idea is clear
>>> and it can be reviewed. In the patch I have to revert change of null
>>> category placement for compatibility with old posting list format.
>>>
>>
>> Thanks, just glanced at this quickly.
>>
>> If I'm reading the patch correctly, old-style non-compressed posting tree
>> leaf pages are not vacuumed at all; that's clearly not right.
>>
>
> Fixed. Now separate function handles uncompressed posting lists and
> compress them if as least one TID is deleted.
>
>
>> I argued earlier that we don't need to handle the case that compressing a
>> page makes it larger, so that the existing items don't fit on the page
>> anymore. I'm having some second thoughts on that; I didn't take into
>> account the fact that the "mini-index" in the new page format takes up some
>> space. I think it's still highly unlikely that there isn't enough space on
>> a 8k page, but it's not totally crazy with a non-standard small page size.
>> So at least that situation needs to be checked for with an ereport(),
>> rather than Assert.
>>
>
> Now this situation is ereported before any change in page.
>
> To handle splitting a non-compressed page, it seems that you're relying on
>> the fact that before splitting, we try to insert, which compresses the
>> page. The problem with that is that it's not correctly WAL-logged. The
>> split record that follows includes a full copy of both page halves, but if
>> the split fails for some reason, e.g you run out of disk space, there is no
>> WAL record at all of the the compression. I'd suggest doing the compression
>> in the insert phase on a temporary copy of the page, leaving the original
>> page untouched if there isn't enough space for the insertion to complete.
>> (You could argue that this case can't happen because the compression must
>> always create enough space to insert one more item. maybe so, but at least
>> there should be an explicit check for that.)
>
>
> Good point. Yes, if we don't handle specially inserting item indexes, I
> see no point to do special handling for single TID which is much smaller.
> In the attached patch dataCompressLeafPage just reserves space for single
> TID.
>
> Also, many changes in comments and README.
>
> Unfortunally, I didn't understand what is FIXME about in
> ginVacuumEntryPage. So, it's not fixed :)
>

Sorry, I posted buggy version of patch. Attached version is correct.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin-packed-postinglists-14.patch.gz application/x-gzip 28.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2013-11-14 11:50:45 Re: Improve code in tidbitmap.c
Previous Message Sameer Thakur 2013-11-14 10:54:47 Re: Extra functionality to createuser