Re: GIN improvements part 1: additional information

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part 1: additional information
Date: 2013-12-18 11:45:00
Message-ID: CAPpHfdvmRhW_rSCo90Hcu+0xBUVvxmtgW0arKRQDbXcDdNANgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>
>> On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas <
>> hlinnakangas(at)vmware(dot)com
>>
>>> wrote:
>>>
>>
>> On 12/12/2013 06:44 PM, Alexander Korotkov wrote:
>>>
>>> When values are packed into small groups, we have to either insert
>>>
>>>> inefficiently encoded value or re-encode whole right part of values.
>>>>
>>>
>>> It would probably be simplest to store newly inserted items uncompressed,
>>> in a separate area in the page. For example, grow the list of
>>> uncompressed
>>> items downwards from pg_upper, and the compressed items upwards from
>>> pg_lower. When the page fills up, re-encode the whole page.
>>>
>>
> I hacked together an implementation of a variant of Simple9, to see how it
> performs. Insertions are handled per the above scheme.
>
> In a limited pg_trgm test case I've been using a lot for this, this
> reduces the index size about 20%, compared to varbyte encoding. It might be
> possible to squeeze it a bit more, I handcrafted the "selectors" in the
> encoding algorithm to suite our needs, but I don't actually have a good
> idea of how to choose them optimally. Also, the encoding can encode 0
> values, but we never need to do that, so you could take advantage of that
> to pack items tighter.
>
> Compression and decompression speed seems to be about the same.
>
> Patch attached if you want to play with it. WAL replay is still broken,
> and there are probably bugs.
>
>
> Good idea. But:
>> 1) We'll still need item indexes in the end of page for fast scan.
>>
>
> Sure.
>
>
> 2) Storage would be easily extendable to hold additional information as
>> well.
>> Better compression shouldn't block more serious improvements.
>>
>
> I'm not sure I agree with that. For all the cases where you don't care
> about additional information - which covers all existing users for example
> - reducing disk size is pretty important. How are you planning to store the
> additional information, and how does using another encoding gets in the way
> of that?

I was planned to store additional information datums between
varbyte-encoded tids. I was expected it would be hard to do with PFOR.
However, I don't see significant problems in your implementation of Simple9
encoding. I'm going to dig deeper in your version of patch.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-12-18 11:46:59 Re: [PATCH] SQL assertions prototype
Previous Message Ashutosh Bapat 2013-12-18 11:42:09 Re: Example query causing param_info to be set in plain rel path