Quick Links

Re: GIN improvements part 1: additional information

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GIN improvements part 1: additional information
Date:	2013-12-10 19:44:01
Message-ID:	CAPpHfdtRAxaq8mShtpd4mh5R0=5hP900mBJNU5TnyaeM44EEyA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Dec 10, 2013 at 12:26 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 12/09/2013 11:34 AM, Alexander Korotkov wrote:
>
>> On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com>wrote:
>>
>> Even if we use varbyte encoding, I wonder if it would be better to treat
>>> block + offset number as a single 48-bit integer, rather than encode them
>>> separately. That would allow the delta of two items on the same page to
>>> be
>>> stored as a single byte, rather than two bytes. Naturally it would be a
>>> loss on other values, but would be nice to see some kind of an analysis
>>> on
>>> that. I suspect it might make the code simpler, too.
>>>
>>
>> Yeah, I had that idea, but I thought it's not a better option. Will try to
>> do some analysis.
>>
>
> The more I think about that, the more convinced I am that it's a good
> idea. I don't think it will ever compress worse than the current approach
> of treating block and offset numbers separately, and, although I haven't
> actually tested it, I doubt it's any slower. About the same amount of
> arithmetic is required in both versions.
>
> Attached is a version that does that. Plus some other minor cleanup.
>
> (we should still investigate using a completely different algorithm,
> though)

Yes, when I though about that, I didn't realize that we can reserve less
than 16 bits for offset number.
I rerun my benchmark and got following results:

label | blocks_mark
----------------+-------------
search_new | 847509920
search_updated | 883789826
(2 rows)

label | size
---------------+-----------
new | 364560384
after_updates | 642736128
(2 rows)

Speed is same while index size is less. In previous format it was:

label | size
---------------+-----------
new | 419299328
after_updates | 715915264
(2 rows)

Good optimization, thanks. I'll try another datasets but I expect similar
results.
However, patch didn't apply to head. Corrected version is attached.

------
With best regards,
Alexander Korotkov.

Attachment	Content-Type	Size
gin-packed-postinglists-20.patch.gz	application/x-gzip	28.9 KB

In response to

Re: GIN improvements part 1: additional information at 2013-12-09 20:26:25 from Heikki Linnakangas

Responses

Re: GIN improvements part 1: additional information at 2013-12-24 15:33:47 from Peter Eisentraut

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2013-12-10 19:49:12	Re: ANALYZE sampling is too good
Previous Message	Andres Freund	2013-12-10 19:28:38	Re: ANALYZE sampling is too good