Re: GIN improvements part2: fast scan

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-02-03 06:53:45
Message-ID: CAF4Au4xW-peajLZA822qiTkOw4y6GoJ4GRpT8XFXj+zB1S-b_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomasa, it'd be nice if you use real data in your testing.

One very good application of gin fast-scan is dramatic performance
improvement of hstore/jsonb @> operator, see slides 57, 58
http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf.
I'd like not to lost this benefit :)

Oleg

PS. I used data delicious-rss-1250k.gz from
http://randomwalker.info/data/delicious/

On Mon, Feb 3, 2014 at 5:44 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> On 3.2.2014 00:13, Tomas Vondra wrote:
>> On 2.2.2014 11:45, Heikki Linnakangas wrote:
>>> On 01/30/2014 01:53 AM, Tomas Vondra wrote:
>>>> (3) A file with explain plans for 4 queries suffering ~2x slowdown,
>>>> and explain plans with 9.4 master and Heikki's patches is available
>>>> here:
>>>>
>>>> http://www.fuzzy.cz/tmp/gin/queries.txt
>>>>
>>>> All the queries have 6 common words, and the explain plans look
>>>> just fine to me - exactly like the plans for other queries.
>>>>
>>>> Two things now caught my eye. First some of these queries actually
>>>> have words repeated - either exactly like "database & database" or
>>>> in negated form like "!anything & anything". Second, while
>>>> generating the queries, I use "dumb" frequency, where only exact
>>>> matches count. I.e. "write != written" etc. But the actual number
>>>> of hits may be much higher - for example "write" matches exactly
>>>> just 5% documents, but using @@ it matches more than 20%.
>>>>
>>>> I don't know if that's the actual cause though.
>>>
>>> Ok, here's another variant of these patches. Compared to git master, it
>>> does three things:
>>>
>>> 1. It adds the concept of ternary consistent function internally, but no
>>> catalog changes. It's implemented by calling the regular boolean
>>> consistent function "both ways".
>>>
>>> 2. Use a binary heap to get the "next" item from the entries in a scan.
>>> I'm pretty sure this makes sense, because arguably it makes the code
>>> more readable, and reduces the number of item pointer comparisons
>>> significantly for queries with a lot of entries.
>>>
>>> 3. Only perform the pre-consistent check to try skipping entries, if we
>>> don't already have the next item from the entry loaded in the array.
>>> This is a tradeoff, you will lose some of the performance gain you might
>>> get from pre-consistent checks, but it also limits the performance loss
>>> you might get from doing useless pre-consistent checks.
>>>
>>> So taken together, I would expect this patch to make some of the
>>> performance gains less impressive, but also limit the loss we saw with
>>> some of the other patches.
>>>
>>> Tomas, could you run your test suite with this patch, please?
>>
>> Sure, will do. Do I get it right that this should be applied instead of
>> the four patches you've posted earlier?
>
> So, I was curious and did a basic testing - I've repeated the tests on
> current HEAD and 'HEAD with the new patch'. The complete data are
> available at [http://www.fuzzy.cz/tmp/gin/gin-scan-benchmarks.ods] and
> I've updated the charts at [http://www.fuzzy.cz/tmp/gin/] too.
>
> Look for branches named 9.4-head-2 and 9.4-heikki-2.
>
> To me it seems that:
>
> (1) The main issue was that with common words, it used to be much
> slower than HEAD (or 9.3). This seems to be fixed, i.e. it's not
> slower than before. See
>
> http://www.fuzzy.cz/tmp/gin/3-common-words.png (previous patch)
> http://www.fuzzy.cz/tmp/gin/3-common-words-new.png (new patch)
>
> for comparison vs. 9.4 HEAD. With the new patch there's no slowdown,
> which seems nice. Compared to 9.3 it looks like this:
>
> http://www.fuzzy.cz/tmp/gin/3-common-words-new-vs-93.png
>
> so there's a significant speedup (thanks to the modified layout).
>
> (2) The question is whether the new patch works fine on rare words. See
> this for comparison of the patches against HEAD:
>
> http://www.fuzzy.cz/tmp/gin/3-rare-words.png
> http://www.fuzzy.cz/tmp/gin/3-rare-words-new.png
>
> and this is the comparison of the two patches:
>
> http://www.fuzzy.cz/tmp/gin/patches-rare-words.png
>
> That seems fine to me - some queries are slower, but we're talking
> about queries taking 1 or 2 ms, so the measurement error is probably
> the main cause of the differences.
>
> (3) With higher numbers of frequent words, the differences (vs. HEAD or
> the previous patch) are not that dramatic as in (1) - the new patch
> is consistently by ~20% faster.
>
> Tomas
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2014-02-03 07:08:57 Re: patch: option --if-exists for pg_dump
Previous Message Amit Kapila 2014-02-03 06:42:08 Re: Turning off HOT/Cleanup sometimes