Quick Links

Re: GIN improvements part2: fast scan

From:	Tomas Vondra <tv(at)fuzzy(dot)cz>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: GIN improvements part2: fast scan
Date:	2014-02-03 01:44:18
Message-ID:	52EEF472.9020106@fuzzy.cz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 3.2.2014 00:13, Tomas Vondra wrote:
> On 2.2.2014 11:45, Heikki Linnakangas wrote:
>> On 01/30/2014 01:53 AM, Tomas Vondra wrote:
>>> (3) A file with explain plans for 4 queries suffering ~2x slowdown,
>>> and explain plans with 9.4 master and Heikki's patches is available
>>> here:
>>>
>>> http://www.fuzzy.cz/tmp/gin/queries.txt
>>>
>>> All the queries have 6 common words, and the explain plans look
>>> just fine to me - exactly like the plans for other queries.
>>>
>>> Two things now caught my eye. First some of these queries actually
>>> have words repeated - either exactly like "database & database" or
>>> in negated form like "!anything & anything". Second, while
>>> generating the queries, I use "dumb" frequency, where only exact
>>> matches count. I.e. "write != written" etc. But the actual number
>>> of hits may be much higher - for example "write" matches exactly
>>> just 5% documents, but using @@ it matches more than 20%.
>>>
>>> I don't know if that's the actual cause though.
>>
>> Ok, here's another variant of these patches. Compared to git master, it
>> does three things:
>>
>> 1. It adds the concept of ternary consistent function internally, but no
>> catalog changes. It's implemented by calling the regular boolean
>> consistent function "both ways".
>>
>> 2. Use a binary heap to get the "next" item from the entries in a scan.
>> I'm pretty sure this makes sense, because arguably it makes the code
>> more readable, and reduces the number of item pointer comparisons
>> significantly for queries with a lot of entries.
>>
>> 3. Only perform the pre-consistent check to try skipping entries, if we
>> don't already have the next item from the entry loaded in the array.
>> This is a tradeoff, you will lose some of the performance gain you might
>> get from pre-consistent checks, but it also limits the performance loss
>> you might get from doing useless pre-consistent checks.
>>
>> So taken together, I would expect this patch to make some of the
>> performance gains less impressive, but also limit the loss we saw with
>> some of the other patches.
>>
>> Tomas, could you run your test suite with this patch, please?
>
> Sure, will do. Do I get it right that this should be applied instead of
> the four patches you've posted earlier?

So, I was curious and did a basic testing - I've repeated the tests on
current HEAD and 'HEAD with the new patch'. The complete data are
available at [http://www.fuzzy.cz/tmp/gin/gin-scan-benchmarks.ods] and
I've updated the charts at [http://www.fuzzy.cz/tmp/gin/] too.

Look for branches named 9.4-head-2 and 9.4-heikki-2.

To me it seems that:

(1) The main issue was that with common words, it used to be much
slower than HEAD (or 9.3). This seems to be fixed, i.e. it's not
slower than before. See

http://www.fuzzy.cz/tmp/gin/3-common-words.png (previous patch)
http://www.fuzzy.cz/tmp/gin/3-common-words-new.png (new patch)

for comparison vs. 9.4 HEAD. With the new patch there's no slowdown,
which seems nice. Compared to 9.3 it looks like this:

http://www.fuzzy.cz/tmp/gin/3-common-words-new-vs-93.png

so there's a significant speedup (thanks to the modified layout).

(2) The question is whether the new patch works fine on rare words. See
this for comparison of the patches against HEAD:

http://www.fuzzy.cz/tmp/gin/3-rare-words.png
http://www.fuzzy.cz/tmp/gin/3-rare-words-new.png

and this is the comparison of the two patches:

http://www.fuzzy.cz/tmp/gin/patches-rare-words.png

That seems fine to me - some queries are slower, but we're talking
about queries taking 1 or 2 ms, so the measurement error is probably
the main cause of the differences.

(3) With higher numbers of frequent words, the differences (vs. HEAD or
the previous patch) are not that dramatic as in (1) - the new patch
is consistently by ~20% faster.

Tomas

Attachment	Content-Type	Size
3-common-words.png	image/png	80.1 KB
	image/png	31.6 KB
	image/png	67.1 KB
	image/png	51.7 KB
3-common-words-new-vs-93.png	image/png	97.0 KB
	image/png	65.4 KB

In response to

Re: GIN improvements part2: fast scan at 2014-02-02 23:13:00 from Tomas Vondra

Responses

Re: GIN improvements part2: fast scan at 2014-02-03 06:53:45 from Oleg Bartunov
Re: GIN improvements part2: fast scan at 2014-02-03 13:59:29 from Jesper Krogh

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fujii Masao	2014-02-03 01:47:43	Re: pg_basebackup and pg_stat_tmp directory
Previous Message	Craig Ringer	2014-02-03 01:22:19	Re: narwhal and PGDLLIMPORT