Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-02-04 21:23:19
Message-ID: CAPpHfdv792ZSWt_Tk+GzzsoNwm7ZGyomfrRiF7WuFN22LUVNUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 3, 2014 at 6:31 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:

> On Mon, Jan 27, 2014 at 7:30 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:
>
>> On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com
>> > wrote:
>>
>>> On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas <
>>> hlinnakangas(at)vmware(dot)com> wrote:
>>>
>>>> On 01/26/2014 08:24 AM, Tomas Vondra wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> On 25.1.2014 22:21, Heikki Linnakangas wrote:
>>>>>
>>>>>> Attached is a new version of the patch set, with those bugs fixed.
>>>>>>
>>>>>
>>>>> I've done a bunch of tests with all the 4 patches applied, and it seems
>>>>> to work now. I've done tests with various conditions (AND/OR, number of
>>>>> words, number of conditions) and I so far I did not get any crashes,
>>>>> infinite loops or anything like that.
>>>>>
>>>>> I've also compared the results to 9.3 - by dumping the database and
>>>>> running the same set of queries on both machines, and indeed I got 100%
>>>>> match.
>>>>>
>>>>> I also did some performance tests, and that's when I started to worry.
>>>>>
>>>>> For example, I generated and ran 1000 queries that look like this:
>>>>>
>>>>> SELECT id FROM messages
>>>>> WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 &
>>>>> useful & dropped)')
>>>>> ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53
>>>>> &
>>>>> 32 & useful & dropped)')) DESC;
>>>>>
>>>>> i.e. in this case the query always was 5 words connected by AND. This
>>>>> query is a pretty common pattern for fulltext search - sort by a list
>>>>> of
>>>>> words and give me the best ranked results.
>>>>>
>>>>> On 9.3, the script was running for ~23 seconds, on patched HEAD it was
>>>>> ~40. It's perfectly reproducible, I've repeated the test several times
>>>>> with exactly the same results. The test is CPU bound, there's no I/O
>>>>> activity at all. I got the same results with more queries (~100k).
>>>>>
>>>>> Attached is a simple chart with x-axis used for durations measured on
>>>>> 9.3.2, y-axis used for durations measured on patched HEAD. It's obvious
>>>>> a vast majority of queries is up to 2x slower - that's pretty obvious
>>>>> from the chart.
>>>>>
>>>>> Only about 50 queries are faster on HEAD, and >700 queries are more
>>>>> than
>>>>> 50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes
>>>>> >150ms
>>>>> on HEAD).
>>>>>
>>>>> Typically, the EXPLAIN ANALYZE looks something like this (on 9.3):
>>>>>
>>>>> http://explain.depesz.com/s/5tv
>>>>>
>>>>> and on HEAD (same query):
>>>>>
>>>>> http://explain.depesz.com/s/1lI
>>>>>
>>>>> Clearly the main difference is in the "Bitmap Index Scan" which takes
>>>>> 60ms on 9.3 and 120ms on HEAD.
>>>>>
>>>>> On 9.3 the "perf top" looks like this:
>>>>>
>>>>> 34.79% postgres [.] gingetbitmap
>>>>> 28.96% postgres [.] ginCompareItemPointers
>>>>> 9.36% postgres [.] TS_execute
>>>>> 5.36% postgres [.] check_stack_depth
>>>>> 3.57% postgres [.] FunctionCall8Coll
>>>>>
>>>>> while on 9.4 it looks like this:
>>>>>
>>>>> 28.20% postgres [.] gingetbitmap
>>>>> 21.17% postgres [.] TS_execute
>>>>> 8.08% postgres [.] check_stack_depth
>>>>> 7.11% postgres [.] FunctionCall8Coll
>>>>> 4.34% postgres [.] shimTriConsistentFn
>>>>>
>>>>> Not sure how to interpret that, though. For example where did the
>>>>> ginCompareItemPointers go? I suspect it's thanks to inlining, and that
>>>>> it might be related to the performance decrease. Or maybe not.
>>>>>
>>>>
>>>> Yeah, inlining makes it disappear from the profile, and spreads that
>>>> time to the functions calling it.
>>>>
>>>> The profile tells us that the consistent function is called a lot more
>>>> than before. That is expected - with the fast scan feature, we're calling
>>>> consistent not only for potential matches, but also to refute TIDs based on
>>>> just a few entries matching. If that's effective, it allows us to skip many
>>>> TIDs and avoid consistent calls, which compensates, but if it's not
>>>> effective, it's just overhead.
>>>>
>>>> I would actually expect it to be fairly effective for that query, so
>>>> that's a bit surprising. I added counters to see where the calls are coming
>>>> from, and it seems that about 80% of the calls are actually coming from
>>>> this little the feature I explained earlier:
>>>>
>>>>
>>>> In addition to that, I'm using the ternary consistent function to check
>>>>> if minItem is a match, even if we haven't loaded all the entries yet.
>>>>> That's less important, but I think for something like "rare1 | (rare2 &
>>>>> frequent)" it might be useful. It would allow us to skip fetching
>>>>> 'frequent', when we already know that 'rare1' matches for the current
>>>>> item. I'm not sure if that's worth the cycles, but it seemed like an
>>>>> obvious thing to do, now that we have the ternary consistent function.
>>>>>
>>>>
>>>> So, that clearly isn't worth the cycles :-). At least not with an
>>>> expensive consistent function; it might be worthwhile if we pre-build the
>>>> truth-table, or cache the results of the consistent function.
>>>>
>>>> Attached is a quick patch to remove that, on top of all the other
>>>> patches, if you want to test the effect.
>>>
>>>
>>> Every single change you did in fast scan seems to be reasonable, but
>>> testing shows that something went wrong. Simple test with 3 words of
>>> different selectivities.
>>>
>>> After applying your patches:
>>>
>>> # select count(*) from fts_test where fti @@ plainto_tsquery('english',
>>> 'gin index select');
>>> count
>>> ───────
>>> 627
>>> (1 row)
>>>
>>> Time: 21,252 ms
>>>
>>> In original fast-scan:
>>>
>>> # select count(*) from fts_test where fti @@ plainto_tsquery('english',
>>> 'gin index select');
>>> count
>>> ───────
>>> 627
>>> (1 row)
>>>
>>> Time: 3,382 ms
>>>
>>> I'm trying to get deeper into it.
>>>
>>
>> I had two guesses about why it's become so slower than in my original
>> fast-scan:
>> 1) Not using native consistent function
>> 2) Not sorting entries
>>
>> I attach two patches which rollback these two features (sorry for awful
>> quality of second). Native consistent function accelerates thing
>> significantly, as expected. Tt seems that sorting entries have almost no
>> effect. However it's still not as fast as initial fast-scan:
>>
>> # select count(*) from fts_test where fti @@ plainto_tsquery('english',
>> 'gin index select');
>> count
>> ───────
>> 627
>> (1 row)
>>
>> Time: 5,381 ms
>>
>> Tomas, could you rerun your tests with first and both these patches
>> applied against patches by Heikki?
>>
>
> I found my patch "0005-Ternary-consistent-implementation.patch" to be
> completely wrong. It introduces ternary consistent function to opclass, but
> don't uses it, because I forgot to include ginlogic.c change into patch.
> So, it shouldn't make any impact on performance. However, testing results
> with that patch significantly differs. That makes me very uneasy. Can we
> now reproduce exact same?
>
> Right version of these two patches in one against current head is
> attached. I've rerun tests with it, results are
> /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun
> postprocessing including graph drawing?
>
> Sometimes test cases are not what we expect. For example:
>
> =# explain SELECT id FROM messages WHERE body_tsvector @@
> to_tsquery('english','(5alpha1-initdb''d)');
> QUERY PLAN
>
>
> ─────────────────────────────────────────────────────────────────────────────────────────────────────────
> Bitmap Heap Scan on messages (cost=84.00..88.01 rows=1 width=4)
> Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' &
> ''initdb'' & ''d'''::tsquery)
> -> Bitmap Index Scan on messages_body_tsvector_idx (cost=0.00..84.00
> rows=1 width=0)
> Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' &
> ''initdb'' & ''d'''::tsquery)
> Planning time: 0.257 ms
> (5 rows)
>
> 5alpha1-initdb'd is 3 gin entries with different frequencies.
>
> Also, these patches are not intended to change relevance ordering speed.
> When number of results are high, most of time is relevance calculating and
> sorting. I propose to remove ORDER BY clause from test cases to see scan
> speed more clear.
>
> I've dump of postgresql.org search queries from Magnus. We can add them
> to our test case.
>

It turns out that version 10 contained bug in ternary consistent function
implementation for tsvector. Fixed in attached version.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin-fast-scan.11.patch.gz application/x-gzip 11.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-02-04 21:42:51 Re: Wait free LW_SHARED acquisition
Previous Message Andrew Dunstan 2014-02-04 20:53:46 Re: narwhal and PGDLLIMPORT