Re: GIN improvements part2: fast scan

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-02-20 09:48:30
Message-ID: 5305CF6E.1080508@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/09/2014 12:11 PM, Alexander Korotkov wrote:
> I've rebased catalog changes with last master. Patch is attached. I've
> rerun my test suite with both last master ('committed') and attached
> patch ('ternary-consistent').

Thanks!

> method | sum
> ------------------------+------------------
> committed | 143491.715000001
> fast-scan-11 | 126916.111999999
> fast-scan-light | 137321.211
> fast-scan-light-heikki | 138168.028000001
> master | 446976.288
> ternary-consistent | 125923.514
>
> I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8
> to 4.

Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make
sure we get similar behavior in Tomas' tests that used 6 search terms.
But I always felt that it was too large for real queries, once we have
the catalog changes, that's why I lowered to 4 when committing. If an
opclass benefits greatly from fast scan, it should provide the ternary
consistent function, and not rely on the shim implementation.

> I'm not sure about decision to reserve separate procedure number for
> ternary consistent. Probably, it would be better to add ginConfig method.
> It would be useful for post 9.4 improvements.

Hmm, it might be useful for an opclass to provide both, a boolean and
ternary consistent function, if the boolean version is significantly
more efficient when all the arguments are TRUE/FALSE. OTOH, you could
also do a quick check through the array to see if there are any MAYBE
arguments, within the consistent function. But I'm inclined to keep the
possibility to provide both versions. As long as we support the boolean
version at all, there's not much difference in terms of the amount of
code to support having them both for the same opclass.

A ginConfig could be useful for many other things, but I don't think
it's worth adding it now.

What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck?
We discussed that earlier, but didn't reach any conclusion. That needs
to be clarified in the docs. One possibility is to document that they're
equivalent. Another is to forbid one of them. Yet another is to assign a
different meaning to each.

I've been thinking that it might be useful to define them so that a
MAYBE result from the tri-consistent function means that it cannot
decide if you have a match or not, because some of the inputs were
MAYBE. And TRUE+recheck means that even if all the MAYBE inputs were
passed as TRUE or FALSE, the result would be the same, TRUE+recheck. The
practical difference would be that if the tri-consistent function
returns TRUE+recheck, ginget.c wouldn't need to bother fetching the
other entries, it could just return the entry with recheck=true
immediately. While with MAYBE result, it would fetch the other entries
and call tri-consistent again. ginget.c doesn't currently use the
tri-consistent function that way - it always fetches all the entries for
a potential match before calling tri-consistent, but it could. I had it
do that in some of the patch versions, but Tomas' testing showed that it
was a big loss on some queries, because the consistent function was
called much more often. Still, something like that might be sensible in
the future, so it might be good to distinguish those cases in the API
now. Note that ginarrayproc is already using the return values like
that: in GinContainedStrategy, it always returns TRUE+recheck regardless
of the inputs, but in other cases it uses GIN_MAYBE.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bernd Helmle 2014-02-20 09:56:47 Re: Selecting large tables gets killed
Previous Message Ashutosh Bapat 2014-02-20 09:19:28 Re: Selecting large tables gets killed