Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-27 18:43:37
Message-ID: 4870.1322419417@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Sat, Nov 26, 2011 at 11:11 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> There's been some significant change in rangetypes_gist.c, can you
>> please rebase this patch?

> OK, rebased with head.

I looked at this patch a bit. I agree with the aspect of it that says
"let's add a flag bit so we can tell whether an upper GiST item includes
any empty ranges"; I think we really need that in order to make
contained_by searches usable. However, I'm not so happy with the
proposed rewrite of the penalty/picksplit functions. I see two problems
there:

1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
values obtained from subtype_diff. This is not good, because you have
no idea what scale the subtype differences will be expressed on. The
hard-wired values could be greatly larger than range widths, or greatly
smaller, resulting in randomly different index behavior.

2. It's too large/complicated. You're proposing to add nearly a
thousand lines to rangetypes_gist.c, and I do not see any reason to
think that this is so much better than what's there now as to justify
that kind of increment in the code size. I saw your performance
results, but one set of results on an arbitrary (not-real-world) test
case doesn't prove a lot to me; and in particular it doesn't prove that
we couldn't do as well with a much smaller and simpler patch.

There are a lot of garden-variety coding problems, too, for instance here:

+ *penalty = Max(DatumGetFloat8(FunctionCall2(
+ subtype_diff, orig_lower.val, new_lower.val)), 0.0);

which is going to uselessly call the subtype_diff function twice most of
the time (Max() is only a macro), plus you left off the collation
argument. But I don't think it's worth worrying about those until the
big picture is correct, which I feel it isn't yet.

Earlier in the thread you wrote:

> Questions:
> 1) I'm not sure about whether we need support of <> in GiST, because it
> always produces full index scan (except search for non-empty ranges).

I was thinking the same thing; that opclass entry seems pretty darn
useless.

I propose to pull out and apply the changes related to the
RANGE_CONTAIN_EMPTY flag, and also remove the <> opclass entry,
because I think these are uncontroversial and in the nature of
"must fix quickly". The redesign of the penalty and picksplit
functions should be discussed separately.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2011-11-27 18:47:21 information schema/aclexplode doesn't know about default privileges
Previous Message Alexander Soudakov 2011-11-27 18:29:45 Re: Feature proposal: www_fdw