Quick Links

Re: B-Tree support function number 3 (strxfrm() optimization)

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: B-Tree support function number 3 (strxfrm() optimization)
Date:	2014-09-02 19:22:48
Message-ID:	CA+TgmoYuRrT_b=rmkQKRms2WFgeXYR1kTN1mnrPof90m=GOeOg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Aug 26, 2014 at 4:09 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Tue, Aug 26, 2014 at 12:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I have committed a fix for that problem. Let me know if I missed
>> something else.
>
> Yes, that's all I meant.

OK. The patch needs another rebase as a result of that fix. In
general, you would probably do well to separate unrelated changes out
of your patches and submit a series of patches, one with the unrelated
fixes and a second with the new changes that applies on top of the
first, instead of munging two sets of changes together.

I found the capitalization of sortKeyOther, sortKeyAbbreviated, and
sortKeyTiebreak somewhat surprising. The great majority of precedents
in src/include use all-caps separated by underscores for this, i.e.
SORT_KEY_OTHER. I think it'd be better to use that style here, too.
I also don't particularly like the content of the naming:
SORT_KEY_OTHER does not obviously mean "don't use the abbreviated-key
optimization". We could use ABBREVIATE_KEYS_YES and
ABBREVIATE_KEYS_NO, but what to do about the "tiebreak" value? Hmm.

Maybe we should get rid of the tiebreak case altogether: the second
SortSupport object is just containing all the same values as the first
one, with only the comparator being different. Can't we just have
both the abbreviated-comparator and authoritative-comparator as
members of the SortSupport, and call whichever one is appropriate,
instead of copying the whole SortSupport object? That would have the
nice property of avoiding the need for special handling in
reversedirection_heap().

In bttextcmp_abbreviated, I wondered why you used memcmp() rather than
just testing e.g. a < b ? -1 : a > b ? 1 : 0. Then I realized the
latter is probably wrong on little-endian machines. Not sure if a
comment is warranted.

This comment in sortsupport.h seems to be no longer true, as of commit
1d41739e5a04b0e93304d24d864b6bfa3efc45f2:

* (However, all opclasses that provide BTSORTSUPPORT are required to provide
* the comparator function.)

Independent of this patch, it seems to me that that comment should get
deleted, since we now allow that.

Most places that use a SortSupportData initialize ssup.position
explicitly, but tuplesort_begin_datum() doesn't. That's an
inconsistency that should be fixed, but I'm not sure which direction
is best. Since enum SortKeyPosition is (quite rightly) defined such
that sortKeyOther (i.e. no optimization) is 0, all of the
initializations of pre-zeroed structures are redundant. It might be
just as well to leave those out entirely, and only initialize it in
those places that want to opt *in* to the optimization. But if not,
tuplesort_begin_datum() shouldn't be the only one missing the party.

I think that the naming of the new SortSupport members is not the
greatest. There's nothing (outside of the detailed comments, of
course) to indicate that "position" or "converter" or
"abort_conversion" or "tiebreak" are things that pertain specifically
to abbreviated keys. And I think that's a must, because the
SortSupport comments clearly foresee that it might oversee multiple
types of optimizations. There are several ways to accomplish this.
One is to give them all a common prefix (like "ak_"). Another is to
just rename them a bit. For example, I think "converter" could be
called something like "abbreviate_key" and "abort_conversion" could be
called something like "abort_key_abbreviation". I think I like that
better than a common prefix. I'm not exactly sure what to recommend
for "position" (but I note that it is not, in any relevant sense, the
position of anything) and "tiebreak", and the right answers might
depend on how we resolve some of the other comments noted above.

There are similar problems with the naming of the fields in
Tuplesortstate. You can't just have a flag in there called "aborted"
that relates only to aborting one very specific thing and not the
whole tuplesort. I grant you there's a comment explaining that the
field relates specifically to the abbreviated key optimization, but it
should be also be named in a way that makes that self-evident. There
are other instances of this problem elsewhere; e.g. bttext_abort is
not an abort function for bttext, but something much more specific.

n_distinct is a cardinality estimate, but AFAIK not using hyperloglog.
What is different about that problem vs. this one that makes each
method correct for its respective case? Or should analyze be using
hyperloglog too?

Is it the right decision to suppress the abbreviated-key optimization
unconditionally on 32-bit systems and on Darwin? There's certainly
more danger, on those platforms, that the optimization could fail to
pay off. But it could also win big, if in fact the first character or
two of the string is enough to distinguish most rows, or if Darwin
improves their implementation in the future. If the other defenses
against pathological cases in the patch are adequate, I would think
it'd be OK to remove the hard-coded checks here and let those cases
use the optimization or not according to its merits in particular
cases. We'd want to look at what the impact of that is, of course,
but if it's bad, maybe those other defenses aren't adequate anyway.

Does the lengthy comment in btsortsupport_worker() survive pgindent?

+ * Based on Hideaki Ohno's C++ implementation. The copyright term's of Ohno's

terms, not term's.

+ /* memset() so untouched bytes are NULL */
+ /* By convention, we use buffer 1 to store and NULL terminate text */
+ /* Just like strcoll(), strxfrm() expects a NULL-terminated string */

Please use NUL or \0.

+ * There is no special handling of the C locale here. strxfrm() is used
+ * indifferently.

Comments should explain the C code, not just recapitulate it.

+ * First, Hash key proper, or a significant fraction of it.
Mix in length

There's no reason why "Hash" should be capitalized here.

+ * Every Datum byte is compared. This is safe because the
strxfrm() blob
+ * is itself NULL-terminated, leaving no danger of
misinterpreting any NULL
+ * bytes not intended to be interpreted as logically representing
+ * termination.

Reading from an uninitialized byte could provide a valgrind warning
even if it's harmless, I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: B-Tree support function number 3 (strxfrm() optimization) at 2014-08-26 20:09:00 from Peter Geoghegan

Responses

Re: B-Tree support function number 3 (strxfrm() optimization) at 2014-09-02 20:41:29 from Peter Geoghegan
Re: B-Tree support function number 3 (strxfrm() optimization) at 2014-09-02 23:51:23 from Peter Geoghegan
Re: B-Tree support function number 3 (strxfrm() optimization) at 2014-09-03 00:29:06 from Peter Geoghegan
Re: B-Tree support function number 3 (strxfrm() optimization) at 2014-09-03 01:13:16 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2014-09-02 20:12:53	Re: PL/pgSQL 2
Previous Message	Andres Freund	2014-09-02 19:05:17	Re: Escaping from blocked send() reprised.