Re: Inlining comparators as a performance optimisation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inlining comparators as a performance optimisation
Date: 2011-11-18 05:20:26
Message-ID: CA+TgmoYr8O_fK3WvOPXzU6bzZ0utCfTxUZPN3=nYon3Wfh+KHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 25, 2011 at 10:12 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> I've produced something much neater than my first patch, attached,
> although I still consider this to be at the POC stage, not least since
> I'm not exactly sure how I should be selecting the right
> specialisation in tuplesort_performsort() (a hack is used right now
> that does a direct pg_proc OID comparison), and also because I haven't
> implemented anything other than qsort for heap tuples yet (a minor
> detail, I suppose). I'm pleased to say that a much clearer picture of
> what's really going on here has emerged.
>
> Statistics: Total runtime according to explain analyze for query
> "select * from orderlines order by prod_id" (dellstore2 sample db), at
> GCC 4.5's -02 optimisation level, after warming the cache, on my
> desktop:
>
> Without the patch:
>
> ~82ms
>
> With the patch, but with the "inline" keyword commented out for all
> new functions/meta-functions:
>
> ~60ms
>
> with the patch, unmodified:
>
> ~52ms

Reviewing away when I should be sleeping, wahoo...!

I think that we should really consider doing with this patch what Tom
suggested upthread; namely, looking for a mechanism to allow
individual datatypes to offer up a comparator function that doesn't
require bouncing through FunctionCall2Coll(). It seems to me that
duplicating the entire qsort() algorithm is a little iffy. Sure, in
this case it works out to a win. But it's only a small win -
three-quarters of it is in the uncontroversial activity of reducing
the impedance mismatch - and how do we know it will always be a win?
Adding more copies of the same code can be an anti-optimization if it
means that a smaller percentage of it fits in the instruction cache,
and sometimes small changes in runtime are caused by random shifts in
the layout of memory that align things more or less favorably across
cache lines rather than by real effects. Now it may well be that this
is a real effect, but will it still look as good when we do this for
10 data types? For 100 data types?

In contrast, it seems to me that reducing the impedance mismatch is
something that we could go and do across our entire code base, and
every single data type would benefit from it. It would also be
potentially usable by other sorting algorithms, not just quick sort.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2011-11-18 07:25:55 Re: WIP: Collecting statistics on CSV file data
Previous Message Robert Haas 2011-11-18 04:49:06 Re: RangeVarGetRelid()