Re: [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex

From: Jeremy Kerr <jk(at)ozlabs(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex
Date: 2009-07-20 00:47:18
Message-ID: 200907201047.18603.jk@ozlabs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

> That having been said, Jeremy, you probably want to take a look at
> those comments and I have a few responses to them as well.

OK, thanks for the heads-up.

> following comment:
> > Applied and built cleanly. Regress passes. Trying to hunt down ppc
> > box to see if performance enhancement can be seen.
>
> Question: are we only doing this because of PowerPC?

No, this patch will benefit any architecture that has a gcc
implementation of __builtin_clz. I know that both x86 and powerpc gcc
support this.

> What is going to happen on x86 and other architectures? x86 is not
> the center of the universe, but at a very minimum we need to make sure
> that things don't go backwards on what is a very common platform. Has
> anyone done any benchmarking of this?

Yes, Atsushi Ogawa did some benchmarking with this patch on x86, his
numbers are here:

http://archives.postgresql.org/message-id/4A2895C4.9050108@hi-ho.ne.jp

In fact, Atushi originally submitted a patch using inline asm (using
"bsr") to do this on x86. Coincidentally, I was working on a powerpc
implementation (using "cntlz") at the same time, so submitted a patch
using the gcc builtin that would work on both (and possibly other)
architectures.

> A related question: the original email for this patch says that it
> results in a performance increase of about 2% on PPC. But since it
> gives no details on exactly what the test was that improved by 2%,
> it's hard to judge the impact of this. If this means that
> AllocSetFreeIndex() is 2% faster, I think we should reject this patch
> and move on. It's not worth introducing architecture-specific code
> to get a performance improvement that probably won't even move the
> needle on a macro-benchmark. On the other hand, if the test was
> something like a pgbench run, then this is really very impressive.
> But we don't know which it is.

The 2% improvement I saw is for a sysbench OLTP run. I'm happy to re-do
the run and report specific numbers if you need them.

> Zdenek Kotala added this comment:
> > I have several objections:
> >
> > - inline is forbidden to use in PostgreSQL - you need exception or
> > do it differently
> >
> > - type mismatch - Size vs unsigned int vs 32. you should use only
> > Size and sizeof(Size)

OK, happy to make these changes. What's the commitfest process here,
should I redo the patch and re-send to -hackers?

(inline again: should I just make this a static, the compiler can inline
where possible? or do you want a macro?)

> > And general comment:
> >
> > Do we want to have this kind of code which is optimized for one
> > compiler and platform.

One compiler, multiple platforms.

> > See openSSL or MySQL, they have many
> > optimizations which work fine on one platform with specific version
> > of compiler and specific version of OS. But if you select different
> > version of compiler or different compiler you can get very
> > different performance result (e.g. MySQL 30% degradation, OpenSSL
> > RSA 3x faster and so on).

I don't think we're going to see a lot of variation here (besides the
difference where __builtin_clz isn't available). Given that this is
implemented with a couple of instructions, I'm confident that we won't
see any degradation for platforms that support __builtin_clz. For the
other cases, we just use the exiting while-loop algorithm so there
should be no change (unless we mess up the inlining...).

> > I think in this case, it makes sense to have optimization here, but
> > we should do it like spinlocks are implemented and put this code
> > into /port.

Unless I'm missing something, spinlocks are done the same way - we have
one file, src/include/storage/s_lock.h, which is mostly different
implementations of the lock primitives for different platforms,
separated by preprocessor tests.

Essentially, this is just one line of code, protected by
HAVE_BUILTIN_CLZ (which is a feature-test, rather than a specific
platform or compiler test), and is only used in one place. I don't think
that warrants a separate file.

Cheers,

Jeremy

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2009-07-20 00:49:20 Re: navigation menu for documents
Previous Message Tom Lane 2009-07-20 00:42:40 Re: [PATCH 1/2 v3] [libpq] rework sigpipe-handling macros