Lists: | pgsql-hackers |
---|
From: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | faster version of AllocSetFreeIndex for x86 architecture |
Date: | 2009-06-02 14:53:36 |
Message-ID: | 4A253CF0.1000702@hi-ho.ne.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
I made a faster version of AllocSetFreeIndex for x86 architecture.
Attached files are benchmark programs and patch file.
alloc_test.pl: benchmark script
alloc_test.c: benchmark program
aset_free_index.patch: patch for util/mmgr/aset.c
This benchmark compares the original function with a faster version.
To try the benchmark, only execute alloc_test.pl. This script compiles
alloc_test.c and execute the benchmark.
Results of benchmark script:
Xeon(Core architecture), RedHat EL4, gcc 3.4.6
bytes : 4 8 16 32 64 128 256 512 1024 mix
original: 0.780 0.780 0.820 0.870 0.930 0.970 1.030 1.080 1.130 0.950
patched : 0.380 0.170 0.170 0.170 0.170 0.180 0.170 0.180 0.180 0.280
Core2, Windows XP, gcc 3.4.4 (cygwin)
bytes : 4 8 16 32 64 128 256 512 1024 mix
original: 0.249 0.249 0.515 0.452 0.577 0.671 0.796 0.890 0.999 1.577
patched : 0.358 0.218 0.202 0.218 0.218 0.218 0.202 0.218 0.218 0.218
Xeon(Pentium4 architecture), RedHal EL4, gcc 3.4.6
bytes : 4 8 16 32 64 128 256 512 1024 mix
original: 0.510 0.520 0.620 0.860 0.970 1.260 1.150 1.220 1.290 0.860
patched : 0.620 0.530 0.530 0.540 0.540 0.530 0.540 0.530 0.530 0.490
The effect of the patch that I measured by oprofile is:
- test program: pgbench -c 1 -t 50000 (fsync=off)
original:
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.55 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events
with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
66854 6.6725 AllocSetAlloc
47679 4.7587 base_yyparse
29058 2.9002 hash_search_with_hash_value
22053 2.2011 SearchCatCache
19264 1.9227 MemoryContextAllocZeroAligned
16223 1.6192 base_yylex
13819 1.3792 ScanKeywordLookup
13305 1.3279 expression_tree_walker
12144 1.2121 LWLockAcquire
11850 1.1827 XLogInsert
11817 1.1794 AllocSetFree
patched:
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.55 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events
with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
47610 4.9333 AllocSetAlloc
47441 4.9158 base_yyparse
28243 2.9265 hash_search_with_hash_value
22197 2.3000 SearchCatCache
18984 1.9671 MemoryContextAllocZeroAligned
15747 1.6317 base_yylex
13368 1.3852 ScanKeywordLookup
12889 1.3356 expression_tree_walker
12092 1.2530 LWLockAcquire
12078 1.2515 XLogInsert
(skip)
6248 0.6474 AllocSetFree
I think this patch improves AllocSetAlloc/AllocSetFree performance.
Best regards,
---
Atsushi Ogawa
a_ogawa(at)hi-ho(dot)ne(dot)jp
Attachment | Content-Type | Size |
---|---|---|
alloc_test.pl | text/plain | 729 bytes |
alloc_test.c | text/plain | 1.6 KB |
aset_free_index.patch | text/plain | 968 bytes |
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | Re: faster version of AllocSetFreeIndex for x86 architecture |
Date: | 2009-06-02 15:39:52 |
Message-ID: | 200906030139.52998.jk@ozlabs.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
> I made a faster version of AllocSetFreeIndex for x86 architecture.
Neat, I have a version for PowerPC too.
In order to prevent writing multiple copies of AllocSetFreeIndex, I
propose that we add a fls() function ("find last set"); this can be
defined in an architecture-independent manner (ie, shift mask & test in
a loop), and re-defined for arches that have faster ways of doing the
same (ie, cntlz instruction on powerpc).
We can then change AllocSetFreeIndex to use fls().
Patches coming...
Jeremy
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | [PATCH 1/2] Add bit operations util header |
Date: | 2009-06-02 15:44:39 |
Message-ID: | 1243957479.379462.496709474029.1.gpush@pingu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Add a utility header for simple bit operatios - bitops.h.
At present, just contains the fls() (find last set bit) function.
Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>
---
src/include/utils/bitops.h | 52 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/src/include/utils/bitops.h b/src/include/utils/bitops.h
new file mode 100644
index 0000000..de11624
--- /dev/null
+++ b/src/include/utils/bitops.h
@@ -0,0 +1,52 @@
+/*-------------------------------------------------------------------------
+ *
+ * bitops.h
+ * Simple bit operations.
+ *
+ * Portions Copyright (c) 2009, PostgreSQL Global Development Group
+ *
+ * $PostgreSQL$
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BITOPS_H
+#define BITOPS_H
+
+#if defined(__ppc__) || defined(__powerpc__) || \
+ defined(__ppc64__) || defined (__powerpc64__)
+
+static inline int
+fls(unsigned int x)
+{
+ int lz;
+ asm("cntlz %0,%1" : "=r" (lz) : "r" (x));
+ return 32 - lz;
+}
+
+#else /* !powerpc */
+
+/* Architecture-independent implementations */
+
+/*
+ * fls: find last set bit.
+ *
+ * Returns the 1-based index of the most-significant bit in x. The MSB
+ * is bit number 32, the LSB is bit number 1. If x is zero, returns zero.
+ */
+static inline int
+fls(unsigned int x)
+{
+ int ls = 0;
+
+ while (x != 0)
+ {
+ ls++;
+ x >>= 1;
+ }
+
+ return ls;
+}
+
+#endif
+
+#endif /* BITOPS_H */
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | [PATCH 2/2] Use fls() to find chunk set |
Date: | 2009-06-02 15:44:39 |
Message-ID: | 1243957479.379718.764595571686.2.gpush@pingu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Results in a ~2% performance increase by using the powerpc fls()
implementation.
Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>
---
src/backend/utils/mmgr/aset.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 0e2d4d5..762cf72 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -65,6 +65,7 @@
#include "postgres.h"
#include "utils/memutils.h"
+#include "utils/bitops.h"
/* Define this to detail debug alloc information */
/* #define HAVE_ALLOCINFO */
@@ -270,12 +271,7 @@ AllocSetFreeIndex(Size size)
if (size > 0)
{
- size = (size - 1) >> ALLOC_MINBITS;
- while (size != 0)
- {
- idx++;
- size >>= 1;
- }
+ idx = fls((size - 1) >> ALLOC_MINBITS);
Assert(idx < ALLOCSET_NUM_FREELISTS);
}
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | Re: [PATCH 1/2] Add bit operations util header |
Date: | 2009-06-02 16:07:44 |
Message-ID: | 13732.1243958864@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Jeremy Kerr <jk(at)ozlabs(dot)org> writes:
> Add a utility header for simple bit operatios - bitops.h.
This will fail outright on any non-gcc compiler.
regards, tom lane
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | [PATCH v2] Add bit operations util header |
Date: | 2009-06-02 23:33:21 |
Message-ID: | 1243985601.846636.478412111657.1.gpush@pingu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Add a utility header for simple bit operatios - bitops.h.
At present, just contains the fls() (find last set bit) function.
Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>
---
v2: only use inline asm with gcc
---
src/include/utils/bitops.h | 53 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/src/include/utils/bitops.h b/src/include/utils/bitops.h
new file mode 100644
index 0000000..4f2bbc9
--- /dev/null
+++ b/src/include/utils/bitops.h
@@ -0,0 +1,53 @@
+/*-------------------------------------------------------------------------
+ *
+ * bitops.h
+ * Simple bit operations.
+ *
+ * Portions Copyright (c) 2009, PostgreSQL Global Development Group
+ *
+ * $PostgreSQL$
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef BITOPS_H
+#define BITOPS_H
+
+#if defined(__GNUC__) && \
+ (defined(__ppc__) || defined(__powerpc__) || \
+ defined(__ppc64__) || defined (__powerpc64__))
+
+static inline int
+fls(unsigned int x)
+{
+ int lz;
+ asm("cntlz %0,%1" : "=r" (lz) : "r" (x));
+ return 32 - lz;
+}
+
+#else /* !(gcc && powerpc) */
+
+/* Architecture-independent implementations */
+
+/*
+ * fls: find last set bit.
+ *
+ * Returns the 1-based index of the most-significant bit in x. The MSB
+ * is bit number 32, the LSB is bit number 1. If x is zero, returns zero.
+ */
+static inline int
+fls(unsigned int x)
+{
+ int ls = 0;
+
+ while (x != 0)
+ {
+ ls++;
+ x >>= 1;
+ }
+
+ return ls;
+}
+
+#endif
+
+#endif /* BITOPS_H */
From: | Florian Weimer <fweimer(at)bfk(dot)de> |
---|---|
To: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
Cc: | <pgsql-hackers(at)postgresql(dot)org>, Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | Re: [PATCH v2] Add bit operations util header |
Date: | 2009-06-03 09:54:37 |
Message-ID: | 82k53t3an6.fsf@mid.bfk.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Jeremy Kerr:
> +#if defined(__GNUC__) && \
> + (defined(__ppc__) || defined(__powerpc__) || \
> + defined(__ppc64__) || defined (__powerpc64__))
If you require GCC anyway, you can use __builtin_clz instead.
(It's been available since GCC 4.1 at least.)
--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | Florian Weimer <fweimer(at)bfk(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | Re: [PATCH v2] Add bit operations util header |
Date: | 2009-06-03 11:27:16 |
Message-ID: | 200906032127.16906.jk@ozlabs.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Florian,
> > +#if defined(__GNUC__) && \
> > + (defined(__ppc__) || defined(__powerpc__) || \
> > + defined(__ppc64__) || defined (__powerpc64__))
>
> If you require GCC anyway, you can use __builtin_clz instead.
> (It's been available since GCC 4.1 at least.)
Because now we have to test the compiler *and* the version as well?
But I do agree that using the builtins makes for much better code; I'm
looking at a future change that does this.
Cheers,
Jeremy
From: | Florian Weimer <fweimer(at)bfk(dot)de> |
---|---|
To: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | Re: [PATCH v2] Add bit operations util header |
Date: | 2009-06-03 11:34:53 |
Message-ID: | 82r5y11rfm.fsf@mid.bfk.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Jeremy Kerr:
> Florian,
>
>> > +#if defined(__GNUC__) && \
>> > + (defined(__ppc__) || defined(__powerpc__) || \
>> > + defined(__ppc64__) || defined (__powerpc64__))
>>
>> If you require GCC anyway, you can use __builtin_clz instead.
>> (It's been available since GCC 4.1 at least.)
>
> Because now we have to test the compiler *and* the version as well?
This builtin is not architecture-specific, so you'd save the
architecture check.
--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Florian Weimer <fweimer(at)bfk(dot)de> |
Cc: | Jeremy Kerr <jk(at)ozlabs(dot)org>, pgsql-hackers(at)postgresql(dot)org, Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | Re: [PATCH v2] Add bit operations util header |
Date: | 2009-06-03 15:52:49 |
Message-ID: | 5714.1244044369@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Florian Weimer <fweimer(at)bfk(dot)de> writes:
> * Jeremy Kerr:
>> Because now we have to test the compiler *and* the version as well?
> This builtin is not architecture-specific, so you'd save the
> architecture check.
The appropriate way to handle it would be a configure probe to see if
the function is available, thus avoiding any wired-in knowledge about
compiler or compiler version *or* architecture.
The other thing I didn't like about the patch was the assumption that
it's okay to have a "static inline" function in a header. You can
get away with that in gcc but *not* in other compilers. Look at the
existing coding patterns for, eg, list_head; then go thou and do
likewise. Or, since there's currently no need for the code outside
aset.c, forget about putting it in a header and just plop it into
aset.c.
regards, tom lane
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Florian Weimer <fweimer(at)bfk(dot)de>, pgsql-hackers(at)postgresql(dot)org, Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Subject: | Re: [PATCH v2] Add bit operations util header |
Date: | 2009-06-03 23:44:30 |
Message-ID: | 200906040944.30578.jk@ozlabs.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Tom,
> The other thing I didn't like about the patch was the assumption that
> it's okay to have a "static inline" function in a header. You can
> get away with that in gcc but *not* in other compilers.
Gee, you user-space guys have it tough! :D
Point taken, will rework.
> Look at the existing coding patterns for, eg, list_head; then go thou
> and do likewise. Or, since there's currently no need for the code
> outside aset.c, forget about putting it in a header and just plop it
> into aset.c.
OK, I'll add a configure check and conditionally use the builtin if it's
available. I have some other patches that could be improved by using
other builtins, so it would be a good opportunity to figure out a nice
pattern for doing this.
Cheers,
Jeremy
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)bfk(dot)de> |
Subject: | [RFC,PATCH] Avoid manual shift-and-test logic in AllocSetFreeIndex |
Date: | 2009-06-04 05:53:36 |
Message-ID: | 1244094816.434713.890869745434.1.gpush@pingu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Move the shift-and-test login into a separate fls() function, which
can use __builtin_clz() if it's available.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on PowerPC.
Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>
---
configure.in | 13 +++++++++++++
src/backend/utils/mmgr/aset.c | 32 ++++++++++++++++++++++++++------
2 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/configure.in b/configure.in
index b8d2685..6a317b0 100644
--- a/configure.in
+++ b/configure.in
@@ -1361,6 +1361,19 @@ case $host_os in
AC_FUNC_FSEEKO;;
esac
+# GCC builtins
+#
+# We need AC_TRY_LINK here, as the prototype generated by AC_CHECK_FUNC
+# will cause gcc to try to reference a non-builtin symbol.
+
+AC_MSG_CHECKING([for __builtin_clz])
+AC_TRY_LINK([],
+ [__builtin_clz(0);],
+ [AC_DEFINE(HAVE_BUILTIN_CLZ, 1,
+ [Define to 1 if you have __builtin_clz().])
+ AC_MSG_RESULT(yes)],
+ [AC_MSG_RESULT(no)])
+
#
# Pthreads
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 0e2d4d5..af352b8 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -255,6 +255,31 @@ static MemoryContextMethods AllocSetMethods = {
#define AllocAllocInfo(_cxt, _chunk)
#endif
+/*
+ * fls: find last set bit.
+ *
+ * Returns the 1-based index of the most-significant bit in x. The MSB
+ * is bit number 32, the LSB is bit number 1. If x is zero, the result is
+ * undefined.
+ */
+static inline int
+fls(unsigned int x)
+{
+#ifdef HAVE_BUILTIN_CLZ
+ return 32 - __builtin_clz(x);
+#else
+ int ls = 0;
+
+ while (x != 0)
+ {
+ ls++;
+ x >>= 1;
+ }
+
+ return ls;
+#endif
+}
+
/* ----------
* AllocSetFreeIndex -
*
@@ -270,12 +295,7 @@ AllocSetFreeIndex(Size size)
if (size > 0)
{
- size = (size - 1) >> ALLOC_MINBITS;
- while (size != 0)
- {
- idx++;
- size >>= 1;
- }
+ idx = fls((size - 1) >> ALLOC_MINBITS);
Assert(idx < ALLOCSET_NUM_FREELISTS);
}
From: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
---|---|
To: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)bfk(dot)de> |
Subject: | Re: [RFC, PATCH] Avoid manual shift-and-test logic in AllocSetFreeIndex |
Date: | 2009-06-04 13:12:23 |
Message-ID: | 4A27C837.4060608@hi-ho.ne.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> +/*
> + * fls: find last set bit.
> + *
> + * Returns the 1-based index of the most-significant bit in x. The MSB
> + * is bit number 32, the LSB is bit number 1. If x is zero, the result is
> + * undefined.
> + */
> +static inline int
> +fls(unsigned int x)
...
> + idx = fls((size - 1) >> ALLOC_MINBITS);
If size <= 8, fls((size - 1) >> ALLOC_MINBITS) is fls(0).
The result of fls(0) is undefined.
I think we have to never call fls(0) from AllocSetFreeIndex().
My proposal code:
if (size > (1 << ALLOC_MINBITS))
{
idx = fls((size - 1) >> ALLOC_MINBITS);
Assert(idx < ALLOCSET_NUM_FREELISTS);
}
Best regards,
---
Atsushi Ogawa
From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: faster version of AllocSetFreeIndex for x86 architecture |
Date: | 2009-06-04 18:10:29 |
Message-ID: | 1244139029.23910.235.camel@ebony.2ndQuadrant |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, 2009-06-02 at 23:53 +0900, Atsushi Ogawa wrote:
> I made a faster version of AllocSetFreeIndex for x86 architecture.
> Results of benchmark script:
> Xeon(Core architecture), RedHat EL4, gcc 3.4.6
> bytes : 4 8 16 32 64 128 256 512 1024 mix
> original: 0.780 0.780 0.820 0.870 0.930 0.970 1.030 1.080 1.130 0.950
> patched : 0.380 0.170 0.170 0.170 0.170 0.180 0.170 0.180 0.180 0.280
> The effect of the patch that I measured by oprofile is:
> - test program: pgbench -c 1 -t 50000 (fsync=off)
>
> original:
> CPU: P4 / Xeon with 2 hyper-threads, speed 2793.55 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events
> with a unit mask of 0x01 (mandatory) count 100000
> samples % symbol name
> 66854 6.6725 AllocSetAlloc
> patched:
> CPU: P4 / Xeon with 2 hyper-threads, speed 2793.55 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events
> with a unit mask of 0x01 (mandatory) count 100000
> samples % symbol name
> 47610 4.9333 AllocSetAlloc
> I think this patch improves AllocSetAlloc/AllocSetFree performance.
Looks like very good work. Much appreciated.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)bfk(dot)de> |
Subject: | Re: [RFC, PATCH] Avoid manual shift-and-test logic in AllocSetFreeIndex |
Date: | 2009-06-04 23:18:24 |
Message-ID: | 200906050918.25358.jk@ozlabs.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Atsushi,
> If size <= 8, fls((size - 1) >> ALLOC_MINBITS) is fls(0).
> The result of fls(0) is undefined.
Yep, got caught out by this because my previous fls() supported zero.
> I think we have to never call fls(0) from AllocSetFreeIndex().
> My proposal code:
>
> if (size > (1 << ALLOC_MINBITS))
> {
> idx = fls((size - 1) >> ALLOC_MINBITS);
> Assert(idx < ALLOCSET_NUM_FREELISTS);
> }
Looks good, I'll send an updated patch.
Also, are you still seeing the same improvement with the __builtin_clz
as your inline asm implementation?
Cheers,
Jeremy
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)bfk(dot)de> |
Subject: | [PATCH v2] Avoid manual shift-and-test logic in AllocSetFreeIndex |
Date: | 2009-06-05 03:23:28 |
Message-ID: | 1244172208.150091.269876726689.1.gpush@pingu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Move the shift-and-test login into a separate fls() function, which
can use __builtin_clz() if it's available.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on PowerPC.
Signed-off-by: Jeremy Kerr <jk(at)ozlabs(dot)org>
---
v2: prevent fls(0)
---
configure.in | 13 +++++++++++++
src/backend/utils/mmgr/aset.c | 34 +++++++++++++++++++++++++++-------
2 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/configure.in b/configure.in
index b8d2685..6a317b0 100644
--- a/configure.in
+++ b/configure.in
@@ -1361,6 +1361,19 @@ case $host_os in
AC_FUNC_FSEEKO;;
esac
+# GCC builtins
+#
+# We need AC_TRY_LINK here, as the prototype generated by AC_CHECK_FUNC
+# will cause gcc to try to reference a non-builtin symbol.
+
+AC_MSG_CHECKING([for __builtin_clz])
+AC_TRY_LINK([],
+ [__builtin_clz(0);],
+ [AC_DEFINE(HAVE_BUILTIN_CLZ, 1,
+ [Define to 1 if you have __builtin_clz().])
+ AC_MSG_RESULT(yes)],
+ [AC_MSG_RESULT(no)])
+
#
# Pthreads
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 0e2d4d5..9eb3117 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -255,6 +255,31 @@ static MemoryContextMethods AllocSetMethods = {
#define AllocAllocInfo(_cxt, _chunk)
#endif
+/*
+ * fls: find last set bit.
+ *
+ * Returns the 1-based index of the most-significant bit in x. The MSB
+ * is bit number 32, the LSB is bit number 1. If x is zero, the result is
+ * undefined.
+ */
+static inline int
+fls(unsigned int x)
+{
+#ifdef HAVE_BUILTIN_CLZ
+ return 32 - __builtin_clz(x);
+#else
+ int ls = 0;
+
+ while (x != 0)
+ {
+ ls++;
+ x >>= 1;
+ }
+
+ return ls;
+#endif
+}
+
/* ----------
* AllocSetFreeIndex -
*
@@ -268,14 +293,9 @@ AllocSetFreeIndex(Size size)
{
int idx = 0;
- if (size > 0)
+ if (size > (1 << ALLOC_MINBITS))
{
- size = (size - 1) >> ALLOC_MINBITS;
- while (size != 0)
- {
- idx++;
- size >>= 1;
- }
+ idx = fls((size - 1) >> ALLOC_MINBITS);
Assert(idx < ALLOCSET_NUM_FREELISTS);
}
From: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp> |
---|---|
To: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)bfk(dot)de> |
Subject: | Re: [RFC, PATCH] Avoid manual shift-and-test logic in AllocSetFreeIndex |
Date: | 2009-06-05 03:49:24 |
Message-ID: | 4A2895C4.9050108@hi-ho.ne.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
> Also, are you still seeing the same improvement with the __builtin_clz
> as your inline asm implementation?
In my benchmark program, it is a little different performance
in fls implementation and inline asm implementation.
However, the result of a pgbench is almost the same improvement.
Here is the result of my benchmark.
Xeon(Core architecture)
bytes : 4 8 16 32 64 128 256 512 1024 mix
original : 0.780 0.790 0.820 0.870 0.930 0.980 1.040 1.080 1.140 0.910
inline asm: 0.320 0.180 0.190 0.180 0.190 0.180 0.190 0.180 0.190 0.170
fls : 0.270 0.260 0.290 0.290 0.290 0.290 0.290 0.300 0.290 0.380
Xeon(P4 architecrure)
bytes : 4 8 16 32 64 128 256 512 1024 mix
original : 0.520 0.520 0.670 0.780 0.950 1.000 1.060 1.190 1.250 0.940
inline asm: 0.610 0.530 0.530 0.520 0.520 0.540 0.540 0.580 0.540 0.600
fls : 0.390 0.370 0.780 0.780 0.780 0.790 0.780 0.780 0.780 0.520
pgbench result (measured by oprofile)
CPU: Xeon(P4 architecrure)
test program: pgbench -c 1 -t 50000 (fsync=off)
original
samples % symbol name
66854 6.6725 AllocSetAlloc
11817 1.1794 AllocSetFree
inline asm
samples % symbol name
47610 4.9333 AllocSetAlloc
6248 0.6474 AllocSetFree
fls
samples % symbol name
48779 4.9954 AllocSetAlloc
7648 0.7832 AllocSetFree
Best regards,
---
Atsushi Ogawa
From: | Jeremy Kerr <jk(at)ozlabs(dot)org> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)bfk(dot)de> |
Subject: | Re: [RFC, PATCH] Avoid manual shift-and-test logic in AllocSetFreeIndex |
Date: | 2009-06-05 04:04:00 |
Message-ID: | 200906051404.01571.jk@ozlabs.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi Atsushi,
> In my benchmark program, it is a little different performance
> in fls implementation and inline asm implementation.
> However, the result of a pgbench is almost the same improvement.
>
> Here is the result of my benchmark.
Excellent, thank you for getting this extra set of numbers.
Cheers,
Jeremy