tweaking MemSet() performance

Lists: pgsql-hackers
From: Neil Conway <neilc(at)samurai(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: tweaking MemSet() performance
Date: 2002-08-29 05:27:41
Message-ID: 87wuqaw7xu.fsf@mailbox.samurai.com

In include/c.h, MemSet() differs from the stock function memset()
only when setting at most MEMSET_LOOP_LIMIT bytes (currently 64); for
larger lengths it simply calls memset(). The comments above the macro
definition note:

* We got the 64 number by testing this against the stock memset() on
* BSD/OS 3.0. Larger values were slower. bjm 1997/09/11
*
* I think the crossover point could be a good deal higher for
* most platforms, actually. tgl 2000-03-19
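
For readers who have not looked at c.h, the idea behind the macro can be sketched as below. This is a simplified approximation, not the actual c.h text: `memset_sketch` and `SKETCH_LOOP_LIMIT` are hypothetical names, and the conditions for taking the loop path (zero fill, int-aligned start, length a multiple of sizeof(int), length at most the limit) are assumed from the c.h of this era rather than quoted from it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MEMSET_LOOP_LIMIT (64 at the time of this thread) */
#define SKETCH_LOOP_LIMIT 64

/*
 * Simplified sketch of the MemSet() idea, not the actual c.h macro:
 * zero-fill small, int-aligned buffers with an inline loop of int-sized
 * stores, and hand every other request to libc memset().
 * Returns 1 if the loop was used and 0 if memset() was (the real macro
 * returns nothing; the flag only makes the choice visible).
 */
static int
memset_sketch(void *start, int val, size_t len)
{
	if ((uintptr_t) start % sizeof(int) == 0 &&	/* aligned start */
		len % sizeof(int) == 0 &&				/* whole ints only */
		val == 0 &&								/* zero fill only */
		len <= SKETCH_LOOP_LIMIT)				/* small buffers only */
	{
		int		   *p = (int *) start;
		int		   *stop = (int *) ((char *) start + len);

		while (p < stop)
			*p++ = 0;
		return 1;
	}
	memset(start, val, len);
	return 0;
}
```

The real MemSet() is a macro, so at call sites where the arguments are constants an optimizing compiler can evaluate these tests at compile time; the sketch is written as a function only so it can be exercised on its own.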

I decided to investigate Tom's suggestion and measure the
performance of MemSet() versus memset() on my machine for various
values of MEMSET_LOOP_LIMIT. The test machine is a Pentium 4 1.8 GHz
with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1 and glibc 2.2.5;
the results may or may not apply to other machines.

The test program was:

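#include "postgres.h"			/* PostgreSQL convention: include first */
#include <string.h>

/* Make MemSet() use its own loop for the buffer size under test */
#undef MEMSET_LOOP_LIMIT
#define MEMSET_LOOP_LIMIT BUFFER_SIZE	/* BUFFER_SIZE comes from -D */

int
main(void)
{
	char		buffer[BUFFER_SIZE];
	long long	i;

	/* 99 million iterations; swap in memset() to time the libc version */
	for (i = 0; i < 99000000; i++)
	{
		MemSet(buffer, 0, sizeof(buffer));
	}

	return 0;
}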

(I manually changed MemSet() to memset() when testing the performance
of the latter function.)

It was compiled like so:

gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c

(The -O2 optimization flag is important: the results are significantly
different if it is not used.)

Here are the results in seconds (each timing is the 'total' figure
reported by 'time ./a.out'):

BUFFER_SIZE = 64
MemSet() -> 2.756, 2.810, 2.789
memset() -> 13.844, 13.782, 13.778

BUFFER_SIZE = 128
MemSet() -> 5.848, 5.989, 5.861
memset() -> 15.637, 15.631, 15.631

BUFFER_SIZE = 256
MemSet() -> 9.602, 9.652, 9.633
memset() -> 19.305, 19.370, 19.302

BUFFER_SIZE = 512
MemSet() -> 17.416, 17.462, 17.353
memset() -> 26.657, 26.658, 26.678

BUFFER_SIZE = 1024
MemSet() -> 32.144, 32.179, 32.086
memset() -> 41.186, 41.115, 41.176

BUFFER_SIZE = 2048
MemSet() -> 60.39, 60.48, 60.32
memset() -> 71.19, 71.18, 71.17

BUFFER_SIZE = 4096
MemSet() -> 118.29, 120.07, 118.69
memset() -> 131.40, 131.41

... at which point I stopped benchmarking.

Is the benchmark above a reasonable assessment of memset() / MemSet()
performance when setting word-aligned amounts of memory? If so, what's
a good value for MEMSET_LOOP_LIMIT (perhaps 512)?

Also, if anyone would like to contribute the results of doing the
benchmark on their particular system, that might provide some useful
additional data points.

Cheers,

Neil

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-29 19:37:26
Message-ID: 200208291937.g7TJbQC20180@candle.pha.pa.us


I consider this a very good test. As you can see from the date of my
last test, 1997/09/11, I think I may have had a dual Pentium Pro at that
point, and hardware has certainly changed since then. I did try 128 at
that time and found it to be slower, but with newer hardware, it is very
possible it has improved.

I remember being surprised, when I wrote that macro, that there was any
improvement at all, but obviously there is a gain, and the gain is
getting bigger.

I tested the following program:

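#include "postgres.h"			/* PostgreSQL convention: include first */
#include <stdio.h>
#include <stdlib.h>				/* atoi() */
#include <string.h>

/* Large enough that MemSet() always uses its own loop */
#undef MEMSET_LOOP_LIMIT
#define MEMSET_LOOP_LIMIT 1000000

int
main(int argc, char **argv)
{
	int			len;
	long long	i;

	if (argc < 2 || (len = atoi(argv[1])) <= 0)
	{
		fprintf(stderr, "usage: %s positive-length\n", argv[0]);
		return 1;
	}
	{
		char		buffer[len];	/* C99 variable-length array */

		/* the length is a run-time value, not a compile-time constant */
		for (i = 0; i < 9900000; i++)
			MemSet(buffer, 0, len);
	}
	return 0;
}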

and, yes, -O2 is significant! Looks like we use -O2 on all platforms
that use GCC so we should be OK there.

I tested with the following script:

for TIME in 64 128 256 512 1024 2048 4096; do echo "*$TIME\c";
time tst1 $TIME; done

and got for MemSet:

*64
real 0m1.001s
user 0m1.000s
sys 0m0.003s
*128
real 0m1.578s
user 0m1.567s
sys 0m0.013s
*256
real 0m2.723s
user 0m2.723s
sys 0m0.003s
*512
real 0m5.044s
user 0m5.029s
sys 0m0.013s
*1024
real 0m9.621s
user 0m9.621s
sys 0m0.003s
*2048
real 0m18.821s
user 0m18.811s
sys 0m0.013s
*4096
real 0m37.266s
user 0m37.266s
sys 0m0.003s

and for memset():

*64
real 0m1.813s
user 0m1.801s
sys 0m0.014s
*128
real 0m2.489s
user 0m2.499s
sys 0m0.994s
*256
real 0m4.397s
user 0m5.389s
sys 0m0.005s
*512
real 0m5.186s
user 0m6.170s
sys 0m0.015s
*1024
real 0m6.676s
user 0m6.676s
sys 0m0.003s
*2048
real 0m9.766s
user 0m9.776s
sys 0m0.994s
*4096
real 0m15.970s
user 0m15.954s
sys 0m0.003s

so for BSD/OS, the break-even is 512.
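
The break-even figure can be read off mechanically as the largest tested size up to which the loop beat memset() at every size. A small sketch; `break_even` is a hypothetical helper, and it assumes both timing series were taken at the same ascending sizes. It can only report one of the tested sizes, so the true crossover may lie anywhere up to the next one.

```c
#include <stddef.h>

/*
 * Largest tested size up to which MemSet() beat memset() at every
 * tested size, or 0 if memset() already wins at sizes[0].
 * sizes[] must be ascending; both timing arrays use the same units.
 */
static size_t
break_even(const size_t *sizes, const double *memset_macro,
		   const double *libc_memset, size_t n)
{
	size_t		limit = 0;
	size_t		i;

	for (i = 0; i < n; i++)
	{
		if (memset_macro[i] >= libc_memset[i])
			break;				/* first size where memset() is as fast or faster */
		limit = sizes[i];
	}
	return limit;
}
```

Fed the `real` times above (MemSet 1.001, 1.578, 2.723, 5.044, 9.621, ... against memset 1.813, 2.489, 4.397, 5.186, 6.676, ...), this returns 512; the actual crossover lies somewhere between 512 and 1024.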

I am on a dual P3/550 using GCC 2.95.2. I can tell you exactly why my
break-even is lower than most: I have assembly-language memset()
functions in libc on BSD/OS.

I suggest changing the MEMSET_LOOP_LIMIT to 512.


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-29 21:59:51
Message-ID: 20020829175951.M1322@mail.libertyrms.com

On Thu, Aug 29, 2002 at 01:27:41AM -0400, Neil Conway wrote:
>
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.

OK, here's a run on a Sun E450, Solaris 7. I presume your "total"
time label corresponds to my "real" time, so that's what I'm
including.

System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (2
X UltraSPARC-II 400MHz)
System clock frequency: 100 MHz
Memory size: 2560 Megabytes

BUFFER_SIZE = 64
MemSet(): 0m13.343s, 0m12.567s, 0m13.659s
memset(): 0m1.255s, 0m1.258s, 0m1.254s

BUFFER_SIZE = 128
MemSet(): 0m21.347s, 0m21.200s, 0m20.541s
memset(): 0m18.041s, 0m17.963s, 0m17.990s

BUFFER_SIZE = 256
MemSet(): 0m38.023s, 0m37.480s, 0m37.631s
memset(): 0m25.969s, 0m26.047s, 0m26.012s

BUFFER_SIZE = 512
MemSet(): 1m9.226s, 1m9.901s, 1m10.148s
memset(): 2m17.897s, 2m18.310s, 2m17.984s

BUFFER_SIZE = 1024
MemSet(): 2m13.690s, 2m13.981s, 2m13.206s
memset(): 4m43.195s, 4m43.405s, 4m43.390s

. . .at which point I gave up.

A

--
----
Andrew Sullivan 204-4141 Yonge Street
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M2P 2A8
+1 416 646 3304 x110


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Sullivan <andrew(at)libertyrms(dot)info>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-29 23:35:13
Message-ID: 200208292335.g7TNZDQ08775@candle.pha.pa.us

Andrew Sullivan wrote:
> On Thu, Aug 29, 2002 at 01:27:41AM -0400, Neil Conway wrote:
> >
> > Also, if anyone would like to contribute the results of doing the
> > benchmark on their particular system, that might provide some useful
> > additional data points.
>
> Ok, here's a run on a Sun E450, Solaris 7. I presume your "total"
> time label corresponds to my "real" time. That's what I'm including,
> anyway.

These are unusual results. In the 64-byte case, MemSet is dramatically
slower; it only starts to win at around 512 bytes, and its advantage
seems to grow after that.

The idea of MemSet was to avoid the function call overhead of memset(),
but you would expect that overhead to shrink as a percentage of the
total time as the buffer gets longer.
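
That reasoning can be made concrete with a deliberately simple linear cost model (an idealization that ignores caches, alignment, and inlining): for an n-byte buffer, let c > 0 be memset()'s fixed call overhead, s its cost per byte, and m the per-byte cost of the MemSet() loop.

```latex
\begin{aligned}
T_{\mathrm{memset}}(n) &= c + s\,n, \qquad T_{\mathrm{MemSet}}(n) = m\,n,\\
\frac{c}{T_{\mathrm{memset}}(n)} &= \frac{c}{c + s\,n} \;\longrightarrow\; 0 \quad (n \to \infty),\\
T_{\mathrm{MemSet}}(n) < T_{\mathrm{memset}}(n) &\iff (m - s)\,n < c.
\end{aligned}
```

So if m > s the loop wins only for n < c/(m - s), and if m <= s it wins at every size. Under this model a loop that loses at 64 bytes must have m > s and therefore keeps losing at every larger size, which is why a loop that loses small and wins big looks anomalous.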

Your results suggest that memset() gets relatively slower at longer
buffer lengths, so that a simple for loop starts to win at larger
sizes. Should I pull out my Solaris kernel source and see what memset()
is doing?


From: Alvaro Herrera <alvherre(at)atentus(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: andrew(at)libertyrms(dot)info, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking MemSet() performance
Date: 2002-08-29 23:53:50
Message-ID: 20020829195350.5b9b0683.alvherre@atentus.com

On Thu, 29 Aug 2002 19:35:13 -0400 (EDT)
Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:

> In your results it seems to suggest that memset() gets slower for longer
> buffer lengths, and a for loop starts to win at longer sizes. Should I
> pull out my Solaris kernel source and see what memset() is doing?

No, because memset() belongs to libc, AFAICS... Do you have source
code for that?

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"To think that the specter we see is illusory does not strip it of its
horror; it only adds the new terror of madness" (Perelandra, C. S. Lewis)


From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Alvaro Herrera <alvherre(at)atentus(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, andrew(at)libertyrms(dot)info, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking MemSet() performance
Date: 2002-08-29 23:56:20
Message-ID: 1030665380.403.0.camel@lerlaptop.lerctr.org

On Thu, 2002-08-29 at 18:53, Alvaro Herrera wrote:
> On Thu, 29 Aug 2002 19:35:13 -0400 (EDT)
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
> > In your results it seems to suggest that memset() gets slower for longer
> > buffer lengths, and a for loop starts to win at longer sizes. Should I
> > pull out my Solaris kernel source and see what memset() is doing?
>
> No, because memset() belongs to the libc AFAICS... Do you have source
> code for that?
And if you do, what vintage is it? I believe Solaris has mucked with
this stuff over the last few revs.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)atentus(dot)com>
Cc: andrew(at)libertyrms(dot)info, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 00:08:45
Message-ID: 200208300008.g7U08kD10186@candle.pha.pa.us

Alvaro Herrera wrote:
> On Thu, 29 Aug 2002 19:35:13 -0400 (EDT)
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
> > In your results it seems to suggest that memset() gets slower for longer
> > buffer lengths, and a for loop starts to win at longer sizes. Should I
> > pull out my Solaris kernel source and see what memset() is doing?
>
> No, because memset() belongs to the libc AFAICS... Do you have source
> code for that?

You bet. I have source code to it all, libs, /bin, etc.


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: Alvaro Herrera <alvherre(at)atentus(dot)com>, andrew(at)libertyrms(dot)info, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 00:09:33
Message-ID: 200208300009.g7U09XF10228@candle.pha.pa.us

Larry Rosenman wrote:
> On Thu, 2002-08-29 at 18:53, Alvaro Herrera wrote:
> > On Thu, 29 Aug 2002 19:35:13 -0400 (EDT)
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> >
> > > In your results it seems to suggest that memset() gets slower for longer
> > > buffer lengths, and a for loop starts to win at longer sizes. Should I
> > > pull out my Solaris kernel source and see what memset() is doing?
> >
> > No, because memset() belongs to the libc AFAICS... Do you have source
> > code for that?
> and if you do, what vintage is it? I believe Solaris has mucked with
> stuff over the last few rev's.

8.0. Looks like there is interest, so I will dig the CDs out of the
box the movers moved and take a look. Now where is that...


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Larry Rosenman <ler(at)lerctr(dot)org>
Cc: Alvaro Herrera <alvherre(at)atentus(dot)com>, andrew(at)libertyrms(dot)info, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 01:20:14
Message-ID: 200208300120.g7U1KE712030@candle.pha.pa.us

Larry Rosenman wrote:
> On Thu, 2002-08-29 at 18:53, Alvaro Herrera wrote:
> > On Thu, 29 Aug 2002 19:35:13 -0400 (EDT)
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> >
> > > In your results it seems to suggest that memset() gets slower for longer
> > > buffer lengths, and a for loop starts to win at longer sizes. Should I
> > > pull out my Solaris kernel source and see what memset() is doing?
> >
> > No, because memset() belongs to the libc AFAICS... Do you have source
> > code for that?
> and if you do, what vintage is it? I believe Solaris has mucked with
> stuff over the last few rev's.

OK, I am not permitted to discuss the contents of the source with anyone
except other Solaris source licensees, but I can say that there isn't
anything fancy in the source.

There is nothing that would explain the slowdown of memset() above 512
bytes compared to MemSet. All lengths (64, 128, ...) use the same
algorithm in the memset code.

I got the source from the now-cancelled Solaris Foundation Source
Program:

http://wwws.sun.com/software/solaris/source/


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Sullivan <andrew(at)libertyrms(dot)info>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 03:07:03
Message-ID: 200208300307.g7U373A16220@candle.pha.pa.us


Would you please retest this. I have attached my email showing a
simpler test that is less error-prone.

I can't come up with any scenario that would produce what you have
reported: no combination of function call cost, MemSet loop efficiency,
and memset loop efficiency that I can think of yields those numbers.

The standard assumption is that function call overhead is significant,
and that memset is faster than the C loop in MemSet. What compiler are
you using? Is the memset() call being inlined by the compiler? You will
have to look at the assembler code to be sure.

My only guess is that memset is inlined and that it is only moving
single bytes. If that is the case, there is no function call overhead,
and it would explain why MemSet pulls further ahead as the buffer gets
larger.
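
To illustrate the difference being guessed at, here is a sketch contrasting a byte-at-a-time fill (what an inlined memset() might reduce to, if the guess is right) with a word-sized zero loop in the spirit of MemSet(). Both function names are hypothetical, and each returns the number of store iterations it performed so the difference in loop trips is visible.

```c
#include <stddef.h>

/* Hypothetical byte-at-a-time fill: one store per byte. */
static size_t
fill_bytes(unsigned char *buf, unsigned char val, size_t len)
{
	size_t		i;

	for (i = 0; i < len; i++)
		buf[i] = val;
	return len;					/* number of stores performed */
}

/*
 * Hypothetical word-at-a-time zero fill; assumes buf is int-aligned and
 * len is a multiple of sizeof(int), as MemSet()'s loop path requires.
 */
static size_t
zero_words(int *buf, size_t len)
{
	size_t		nwords = len / sizeof(int);
	size_t		i;

	for (i = 0; i < nwords; i++)
		buf[i] = 0;
	return nwords;				/* number of stores performed */
}
```

For a 4096-byte buffer with 4-byte ints that is 4096 stores versus 1024, which would be consistent with MemSet() pulling ahead as buffers grow; only the generated assembler can confirm what the compiler actually emitted.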


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Sullivan <andrew(at)libertyrms(dot)info>, Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 06:17:44
Message-ID: 27529.1030688264@sss.pgh.pa.us

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Would you please retest this. I have attached my email showing a
> simpler test that is less error-prone.

What did you consider less error-prone, exactly?

Neil's original test considered the case where both the value being
set and the buffer length (second and third args of MemSet) are
compile-time constants. Your test used a compile-time-constant second
arg and a variable third arg. It's obvious from looking at the source
of MemSet that this will make a difference in what an optimizing
compiler can do.
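
A sketch of the two call patterns, assuming a MemSet-like helper (`loop_set`, hypothetical, as are `LOOP_LIMIT`, `clear_fixed` and `clear_variable`) declared `static inline` so the compiler sees its body at each call site. With constant value and length, an optimizing compiler may resolve those tests at compile time; with a run-time length the length tests must run on every call. Whether a particular compiler actually does this has to be checked in its assembler output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOOP_LIMIT 1024			/* hypothetical stand-in for MEMSET_LOOP_LIMIT */

/* MemSet-like helper, visible to the compiler at every call site. */
static inline void
loop_set(void *start, int val, size_t len)
{
	if (val == 0 &&
		(uintptr_t) start % sizeof(int) == 0 &&
		len % sizeof(int) == 0 &&
		len <= LOOP_LIMIT)
	{
		int		   *p = (int *) start;
		int		   *stop = (int *) ((char *) start + len);

		while (p < stop)
			*p++ = 0;
	}
	else
		memset(start, val, len);
}

/* Neil's pattern: value and length are compile-time constants, so the
 * val and len tests can be folded at compile time. */
static void
clear_fixed(int *buf)			/* buf must hold at least 128 ints */
{
	loop_set(buf, 0, 128 * sizeof(int));
}

/* Bruce's pattern: constant value, run-time length; the length tests
 * must be evaluated on every call. */
static void
clear_variable(int *buf, size_t len)
{
	loop_set(buf, 0, len);
}
```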

I believe that both cases are interesting in practice in the Postgres
sources, but I have no idea about their relative frequency of
occurrence.

FWIW, I get numbers like the following for the constant-third-arg
scenario, using "gcc -O2" with gcc 2.95.3 on HPUX 10.20, HPPA C180
processor:

bufsize MemSet memset
64 0m1.71s 0m4.89s
128 0m2.51s 0m5.36s
256 0m4.11s 0m7.02s
512 0m7.32s 0m10.31s
1024 0m13.74s 0m16.90s
2048 0m26.58s 0m30.08s
4096 0m52.24s 0m56.43s

So I'd go for a crossover point of *at least* 512. IIRC, I got
similar numbers two years ago that led me to put the comment into
c.h that Neil is reacting to...

regards, tom lane


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 12:01:02
Message-ID: 20020830080102.B1191@mail.libertyrms.com

On Thu, Aug 29, 2002 at 07:35:13PM -0400, Bruce Momjian wrote:
> Andrew Sullivan wrote:
> > On Thu, Aug 29, 2002 at 01:27:41AM -0400, Neil Conway wrote:
> > >
> > > Also, if anyone would like to contribute the results of doing the
> > > benchmark on their particular system, that might provide some useful
> > > additional data points.
> >
> > Ok, here's a run on a Sun E450, Solaris 7. I presume your "total"
> > time label corresponds to my "real" time. That's what I'm including,
> > anyway.
>
>
> Now, these are unusual results. In the 64 case, MemSet is dramatically
> slower, and it only starts to win around 512, and seems to speed up
> after that.
>
> These are strange results. The idea of MemSet was to prevent the

Yes, I was rather surprised, too. In fact, after the first couple of
runs I thought I must have made a mistake and compiled with (for
instance) MemSet() instead of memset(). But I triple-checked, and I
hadn't.

FWIW, here's an example of what I used to call the compiler:

gcc -O2 -DBUFFER_SIZE=1024 -Ipostgresql-7.2.1/src/include/ memset.c

A


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 12:04:14
Message-ID: 20020830080414.C1191@mail.libertyrms.com

On Thu, Aug 29, 2002 at 11:07:03PM -0400, Bruce Momjian wrote:

> and that memset it faster than C MemSet. What compiler are you using?

Sorry. Should have included that.

$ gcc --version
2.95.3

> Is the memset() call being inlined by the compiler? You will have to
> look at the assembler code to be sure.

No idea. I can maybe check this out later, but I'll have to ask one
of my colleagues for help. My knowledge of what I am looking at runs
out way before looking at assembler code :(

A


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Sullivan <andrew(at)libertyrms(dot)info>, Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 14:53:19
Message-ID: 200208301453.g7UErJv00664@candle.pha.pa.us

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Would you please retest this. I have attached my email showing a
> > simpler test that is less error-prone.
>
> What did you consider less error-prone, exactly?
>
> Neil's original test considered the case where both the value being
> set and the buffer length (second and third args of MemSet) are
> compile-time constants. Your test used a compile-time-constant second
> arg and a variable third arg. It's obvious from looking at the source
> of MemSet that this will make a difference in what an optimizing
> compiler can do.

It was less error-prone because you don't have to recompile for every
constant. Your point that a non-constant length may affect the
optimizer is plausible, though I assumed that for lengths >= 64 the
length would not be significant to the optimizer.

Should we take it to 1024 as a switchover point? My machine is low at
512, and others are higher, so 1024 seems like a good average.


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 21:15:24
Message-ID: 20020830171524.U10695@mail.libertyrms.com

On Thu, Aug 29, 2002 at 11:07:03PM -0400, Bruce Momjian wrote:
>
> Would you please retest this. I have attached my email showing a
> simpler test that is less error-prone.

OK, here you go. Same machine as before, 2-way UltraSPARC-II @400
MHz, 2.5 GB RAM, gcc 2.95.3, Solaris 7. This gcc compiles 32-bit apps.

MemSet():

*64

real 0m1.298s
user 0m1.290s
sys 0m0.010s
*128

real 0m2.251s
user 0m2.250s
sys 0m0.000s
*256

real 0m3.734s
user 0m3.720s
sys 0m0.010s
*512

real 0m7.041s
user 0m7.010s
sys 0m0.020s
*1024

real 0m13.353s
user 0m13.350s
sys 0m0.000s
*2048

real 0m26.178s
user 0m26.040s
sys 0m0.000s
*4096

real 0m51.838s
user 0m51.670s
sys 0m0.010s

and memset()

*64

real 0m1.469s
user 0m1.460s
sys 0m0.000s
*128

real 0m1.813s
user 0m1.810s
sys 0m0.000s
*256

real 0m2.747s
user 0m2.730s
sys 0m0.010s
*512

real 0m12.478s
user 0m12.370s
sys 0m0.010s
*1024

real 0m26.128s
user 0m26.010s
sys 0m0.000s
*2048

real 0m57.663s
user 0m57.320s
sys 0m0.010s
*4096

real 1m53.772s
user 1m53.290s
sys 0m0.000s

A


From: Ashley Cambrell <ash(at)freaky-namuh(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-30 23:31:25
Message-ID: 3D70004D.9010201@freaky-namuh.com

Andrew Sullivan wrote:

>On Thu, Aug 29, 2002 at 01:27:41AM -0400, Neil Conway wrote:
>
>
>>Also, if anyone would like to contribute the results of doing the
>>benchmark on their particular system, that might provide some useful
>>additional data points.
>>
>>
Linux 2.4.18 (preempt), gcc 2.95.4, dual Athlon 1600XP, 1 GB DDR

memset() *64 2.999s
MemSet() *64 3.640s

memset() *128 4.211s
MemSet() *128 5.933s

memset() *256 6.624s
MemSet() *256 14.889s

memset() *512 11.182s
MemSet() *512 28.583s

memset() *1024 20.288s
MemSet() *1024 55.853s

memset() *2048 38.513s
MemSet() *2048 1m50.555s

memset() *4096 1m15.010s
MemSet() *4096 3m40.381s

Linux 2.4.16, gcc 2.95.4, dual Celeron 400, 512 MB PC66

memset() *64 15.618s
MemSet() *64 12.864s

memset() *128 24.524s
MemSet() *128 21.852s

memset() *256 53.963s
MemSet() *256 52.012s

memset() *512 1m31.232s
MemSet() *512 1m39.445s

memset() *1024 2m44.609s
MemSet() *1024 3m14.567s

memset() *2048 5m12.630s
MemSet() *2048 6m24.916s

memset() *4096 10m8.183s
MemSet() *4096 12m43.830s

Ashley Cambrell


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Sullivan <andrew(at)libertyrms(dot)info>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-09-01 23:41:12
Message-ID: 200209012341.g81NfCp01026@candle.pha.pa.us


OK, it seems we get wildly different MemSet results on different
machines. I am going to raise the MEMSET_LOOP_LIMIT value to 1024
because it seems to be the best value on most machines. We can revisit
this in 7.4.

I wonder if a configure test is going to be required for this
eventually. I think random page size needs the same handling.

Maybe I should add to TODO:

o compute optimal MEMSET_LOOP_LIMIT value via configure.
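
A rough sketch of what such a configure-time probe might look like. Everything here is an assumption rather than an existing PostgreSQL facility: the function names, candidate sizes and iteration count are hypothetical; timing uses the standard clock() (CPU time, coarse resolution); and the loop stores go through a volatile pointer, with memset() called through a volatile function pointer, so the compiler cannot delete the work being measured (the volatile stores may themselves make the loop look slower than the real macro). A real probe would have to use the actual MemSet() code and the project's compiler flags, and its answer would only hold for the machine configure runs on.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define PROBE_MAX 4096			/* largest candidate size, in bytes */

static int	probe_buf[PROBE_MAX / sizeof(int)];

/* Calling through a volatile function pointer stops the compiler from
 * removing or merging the memset() calls being timed. */
static void *(*volatile memset_fn) (void *, int, size_t) = memset;

/* CPU seconds to zero len bytes iters times with an int-sized loop. */
static double
time_loop(size_t len, long iters)
{
	volatile int *p = probe_buf;	/* volatile: every store is kept */
	size_t		nwords = len / sizeof(int);
	clock_t		t0 = clock();
	long		it;
	size_t		i;

	for (it = 0; it < iters; it++)
		for (i = 0; i < nwords; i++)
			p[i] = 0;
	return (double) (clock() - t0) / CLOCKS_PER_SEC;
}

/* CPU seconds to zero len bytes iters times with libc memset(). */
static double
time_memset(size_t len, long iters)
{
	clock_t		t0 = clock();
	long		it;

	for (it = 0; it < iters; it++)
		memset_fn(probe_buf, 0, len);
	return (double) (clock() - t0) / CLOCKS_PER_SEC;
}

/*
 * Largest candidate size up to which the loop beat memset() at every
 * candidate, or 0 if memset() wins at the first one.  sizes[] must be
 * ascending, multiples of sizeof(int), and no larger than PROBE_MAX.
 */
static size_t
probe_limit(const size_t *sizes, size_t n, long iters)
{
	size_t		limit = 0;
	size_t		k;

	for (k = 0; k < n; k++)
	{
		if (time_loop(sizes[k], iters) >= time_memset(sizes[k], iters))
			break;
		limit = sizes[k];
	}
	return limit;
}
```

A configure script could, for example, call probe_limit() over {64, 128, ..., 4096} with a few hundred thousand iterations and write the result out as a #define; noisy timings would argue for repeating each measurement and taking the median.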

Is there a significant benefit? Can someone run some query with MemSet
vs. memset and see a timing difference? You can use the new GUC param
log_duration I just committed.

Remember, I added MemSet to eliminate the function call overhead, but
we are now seeing that MemSet beats memset() for ordinary memory
setting, even at the larger buffer sizes where function call overhead
isn't an issue.
