Re: MemSet inline for newNode

Lists: pgsql-hackerspgsql-patches
From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: MemSet inline for newNode
Date: 2002-11-10 02:18:15
Message-ID: 200211100218.gAA2IG901542@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I have applied this patch to inline MemSet for newNode.

I will make another commit to make more general use of palloc0.

This was already discussed on hackers.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 4.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-10 14:01:06
Message-ID: 8937.1036936866@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I have applied this patch to inline MemSet for newNode.
> I will make another commit to make more general use of palloc0.

Refresh my memory on why either of these was a good idea?

> This was already discussed on hackers.

And this was not the approach agreed to, IIRC. What you've done has
eliminated the possibility of optimizing away the controlling tests
in MemSet.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-10 20:04:23
Message-ID: 200211102004.gAAK4NV03862@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I have applied this patch to inline MemSet for newNode.
> > I will make another commit to make more general use of palloc0.
>
> Refresh my memory on why either of these was a good idea?

You are talking about the use of palloc0 in place of palloc/MemSet(0),
not the use of palloc0 in newNode, right?

> > This was already discussed on hackers.
>
> And this was not the approach agreed to, IIRC. What you've done has
> eliminated the possibility of optimizing away the controlling tests
> in MemSet.

I see now:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200210120006.g9C06i502964%40candle.pha.pa.us

I thought someone had tested that there was little/no performance
difference between these two statements:

MemSet(ptr, 0, 256)

vs.

i = 256;
MemSet(ptr, 0, i)

However, seeing no email reply, I now assume no test was done.

Neil, can you run such a test and let us know. It has to be with a
compiler that optimizes out the MemSet length test when the len is a
constant. I can't remember what compiler version/settings that was, but
it may have been -O3 gcc 3.20.

I can back out my changes, but it would be easier to see if there is a
benefit before doing that. Personally, I am going to be surprised if a
single conditional tests in MemSet (which is not in the assignment loop)
will make any measurable difference.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] MemSet inline for newNode
Date: 2002-11-10 21:06:44
Message-ID: 200211102106.gAAL6ir09610@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


OK, my compiler, gcc 2.95 does the optimization at -O2, so I ran the
tests here with the attached program, testing a constant of 256 and a
variable 'j' equal to 256.

I found a 1.7% slowdown with the variable compared to the constant.
The 256 value is rather large. For a length of 32, the difference was
14%!

So, seems I should back out the palloc0 usage and let the MemSet inline
at each location. Does that seems like the direction to head?

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > I have applied this patch to inline MemSet for newNode.
> > > I will make another commit to make more general use of palloc0.
> >
> > Refresh my memory on why either of these was a good idea?
>
> You are talking about the use of palloc0 in place of palloc/MemSet(0),
> not the use of palloc0 in newNode, right?
>
> > > This was already discussed on hackers.
> >
> > And this was not the approach agreed to, IIRC. What you've done has
> > eliminated the possibility of optimizing away the controlling tests
> > in MemSet.
>
> I see now:
>
> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200210120006.g9C06i502964%40candle.pha.pa.us
>
> I thought someone had tested that there was little/no performance
> difference between these two statements:
>
> MemSet(ptr, 0, 256)
>
> vs.
>
> i = 256;
> MemSet(ptr, 0, i)
>
> However, seeing no email reply, I now assume no test was done.
>
> Neil, can you run such a test and let us know. It has to be with a
> compiler that optimizes out the MemSet length test when the len is a
> constant. I can't remember what compiler version/settings that was, but
> it may have been -O3 gcc 3.20.
>
> I can back out my changes, but it would be easier to see if there is a
> benefit before doing that. Personally, I am going to be surprised if a
> single conditional tests in MemSet (which is not in the assignment loop)
> will make any measurable difference.
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 1.2 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] MemSet inline for newNode
Date: 2002-11-10 23:07:34
Message-ID: 200211102307.gAAN7Y611251@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Further update:

I ran a test and compared the sizes of the postgres binary: 2926925
with palloc0, 2931293 without, a 0.14% decrease. Not very much,
considering the 1-14% speedup of MemSet with a constant.

Tom, I really didn't think those conditional tests would be significant,
but clearly they are --- call me surprised.

Back out palloc0()?

---------------------------------------------------------------------------

Bruce Momjian wrote:
>
> OK, my compiler, gcc 2.95 does the optimization at -O2, so I ran the
> tests here with the attached program, testing a constant of 256 and a
> variable 'j' equal to 256.
>
> I found a 1.7% slowdown with the variable compared to the constant.
> The 256 value is rather large. For a length of 32, the difference was
> 14%!
>
> So, seems I should back out the palloc0 usage and let the MemSet inline
> at each location. Does that seems like the direction to head?
>
> ---------------------------------------------------------------------------
>
> Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > I have applied this patch to inline MemSet for newNode.
> > > > I will make another commit to make more general use of palloc0.
> > >
> > > Refresh my memory on why either of these was a good idea?
> >
> > You are talking about the use of palloc0 in place of palloc/MemSet(0),
> > not the use of palloc0 in newNode, right?
> >
> > > > This was already discussed on hackers.
> > >
> > > And this was not the approach agreed to, IIRC. What you've done has
> > > eliminated the possibility of optimizing away the controlling tests
> > > in MemSet.
> >
> > I see now:
> >
> > http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200210120006.g9C06i502964%40candle.pha.pa.us
> >
> > I thought someone had tested that there was little/no performance
> > difference between these two statements:
> >
> > MemSet(ptr, 0, 256)
> >
> > vs.
> >
> > i = 256;
> > MemSet(ptr, 0, i)
> >
> > However, seeing no email reply, I now assume no test was done.
> >
> > Neil, can you run such a test and let us know. It has to be with a
> > compiler that optimizes out the MemSet length test when the len is a
> > constant. I can't remember what compiler version/settings that was, but
> > it may have been -O3 gcc 3.20.
> >
> > I can back out my changes, but it would be easier to see if there is a
> > benefit before doing that. Personally, I am going to be surprised if a
> > single conditional tests in MemSet (which is not in the assignment loop)
> > will make any measurable difference.
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> > + If your life is a hard drive, | 13 Roberts Road
> > + Christ can be your backup. | Newtown Square, Pennsylvania 19073
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
> >
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073

> #define INT_ALIGN_MASK (sizeof(int) - 1)
> #include <stdlib.h>
>
> /*
> * MemSet
> * Exactly the same as standard library function memset(), but considerably
> * faster for zeroing small word-aligned structures (such as parsetree nodes).
> * This has to be a macro because the main point is to avoid function-call
> * overhead. However, we have also found that the loop is faster than
> * native libc memset() on some platforms, even those with assembler
> * memset() functions. More research needs to be done, perhaps with
> * platform-specific MEMSET_LOOP_LIMIT values or tests in configure.
> *
> * bjm 2002-10-08
> */
> #define MemSet(start, val, len) \
> do \
> { \
> int * _start = (int *) (start); \
> int _val = (val); \
> size_t _len = (len); \
> \
> if ((((long) _start) & INT_ALIGN_MASK) == 0 && \
> (_len & INT_ALIGN_MASK) == 0 && \
> _val == 0 && \
> _len <= MEMSET_LOOP_LIMIT) \
> { \
> int * _stop = (int *) ((char *) _start + _len); \
> while (_start < _stop) \
> *_start++ = 0; \
> } \
> else \
> memset((char *) _start, _val, _len); \
> } while (0)
>
> #define MEMSET_LOOP_LIMIT 1024
>
> int
> main(int argc, char **argv)
> {
> int i,j;
> char *buf;
>
> buf = malloc(256);
>
> j = 32;
> for (i = 0; i < 1000000; i++)
> MemSet(buf, 0, j /* choose 256 or j here */);
> return 0;
> }

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 02:50:03
Message-ID: 23217.1036983003@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I thought someone had tested that there was little/no performance
> difference between these two statements:
> MemSet(ptr, 0, 256)
> vs.
> i = 256;
> MemSet(ptr, 0, i)

But with any reasonably smart compiler, those *will* be the same because
the compiler can do constant-folding (ie, it knows i is 256 when control
reaches the MemSet). Pushing the MemSet into MemoryContextAlloc
eliminates the possibility of knowing the size argument's value.

> I can back out my changes, but it would be easier to see if there is a
> benefit before doing that. Personally, I am going to be surprised if a
> single conditional tests in MemSet (which is not in the assignment loop)
> will make any measurable difference.

Then why were we bothering? IIRC, this was being sold as a performance
improvement.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 02:58:07
Message-ID: 200211110258.gAB2w7C03559@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I thought someone had tested that there was little/no performance
> > difference between these two statements:
> > MemSet(ptr, 0, 256)
> > vs.
> > i = 256;
> > MemSet(ptr, 0, i)
>
> But with any reasonably smart compiler, those *will* be the same because
> the compiler can do constant-folding (ie, it knows i is 256 when control
> reaches the MemSet). Pushing the MemSet into MemoryContextAlloc
> eliminates the possibility of knowing the size argument's value.

Yes, however, I am not seeing that getting optimized with gcc 2.95 -O2.

> > I can back out my changes, but it would be easier to see if there is a
> > benefit before doing that. Personally, I am going to be surprised if a
> > single conditional tests in MemSet (which is not in the assignment loop)
> > will make any measurable difference.
>
> Then why were we bothering? IIRC, this was being sold as a performance
> improvement.

Well, the palloc0 use by newNode was a performance boost, as tested by
Neil Conway. Without palloc0, there was no way to inline newNode. He
found in his tests that the palloc0 version of newNode was as fast as
his inline newNode, and it didn't cause the same code bloat.

The palloc0's used elsewhere in the code was merely for code clarity
and to reduce code bloat caused by MemSet in a few cases. I will back
it out while I do some more tests.

One thing I am concerned about is that newNode _isn't_ making use of a
constant len for MemSet, but doing it inline was causing too much bloat.

I made a new version that did the alignment test of start and len in one
pass:

if ((( ((long) _start) | _len) & INT_ALIGN_MASK) == 0 && \
_val == 0 && \
_len <= MEMSET_LOOP_LIMIT) \

I have found that the last test is the one that slows it down.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 03:10:32
Message-ID: 23463.1036984232@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> But with any reasonably smart compiler, those *will* be the same because
>> the compiler can do constant-folding (ie, it knows i is 256 when control
>> reaches the MemSet). Pushing the MemSet into MemoryContextAlloc
>> eliminates the possibility of knowing the size argument's value.

> Yes, however, I am not seeing that getting optimized with gcc 2.95 -O2.

You aren't? I just checked gcc 2.95.3 on HPPA; it definitely propagates
the constant at -O2.

> Well, the palloc0 use by newNode was a performance boost, as tested by
> Neil Conway. Without palloc0, there was no way to inline newNode.

But your patch as committed does NOT inline newNode, in fact it does the
exact opposite: the MemSet macro is removed from the callsite.

> I made a new version that did the alignment test of start and len in one
> pass:

I thought the point of this exercise was to eliminate the alignment test
altogether, by making it doable at compile time. IIRC, the upshot of
the prior discussion was to keep the MemSet calls at the call site and
to eliminate the alignment test on the pointer by special-casing palloc0
(essentially, wiring into the macro the assumption that palloc's
returned pointer will be suitably aligned).

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 03:36:13
Message-ID: 200211110336.gAB3aDw10925@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> But with any reasonably smart compiler, those *will* be the same because
> >> the compiler can do constant-folding (ie, it knows i is 256 when control
> >> reaches the MemSet). Pushing the MemSet into MemoryContextAlloc
> >> eliminates the possibility of knowing the size argument's value.
>
> > Yes, however, I am not seeing that getting optimized with gcc 2.95 -O2.
>
> You aren't? I just checked gcc 2.95.3 on HPPA; it definitely propagates
> the constant at -O2.

I am definately seeing different timings using the constant and the
variable in the test.

>
> > Well, the palloc0 use by newNode was a performance boost, as tested by
> > Neil Conway. Without palloc0, there was no way to inline newNode.
>
> But your patch as committed does NOT inline newNode, in fact it does the
> exact opposite: the MemSet macro is removed from the callsite.

Yes, there were actually two patches; the first inlined newNode by
calling palloc0. Without palloc, there was no way to inline newNode.

The second was a more general one to merge palloc/MemSet into a single
palloc0 call. That has been backed out while I research it.

>
> > I made a new version that did the alignment test of start and len in one
> > pass:
>
> I thought the point of this exercise was to eliminate the alignment test
> altogether, by making it doable at compile time. IIRC, the upshot of
> the prior discussion was to keep the MemSet calls at the call site and
> to eliminate the alignment test on the pointer by special-casing palloc0
> (essentially, wiring into the macro the assumption that palloc's
> returned pointer will be suitably aligned).

I didn't assume we could special case palloc0 by knowing it was going to
be properly aligned and be of the proper length (multiple if int).

I have a new idea. Right now we have:

#define palloc0(sz) MemoryContextAllocZero(CurrentMemoryContext, (sz))

My new idea is to add a third boolean argument to
MemoryContextAllocZero() which will control whether the MemSet
assignment loop is used, or memset(). My idea is to use "? :" to test
for the proper value in the macro. Basically, this will decouple the
MemSet test and the MemSet loop, allowing the optimizer to process the
length at the call site, but allow palloc0 to perform the assignment
loop. I need palloc0 because MemSet can't return a pointer.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 03:47:02
Message-ID: 23824.1036986422@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>> But your patch as committed does NOT inline newNode, in fact it does the
>> exact opposite: the MemSet macro is removed from the callsite.

> Yes, there were actually two patches; the first inlined newNode by
> calling palloc0. Without palloc, there was no way to inline newNode.

But you have *not* inlined newNode: the memset is still done at a location
that cannot know the size at compile time.

> The second was a more general one to merge palloc/MemSet into a single
> palloc0 call. That has been backed out while I research it.

That part could be sold just on the basis of making the code easier
to read and less error-prone; particularly for call sites where the
length is a runtime computation anyway. I think that newNode() is the
principal case where the length is knowable at compile time. So my
feeling is that the general change to invent a palloc0() call is fine
... but if you want performance then newNode() should *not* be using the
generic palloc0() call.

> My new idea is to add a third boolean argument to
> MemoryContextAllocZero() which will control whether the MemSet
> assignment loop is used, or memset().

But we are *trying to eliminate any runtime test whatever*. Short
of that, you aren't going to get the speedup. Certainly passing a third
argument to the alloc subroutine will eat enough cycles to negate any
savings from simplifying the runtime test slightly.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 04:50:28
Message-ID: 200211110450.gAB4oSW29177@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> >> But your patch as committed does NOT inline newNode, in fact it does the
> >> exact opposite: the MemSet macro is removed from the callsite.
>
> > Yes, there were actually two patches; the first inlined newNode by
> > calling palloc0. Without palloc, there was no way to inline newNode.
>
> But you have *not* inlined newNode: the memset is still done at a location
> that cannot know the size at compile time.

I am sorry. I inlined newNode, but by pushing MemSet into palloc0, the
MemSet conditional tests can not be optimized away. Is that clearer?

#define newNode(size, tag) \
( \
AssertMacro((size) >= sizeof(Node)), /* need the tag, at least */ \
\
newNodeMacroHolder = (Node *) palloc0(size), \
newNodeMacroHolder->type = (tag), \
newNodeMacroHolder \
)

In this case, I needed palloc0 to make newNode inlined, and the old
function version of newNode couldn't optimize away the conditional test
either because it was called with various sizes.

> > The second was a more general one to merge palloc/MemSet into a single
> > palloc0 call. That has been backed out while I research it.
>
> That part could be sold just on the basis of making the code easier
> to read and less error-prone; particularly for call sites where the
> length is a runtime computation anyway. I think that newNode() is the

Yes, that was my thought.

> principal case where the length is knowable at compile time. So my
> feeling is that the general change to invent a palloc0() call is fine
> ... but if you want performance then newNode() should *not* be using the
> generic palloc0() call.

OK. I am still struggling about how to do that.

> > My new idea is to add a third boolean argument to
> > MemoryContextAllocZero() which will control whether the MemSet
> > assignment loop is used, or memset().
>
> But we are *trying to eliminate any runtime test whatever*. Short
> of that, you aren't going to get the speedup. Certainly passing a third
> argument to the alloc subroutine will eat enough cycles to negate any
> savings from simplifying the runtime test slightly.

My idea is that the conditional tests will be optimized away before the
palloc0 call is made.

Attached is a patch that implements the MemSet split and allows the size
tests to be optimized away _before_ at the call location. It isn't done
yet, but you can get the idea.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 3.9 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 16:06:40
Message-ID: 200211111606.gABG6eS29372@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


OK, here is a newer version of the macro. I tested it on
catalog/index.c and saw the optimizer remove the MemSetTest conditional
for the newNode/palloc0 call in that file:

pushl $1 <---------
pushl $108
movl CurrentMemoryContext,%eax
pushl %eax
call MemoryContextAllocZero

I still test the palloc pointer alignment in MemSetLoop because that is
not a constant.

I just ran a test and now see a 7% difference for the following test:

j = 32;
for (i = 0; i < 1000000; i++)
MemSet(buf, 0, j /* choose 32 or j here */);
return 0;

It used to be a 14% difference. However, this looks more like an
optimizer problem than actual coding and with the new macros, MemSet
will be called with a constant third parameter anyway in almost all
cases, including newNode and the merge of palloc/MemSet into palloc0
patch.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > >> But your patch as committed does NOT inline newNode, in fact it does the
> > >> exact opposite: the MemSet macro is removed from the callsite.
> >
> > > Yes, there were actually two patches; the first inlined newNode by
> > > calling palloc0. Without palloc, there was no way to inline newNode.
> >
> > But you have *not* inlined newNode: the memset is still done at a location
> > that cannot know the size at compile time.
>
> I am sorry. I inlined newNode, but by pushing MemSet into palloc0, the
> MemSet conditional tests can not be optimized away. Is that clearer?
>
> #define newNode(size, tag) \
> ( \
> AssertMacro((size) >= sizeof(Node)), /* need the tag, at least */ \
> \
> newNodeMacroHolder = (Node *) palloc0(size), \
> newNodeMacroHolder->type = (tag), \
> newNodeMacroHolder \
> )
>
> In this case, I needed palloc0 to make newNode inlined, and the old
> function version of newNode couldn't optimize away the conditional test
> either because it was called with various sizes.
>
>
> > > The second was a more general one to merge palloc/MemSet into a single
> > > palloc0 call. That has been backed out while I research it.
> >
> > That part could be sold just on the basis of making the code easier
> > to read and less error-prone; particularly for call sites where the
> > length is a runtime computation anyway. I think that newNode() is the
>
> Yes, that was my thought.
>
> > principal case where the length is knowable at compile time. So my
> > feeling is that the general change to invent a palloc0() call is fine
> > ... but if you want performance then newNode() should *not* be using the
> > generic palloc0() call.
>
> OK. I am still struggling about how to do that.
>
> > > My new idea is to add a third boolean argument to
> > > MemoryContextAllocZero() which will control whether the MemSet
> > > assignment loop is used, or memset().
> >
> > But we are *trying to eliminate any runtime test whatever*. Short
> > of that, you aren't going to get the speedup. Certainly passing a third
> > argument to the alloc subroutine will eat enough cycles to negate any
> > savings from simplifying the runtime test slightly.
>
> My idea is that the conditional tests will be optimized away before the
> palloc0 call is made.
>
> Attached is a patch that implements the MemSet split and allows the size
> tests to be optimized away _before_ at the call location. It isn't done
> yet, but you can get the idea.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 4.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 16:23:44
Message-ID: 2301.1037031824@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I still test the palloc pointer alignment in MemSetLoop because that is
> not a constant.

But it's known aligned when you just got it from palloc.

Why don't you simply implement what was agreed to in the original
thread, namely:

* provide a MemSetAligned macro that is just like the standard
one except it omits the pointer alignment test

* provide a palloc0 macro that does MemSetAligned inside the
macro; known safe because palloc returns a maxaligned pointer

* use palloc0 in the newNode macro

The approach you are taking is messier *and* slower than this.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 16:40:02
Message-ID: 200211111640.gABGe2U02117@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I still test the palloc pointer alignment in MemSetLoop because that is
> > not a constant.
>
> But it's known aligned when you just got it from palloc.
>
> Why don't you simply implement what was agreed to in the original
> thread, namely:
>
> * provide a MemSetAligned macro that is just like the standard
> one except it omits the pointer alignment test
>
> * provide a palloc0 macro that does MemSetAligned inside the
> macro; known safe because palloc returns a maxaligned pointer

I can't do MemSet in a macro that returns a value, as palloc requires.
MemSet has a loop, and that can't be done in a macro that returns a value.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 18:08:30
Message-ID: 3167.1037038110@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I can't do MemSet in a macro that returns a value, as palloc requires.
> MemSet has a loop, and that can't be done in a macro that returns a value.

Hm. How did Neil test this originally --- was he relying on being able
to "inline" newNode()?

Anyway, I don't think that passing an extra parameter can be a win.
If there has to be a runtime test, testing whether the two low bits
of the length are zero is probably about the same speed as testing a
boolean parameter. It's unlikely to be enough slower to justify the
cost of passing another parameter.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: MemSet inline for newNode
Date: 2002-11-11 19:18:15
Message-ID: 200211111918.gABJIFn21757@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I can't do MemSet in a macro that returns a value, as palloc requires.
> > MemSet has a loop, and that can't be done in a macro that returns a value.
>
> Hm. How did Neil test this originally --- was he relying on being able
> to "inline" newNode()?

Yes.

> Anyway, I don't think that passing an extra parameter can be a win.
> If there has to be a runtime test, testing whether the two low bits
> of the length are zero is probably about the same speed as testing a
> boolean parameter. It's unlikely to be enough slower to justify the
> cost of passing another parameter.

OK, new version attached, with extra parameter removed.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 4.6 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: MemSet inline for newNode
Date: 2002-11-12 01:17:58
Message-ID: 200211120117.gAC1Hwg13084@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I just ran a test on catalog/index.c and found that with the new patch,
the conditional test is completely gone. It now calls the palloc0
memory allocator unconditionally because all the conditions are met by
constants:

BuildIndexInfo:
pushl %ebp
movl %esp,%ebp
subl $32,%esp
pushl %esi
pushl %ebx
addl $-8,%esp
pushl $108
movl CurrentMemoryContext,%eax
pushl %eax
call MemoryContextAllocPalloc0

and, in MemoryContextAllocPalloc0, the inline MemSet loop is used
unconditionally.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > I can't do MemSet in a macro that returns a value, as palloc requires.
> > > MemSet has a loop, and that can't be done in a macro that returns a value.
> >
> > Hm. How did Neil test this originally --- was he relying on being able
> > to "inline" newNode()?
>
> Yes.
>
> > Anyway, I don't think that passing an extra parameter can be a win.
> > If there has to be a runtime test, testing whether the two low bits
> > of the length are zero is probably about the same speed as testing a
> > boolean parameter. It's unlikely to be enough slower to justify the
> > cost of passing another parameter.
>
> OK, new version attached, with extra parameter removed.
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073