From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Speedup to our barrier code |
Date: | 2018-10-11 17:32:23 |
Message-ID: | 20181011173223.z4ebjo2n27c7yxqe@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
This is more a note for the future than something I'm planning to
pursue right now. It turns out our x86 full memory barrier could be sped
up noticeably on larger systems with a trivial change.
Just changing
__asm__ __volatile__ ("lock; addl $0,0(%%rsp)" : : : "memory", "cc")
into something like
__asm__ __volatile__ ("lock; addl $0,-8(%%rsp)" : : : "memory", "cc")
makes the barrier faster because there's no longer a dependency slowing
down other uses of %rsp, which, rsp being the stack pointer, are not rare.
Obviously that requires having a few bytes available below the stack
pointer, but I can't see that ever being a problem on x86.
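To make the two variants concrete, here is a minimal sketch that wraps each form in a macro and drives it from a loop that also touches the stack. The names barrier_at_rsp, barrier_below_rsp, run_barriers and time_barriers are hypothetical and chosen for illustration; they are not PostgreSQL's actual macros. It assumes GCC or clang on x86-64 and falls back to __sync_synchronize() elsewhere.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical wrappers around the two forms quoted above. */
#if defined(__x86_64__) && defined(__GNUC__)
#define barrier_at_rsp() \
	__asm__ __volatile__ ("lock; addl $0,0(%%rsp)" : : : "memory", "cc")
#define barrier_below_rsp() \
	__asm__ __volatile__ ("lock; addl $0,-8(%%rsp)" : : : "memory", "cc")
#else
/* Assumption: on other targets, use the compiler's generic full barrier. */
#define barrier_at_rsp()	__sync_synchronize()
#define barrier_below_rsp()	__sync_synchronize()
#endif

/* Do n rounds of "update a local, then issue a barrier"; return the counter. */
static long
run_barriers(long n, int below_rsp)
{
	/* volatile keeps the counter in memory (normally on the stack) */
	volatile long counter = 0;

	assert(n >= 0);
	for (long i = 0; i < n; i++)
	{
		counter = counter + 1;
		if (below_rsp)
			barrier_below_rsp();
		else
			barrier_at_rsp();
	}
	return counter;
}

/* Crude CPU-time measurement of run_barriers(), in seconds. */
static double
time_barriers(long n, int below_rsp)
{
	clock_t		start = clock();
	long		done = run_barriers(n, below_rsp);
	clock_t		end = clock();

	assert(done == n);
	return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Comparing time_barriers(n, 0) with time_barriers(n, 1) for a large n gives only a rough single-threaded indication; it says nothing about the larger systems mentioned above. Adding zero never changes the bytes at the target address, so both forms are harmless to whatever the compiler keeps there, though where locals end up relative to %rsp is the compiler's choice.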
More details, among other things, are available at:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
In the past we've found barriers to be relevant for performance in both
the latch code and shm_mq, so it's probably worth trying the above in a
benchmark that exercises either heavily.
Greetings,
Andres Freund