Re: strict aliasing

From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Florian Weimer <fweimer(at)bfk(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <munro(at)ip9(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: strict aliasing
Date: 2011-11-16 00:14:36
Message-ID: CA+CSw_s=Pe3CiK-tfD9fpmVSBorCiV64WeRcma3dw6ZYnMv1CA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 15, 2011 at 9:02 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> From my reading, it appears that if we get safe code in terms of
> strict aliasing, we might be able to use the "restrict" keyword to
> get further optimizations which bring it to a net win, but I think
> there is currently lower-hanging fruit than monkeying with these
> compiler options.  I'm letting this go, although I still favor the
> const-ifying which started this discussion, on the grounds of API
> clarity.

Speaking of lower-hanging fruit...

I ran a series of tests to see how different optimization flags
affect performance. I was particularly interested in what effect
link time optimization has. The results are somewhat interesting.

Benchmark machine is my laptop, Intel Core i5 M 540 @ 2.53GHz.
2 cores + hyperthreading for a total of 4 threads. Ubuntu 11.10.
Compiled with GCC 4.6.1-9ubuntu3.

I ran the pgbench read-only test with scale factor 10, using default
options except for shared_buffers = 256MB. The dataset fits fully
in shared buffers.

I tried the following configurations:

default: plain old ./configure; make; make install
-O3: what it says on the label
lto: CFLAGS="-O3 -flto". This should do some global optimizations
  at link time.
PGO: compiled with CFLAGS="-O3 -fprofile-generate", then ran
  pgbench -T 30 on a scale factor 100 database (IO-bound rw load
  to mix the profile up a bit). Then did
  # sed -i s/-fprofile-generate/-fprofile-use/ src/Makefile.global
  and recompiled and installed.
lto + PGO: same as previous, but with -flto added.

Median tps of three 5-minute runs at different concurrency levels:

 -c    default        -O3        lto        PGO  lto + PGO
==========================================================
  1    6753.40    6689.76    6498.37    6614.73    5918.65
  2   11600.87   11659.33   12074.63   12957.81   13353.54
  4   18852.86   18918.32   19008.89   20006.49   20652.93
  8   15232.30   15762.70   14568.06   15880.19   16091.24
 16   15693.93   15625.87   16563.91   17088.28   18223.02

Percentage increase from default flags:

 -c  default tps        -O3        lto        PGO  lto + PGO
============================================================
  1      6753.40     -0.94%     -3.78%     -2.05%    -12.36%
  2     11600.87      0.50%      4.08%     11.70%     15.11%
  4     18852.86      0.35%      0.83%      6.12%      9.55%
  8     15232.30      3.48%     -4.36%      4.25%      5.64%
 16     15693.93     -0.43%      5.54%      8.88%     16.12%
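
Each percentage is a configuration's median tps relative to the
default build at the same client count. A minimal Python sketch that
reproduces the numbers from the median tps table; the names TPS,
pct_change and pct_row are illustrative and not part of the original
benchmark setup:

```python
# Median tps per client count, copied from the median tps table.
# Column order: default, -O3, lto, PGO, lto + PGO.
TPS = {
    1:  [6753.40, 6689.76, 6498.37, 6614.73, 5918.65],
    2:  [11600.87, 11659.33, 12074.63, 12957.81, 13353.54],
    4:  [18852.86, 18918.32, 19008.89, 20006.49, 20652.93],
    8:  [15232.30, 15762.70, 14568.06, 15880.19, 16091.24],
    16: [15693.93, 15625.87, 16563.91, 17088.28, 18223.02],
}


def pct_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def pct_row(row):
    """Percent change of every non-default column against the default column."""
    baseline = row[0]
    return [round(pct_change(baseline, v), 2) for v in row[1:]]


if __name__ == "__main__":
    for clients, row in TPS.items():
        cells = " ".join(f"{p:+8.2f}%" for p in pct_row(row))
        print(f"{clients:>3} {row[0]:>10.2f} {cells}")
```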

Concurrency 8 results should probably be ignored - variance was huge,
definitely larger than the differences between configurations. For the
other results, variance was ~1%.

I don't know what to make of the single-client results, or why they
go in the opposite direction from all the other results. Other than
that, both profile guided optimization and link time optimization give
a pretty respectable boost. If anyone can suggest some more diverse
workloads to test, I could check whether the PGO results persist when
the profiling and benchmark loads differ more. These results suggest that
giving the compiler information about hot and cold paths results in a
significant improvement in generated code quality.

--
Ants Aasma
