From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-10-02 00:20:44
Message-ID: 1412209244.98078.YahooMailNeo@web122301.mail.ne1.yahoo.com
Lists: pgsql-hackers

Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> On 12.9.2014 23:22, Robert Haas wrote:

>> My first thought is to revert to NTUP_PER_BUCKET=1, but it's
>> certainly arguable. Either method, though, figures to be better than
>> doing nothing, so let's do something.
>
> OK, but can we commit the remaining part first? Because it'll certainly
> interact with this proposed part, and it's easier to tweak when the code
> is already there. Instead of rebasing over and over.

The patch applied and built without problem, and passed `make
check-world`. The only thing that caught my eye was the addition
of debug code using printf() instead of logging at a DEBUG level.
Is there any reason for that?
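
Just to illustrate what I mean (the helper, variable names, and
message text below are made up for the example, not taken from the
patch), the same trace could go through elog() at a DEBUG level so
it only shows up when log_min_messages asks for it:

    #include "postgres.h"        /* elog(), DEBUG1; pulls in stdio.h */

    /* hypothetical helper; names and message text are illustrative */
    static void
    report_nbuckets_growth(int old_nbuckets, int new_nbuckets)
    {
        /* debug output via printf() goes to stdout unconditionally */
        printf("Hashjoin: increasing nbuckets to %d\n", new_nbuckets);

        /* elog() at DEBUG1 respects log_min_messages and goes to
         * the server log */
        elog(DEBUG1, "Hashjoin: increasing nbuckets from %d to %d",
             old_nbuckets, new_nbuckets);
    }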

I still need to work through the (rather voluminous) email threads
and the code to be totally comfortable with all the issues, but
if Robert and Heikki are comfortable with it, I'm not going to argue.

Preliminary benchmarks look outstanding! I tried to control
carefully to ensure consistent results (by dropping, recreating,
vacuuming, analyzing, and checkpointing before each test), but
there were surprising outliers in the unpatched version. It
turned out that slight differences in the random samples taken
by ANALYZE led to different bucket counts in both the unpatched
and patched runs; the patched version handled the difference
gracefully, while the unpatched version's performance fluctuated.
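
To show why the bucket count is so sensitive to the sample, here
is a simplified sketch of the sizing rule (this is not the actual
ExecChooseHashTableSize() code; it ignores the work_mem and
tuple-width limits and assumes the pre-patch NTUP_PER_BUCKET of 10):

    #include <math.h>

    #define NTUP_PER_BUCKET 10   /* pre-patch value; the proposal lowers it */

    /* estimated row count / NTUP_PER_BUCKET, rounded up to a power of two */
    static int
    choose_nbuckets(double ntuples)
    {
        double  dbuckets = ceil(ntuples / NTUP_PER_BUCKET);
        int     nbuckets = 1;

        while (nbuckets < dbuckets)
            nbuckets <<= 1;

        return nbuckets;
    }

    /*
     * choose_nbuckets(10200.0) returns 1024, choose_nbuckets(10300.0)
     * returns 2048: two samples of the same table can land on either
     * side of a power-of-two boundary.
     */

With the patch the initial guess matters much less, since the
bucket array is resized at execution time; that is what the
"(originally ...)" part of the patched EXPLAIN output below shows.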

My thinking is that we should get this much committed and then
discuss further optimizations. I tried to throw something of a
"worst case" at it: with the new code, the test query starts
with one batch and has to grow to two.

Five runs per test, alternating between unpatched and patched.

First I tried with the default work_mem size of 4MB:

Planning time / Execution time (ms)

unpatched, work_mem = 4MB

0.694 / 10392.930
0.327 / 10388.787
0.412 / 10533.036
0.650 / 17186.719
0.338 / 10707.423

Fast: Buckets: 2048 Batches: 1 Memory Usage: 3516kB
Slow: Buckets: 1024 Batches: 1 Memory Usage: 3516kB

patched, work_mem = 4MB

0.764 / 5021.792
0.655 / 4991.887
0.436 / 5068.777
0.410 / 5057.902
0.444 / 5021.543

1st & 5th: Buckets: 131072 (originally 1024) Batches: 2 (originally 1) Memory Usage: 3073kB
all others: Buckets: 131072 (originally 2048) Batches: 2 (originally 1) Memory Usage: 3073kB

Then, just to see how both dealt with extra memory, I did it again
with work_mem set to 1GB:

unpatched, work_mem = 1GB

0.328 / 10407.836
0.319 / 10465.348
0.324 / 16606.948
0.569 / 10408.671
0.542 / 16996.209

Memory usage same as before. Guess which runs started with 1024 buckets. ;-)

patched, work_mem = 1GB

0.556 / 3974.741
0.325 / 4012.613
0.334 / 3941.734
0.834 / 3894.391
0.603 / 3959.440

2nd & 4th: Buckets: 131072 (originally 2048) Batches: 1 (originally 1) Memory Usage: 4540kB
all others: Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 4540kB

My only concern from the benchmarks is that it seemed like there
was a statistically significant increase in planning time:

unpatched plan time average: 0.450 ms
patched plan time average: 0.536 ms

That *might* just be noise, but it seems likely to be real. For
the improvement in run time, I'd put up with an extra 86us in
planning time per hash join; but if there's any way to shave some
of that off, all the better.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
