Re: 9.4 regression

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: 9.4 regression
Date: 2013-08-09 03:58:42
Message-ID: CAKuK5J0UUQh25HcK6cGgXi2kCdFjfQSWvhOu9th-FoVqJhSaWg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 8, 2013 at 9:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-08-08 16:12:06 -0500, Jon Nelson wrote:
...

>> At this point I'm convinced that the issue is a pathological case in
>> ext4. The performance impact disappears as soon as the unwritten
>> extent(s) are written to with real data. Thus, even though allocating
>> files with posix_fallocate is - frequently - orders of magnitude
>> quicker than doing it with write(2), the subsequent re-write can be
>> more expensive. At least, that's what I'm gathering from the various
>> threads.
>
>
>> Why this issue didn't crop up in earlier testing and why I
>> can't seem to make test_fallocate do it (even when I modify
>> test_fallocate to write to the newly-allocated file in a mostly-random
>> fashion) has me baffled.
>
> It might be kernel version specific and concurrency seems to play a
> role. If you reproduce the problem, could you run a "perf record -ga" to
> collect a systemwide profile?

Finally, an excuse to learn how to use 'perf'! I'll try to provide
that info when I am able.
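
For anyone wanting to poke at the pattern outside PostgreSQL, here is a
minimal sketch of the two allocation strategies discussed above: preallocate
a file with posix_fallocate or by writing zeros, then overwrite it with small
writes that are each followed by fdatasync. The file names, the 8 kB block
size, and the number of rewritten blocks are assumptions made for
illustration. It needs a Unix platform that provides os.posix_fallocate and
os.fdatasync. It is single-process, so it is not expected to reproduce the
slowdown by itself; as noted above, concurrency seems to play a role.

```python
import os
import tempfile
import time

SEGMENT_SIZE = 16 * 1024 * 1024  # 16 MB, PostgreSQL's default WAL segment size
BLOCK = 8192                     # rewrite granularity (assumed for illustration)


def preallocate(path, use_fallocate):
    """Create a SEGMENT_SIZE file via posix_fallocate or by writing zeros."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        if use_fallocate:
            # Reserves the space without writing data blocks.
            os.posix_fallocate(fd, 0, SEGMENT_SIZE)
        else:
            # Writes real zeros through write(2).
            zeros = bytes(BLOCK)
            for _ in range(SEGMENT_SIZE // BLOCK):
                os.write(fd, zeros)
        os.fsync(fd)
    finally:
        os.close(fd)


def rewrite_with_fdatasync(path, blocks=64):
    """Overwrite the first `blocks` blocks, calling fdatasync after each write."""
    payload = b"x" * BLOCK
    fd = os.open(path, os.O_RDWR)
    try:
        start = time.perf_counter()
        for i in range(blocks):
            os.pwrite(fd, payload, i * BLOCK)
            os.fdatasync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory):
    """Return the rewrite time in seconds for each allocation strategy."""
    results = {}
    for label, use_fallocate in (("fallocate", True), ("write", False)):
        path = os.path.join(directory, label + ".seg")  # hypothetical file name
        preallocate(path, use_fallocate)
        results[label] = rewrite_with_fdatasync(path)
        os.unlink(path)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for label, seconds in compare(d).items():
            print(f"{label}: {seconds:.4f} s for the rewrite pass")
```

The timings depend heavily on the filesystem and storage underneath the
temporary directory, so the directory should be pointed at the ext4 volume
under test rather than at tmpfs.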

> There's some more things to test:
> - is the slowdown dependent on the scale? I.e is it visible with -j 1 -c
> 1?

scale=1 (-j 1 -c 1):
with fallocate: 685 tps
without: 727 tps

scale=20:
with fallocate: 129 tps
without: 402 tps

scale=40:
with fallocate: 163 tps
without: 511 tps
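
To make the scale dependence easier to read, the figures above can be turned
into relative changes. This is only arithmetic on the numbers reported in this
message; the helper name relative_change is made up for the example.

```python
# tps figures copied from the results above: scale -> (with fallocate, without)
RESULTS = {
    1: (685, 727),
    20: (129, 402),
    40: (163, 511),
}


def relative_change(with_fallocate, without):
    """Percent change in tps of the fallocate run relative to the baseline."""
    return (with_fallocate - without) / without * 100.0


for scale, (with_f, without) in RESULTS.items():
    print(f"scale={scale}: {relative_change(with_f, without):+.1f}%")
# scale=1: -5.8%
# scale=20: -67.9%
# scale=40: -68.1%
```

So the fallocate run is about 6% slower at scale=1 and about 68% slower at
scale=20 and scale=40.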

> - Does it also occur in synchronous_commit=off configurations? Those
> don't fdatasync() from so many backends, that might play a role.

With synchronous_commit=off, the performance is vastly improved.
Interestingly, the fallocate case is (immaterially) faster than the
non-fallocate case: 3766 tps vs 3700 tps.

I tried a few other wal_sync_method settings besides the default of
fdatasync, all with scale=80.

fsync:
198 tps (with fallocate) vs. 187 tps (without)

open_sync:
195 tps (with fallocate) vs. 192 tps (without)
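
For readers less familiar with what the three settings tested here do at the
system-call level, the following is a rough sketch, not PostgreSQL's actual
WAL code: fdatasync and fsync issue the corresponding call after the write,
while open_sync opens the file with O_SYNC so that each write is synchronous.
The function name write_block and the 8 kB block size are assumptions for
illustration, and os.fdatasync and os.O_SYNC are assumed to be available
(as on Linux).

```python
import os
import tempfile

BLOCK = 8192  # assumed write size, for illustration only
METHODS = ("fdatasync", "fsync", "open_sync")


def write_block(path, method):
    """Write one block durably using the pattern named by `method`."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    flags = os.O_WRONLY | os.O_CREAT
    if method == "open_sync":
        flags |= os.O_SYNC      # write() returns only once the data is durable
    fd = os.open(path, flags, 0o600)
    try:
        written = os.write(fd, b"\0" * BLOCK)
        if method == "fdatasync":
            os.fdatasync(fd)    # flush data, skip non-essential metadata
        elif method == "fsync":
            os.fsync(fd)        # flush data and metadata
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for m in METHODS:
            print(m, write_block(os.path.join(d, m), m))
```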

> - do bulkloads see it? E.g. the initial pgbench load?

time pgbench -s 200 -p 54320 -d pgb -i

with fallocate: 2m47s
without: 2m50s

Hopefully the above is useful.

--
Jon