Re: Maximum transaction rate

From: Marco Colombo <pgsql(at)esiway(dot)net>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Christophe <xof(at)thebuild(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Maximum transaction rate
Date: 2009-03-14 04:25:11
Message-ID: 49BB31A7.1090604@esiway.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Scott Marlowe wrote:
> On Fri, Mar 13, 2009 at 1:09 PM, Christophe <xof(at)thebuild(dot)com> wrote:
>> So, if the software calls fsync, but fsync doesn't actually push the data to
>> the controller, you are still at risk... right?
>
> Ding!
>

I've been doing some googling, now I'm not sure that not supporting barriers
implies not supporting (of lying) at blkdev_issue_flush(). It seems that
it's pretty common (and well-defined) for block devices to report
-EOPNOTSUPP at BIO_RW_BARRIER requests. device mapper apparently falls in
this category.

See:
http://lkml.org/lkml/2007/5/25/71
this is an interesting discussion on barriers and flushing.

It seems to me that PostgreSQL needs both ordered and synchronous
writes, maybe at different times (not that EVERY write must be both ordered
and synchronous).

You can emulate ordered with single+synchronous althought with a price.
You can't emulate synchronous writes with just barriers.

OPTIMAL: write-barrier-write-barrier-write-barrier-flush

SUBOPTIMAL: write-flush-write-flush-write-flush

As I understand it, fsync() should always issue a real flush: it's unrelated
to the barriers issue.
There's no API to issue ordered writes (or barriers) at user level,
AFAIK. (Uhm... O_DIRECT, maybe implies that?)

FS code may internally issue barrier requests to the block device, for
its own purposes (e.g. journal updates), but there's not useland API for
that.

Yet, there's no reference to DM not supporting flush correctly in the
whole thread... actually there are refereces to the opposite. DM devices
are defined as FLUSHABLE.

Also see:
http://lkml.org/lkml/2008/2/26/41
but it seems to me that all this discussion is under the assuption that
disks have write-back caches.
"The alternative is to disable the disk write cache." says it all.

.TM.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Devrim GÜNDÜZ 2009-03-14 09:02:10 New shapshot RPMs (Mar 10, 2009) are ready for testing
Previous Message Tatsuo Ishii 2009-03-14 03:32:14 Re: [Pgpool-general] panic: index siblings mismatch