Re: SCSI vs. IDE performance test

From: Allen Landsidel <all(at)biosys(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Rick Gigger <rick(at)alpinenetworking(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SCSI vs. IDE performance test
Date: 2003-10-27 23:42:27
Message-ID: 6.0.0.22.0.20031027183334.0245bb08@pop.hotpop.com
Lists: pgsql-general

Tom, this discussion brings up something that's been bugging me about the
recommendations for getting more performance out of PG, in particular the
one that suggests you put your WAL files on a different physical drive from
the database.

Consider the following scenario:
Database on drive1
WAL on drive2

1. PG write of some sort occurs.
2. PG writes out the WAL.
3. PG writes out the data.
4. PG updates the WAL to reflect data actually written.
5. System crashes/reboots/whatever.

With the DB and the WAL on different drives, it seems possible to me that
drive2 could've fsync()'d or otherwise properly written all of the data
out, but drive1 could have failed somewhere along the way and not actually
written the data to the DB.

The next time PG is brought up, the WAL would indicate that the transaction
was a success, but the data wouldn't actually be there.
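
The scenario above can be sketched as a toy model. This is only an
illustration of the five steps as listed in this message, not a description
of how PostgreSQL actually manages its WAL; the Drive class, the "txn1" and
"row1" keys, and run_scenario() are all hypothetical names made up for the
sketch.

```python
# Toy model of the scenario described above. It mirrors the five steps as
# listed in the message and is NOT PostgreSQL's real WAL mechanism.

class Drive:
    """A drive with a volatile cache; only flushed writes survive a crash."""

    def __init__(self, name):
        self.name = name
        self.cache = {}    # written but not yet durable
        self.platter = {}  # durable contents

    def write(self, key, value):
        self.cache[key] = value

    def flush(self):
        # Stand-in for a successful fsync(): cached writes become durable.
        self.platter.update(self.cache)
        self.cache.clear()

    def crash(self):
        # Anything still sitting in the cache is lost.
        self.cache.clear()


def run_scenario(data_drive_flushes):
    drive1 = Drive("data")  # database
    drive2 = Drive("wal")   # WAL

    drive2.write("txn1", "pending")    # step 2: write out the WAL
    drive2.flush()
    drive1.write("row1", "new value")  # step 3: write out the data
    if data_drive_flushes:
        drive1.flush()
    drive2.write("txn1", "done")       # step 4: mark the WAL entry as written
    drive2.flush()

    drive1.crash()                     # step 5: system crashes
    drive2.crash()
    return drive2.platter.get("txn1"), drive1.platter.get("row1")
```

With run_scenario(True) both drives agree after the crash. With
run_scenario(False) the WAL says "done" while the row is missing from the
data drive, which is the mismatch being asked about.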

In the case of using only one drive, the rollback (from a FS perspective)
couldn't possibly occur in such a way as to leave step 4 as a success but
step 3 as a failure. Worst case, the data would be written out but the
WAL wouldn't have been updated (rolled back, say, by the FS), and thus PG
would roll back the data itself, or use whatever mechanism it has to ensure
the data stays consistent with the WAL.

Am I smoking something here or is this a real, if rare in practice, risk
that occurs when you have the WAL on a different drive than the data is on?

At 17:39 10/27/2003, Tom Lane wrote:
>"Rick Gigger" <rick(at)alpinenetworking(dot)com> writes:
> > It seems to me file system journaling should fix the whole problem by
> > giving you a record of what was actually committed to disk and what was
> > not.
>
>Nope, a journaling FS has exactly the same problem Postgres does
>(because the underlying "WAL" concept is the same: write the log entries
>before you change the files they describe). If the drive lies about
>write order, the FS can be screwed just as badly. Now the FS code might
>have a low-level way to force write order that Postgres doesn't have
>access to ... but simply uttering the magic incantation "journaling file
>system" will not make this problem disappear.
>
> regards, tom lane
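
The ordering rule Tom describes ("write the log entries before you change
the files they describe") can be sketched roughly as follows. This is a
minimal illustration, not PostgreSQL's or any file system's code; the
function name logged_write and both path arguments are hypothetical. Note
that os.fsync() only asks the operating system to push the data to the
device; if the drive itself reports writes as complete before they are on
stable storage, the ordering shown here is not actually guaranteed, which
is the problem under discussion.

```python
import os


def logged_write(log_path, data_path, payload):
    """Append payload to a log and sync it before changing the data file."""
    # 1. Write the log entry and force it out before touching the data file.
    log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(log_fd, payload + b"\n")
        os.fsync(log_fd)  # request that the log entry reach the device
    finally:
        os.close(log_fd)

    # 2. Only now change the file that the log entry describes.
    data_fd = os.open(data_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(data_fd, payload)
        os.fsync(data_fd)
    finally:
        os.close(data_fd)
```

If a crash happens between the two stages, the log holds an entry whose
change is missing from the data file, so recovery can redo it from the log.
The reverse order would leave a changed file with no record of the change.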
