disk caching for writing log

Lists: pgsql-hackers
From: flyusa2010 fly <flyusa2010(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: disk caching for writing log
Date: 2010-12-03 11:49:25
Message-ID: AANLkTik_wxokXpU=1U3_dXWtnBqMPm9pycWcNqZgfe+C@mail.gmail.com

When writing the log, a dbms should synchronously flush the log to disk. I'm
wondering whether it is possible that the logs are still in the disk cache when
control is returned to dbms again, so the dbms thinks the logs are persistent on
disk. In this case, if the disk fails, the dbms log writing is incorrect,
because the log is not persistent, but the dbms considers it
persistent!

Am I correct?


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: flyusa2010 fly <flyusa2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: disk caching for writing log
Date: 2010-12-03 17:43:13
Message-ID: 4CF92C31.104@enterprisedb.com

On 03.12.2010 13:49, flyusa2010 fly wrote:
> When writing the log, a dbms should synchronously flush the log to disk. I'm
> wondering whether it is possible that the logs are still in the disk cache when
> control is returned to dbms again, so the dbms thinks the logs are persistent on
> disk. In this case, if the disk fails, the dbms log writing is incorrect,
> because the log is not persistent, but the dbms considers it
> persistent!

I have no idea what you mean. The method we use to flush the WAL to disk
should not be vulnerable to such failures: we wait for fsync() or
fdatasync() to return before we assume the logs are safely on disk. If
you can elaborate on what you mean by "control is returned to dbms", maybe
someone can explain why in more detail.
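
The "write, then wait for fsync() to return" pattern described above can be
sketched as follows. This is a minimal illustration in Python, not PostgreSQL's
actual WAL code; the function name append_durably and the file layout are
assumptions made for the example.

```python
import os


def append_durably(path: str, record: bytes) -> int:
    """Append record to path; return only after fsync() has returned.

    Hypothetical helper for illustration. Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(record)
        written = 0
        # os.write() may write fewer bytes than requested, so loop until done.
        while written < len(view):
            written += os.write(fd, view[written:])
        # Block until the OS reports the file data as flushed to the device.
        # The caller treats the record as durable only after this returns.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Only after append_durably() returns would the caller acknowledge a commit.
The guarantee is only as strong as what the OS and the drive actually do when
asked to flush.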

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: flyusa2010 fly <flyusa2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: disk caching for writing log
Date: 2010-12-03 18:14:20
Message-ID: 4CF9337C.2030502@kaltenbrunner.cc

On 12/03/2010 06:43 PM, Heikki Linnakangas wrote:
> On 03.12.2010 13:49, flyusa2010 fly wrote:
>> When writing the log, a dbms should synchronously flush the log to disk. I'm
>> wondering whether it is possible that the logs are still in the disk cache when
>> control is returned to dbms again, so the dbms thinks the logs are persistent on
>> disk. In this case, if the disk fails, the dbms log writing is incorrect,
>> because the log is not persistent, but the dbms considers it
>> persistent!
>
> I have no idea what you mean. The method we use to flush the WAL to disk
> should not be vulnerable to such failures: we wait for fsync() or
> fdatasync() to return before we assume the logs are safely on disk. If
> you can elaborate on what you mean by "control is returned to dbms", maybe
> someone can explain why in more detail.

I think he is referring to the plain old "the disk/OS is lying about
whether the data really made it to stable storage" issue (especially with
the huge local caches on modern disks). If you have such a disk and/or
an OS with broken barrier support, you are doomed.
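
One rough way to probe for a lying disk is to time repeated fsync() calls that
rewrite the same block. The sketch below is an illustrative assumption, not a
tool from the PostgreSQL tree; the function name fsyncs_per_second and the
512-byte block size are made up for the example. A 7200 RPM drive completes
120 rotations per second, so a sustained rate far above that on a single
rotating disk without a battery-backed cache suggests that writes are being
acknowledged from a volatile cache.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory: str, iterations: int = 200) -> float:
    """Rewrite one block and fsync() it repeatedly; return the observed rate.

    Hypothetical probe for illustration. The scratch file is created in
    `directory`, so point it at the filesystem you want to test.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    block = b"x" * 512
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)  # always overwrite the same block
            os.write(fd, block)
            os.fsync(fd)                  # wait for the flush to be reported
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    # Guard against a zero interval on very coarse clocks.
    return iterations / max(elapsed, 1e-9)
```

The number is only a hint. SSDs and controllers with battery-backed caches can
legitimately report much higher rates, and a plausible rate does not prove that
the data reached stable storage.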

Stefan


From: flyusa2010 fly <flyusa2010(at)gmail(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: disk caching for writing log
Date: 2010-12-05 06:30:35
Message-ID: AANLkTi=6+b3AgzOnDG5vSQ=5PN03cd5a+AFSzotdEyno@mail.gmail.com

Thanks for your reply.
Yes, I mean the disk may lie to the OS.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: flyusa2010 fly <flyusa2010(at)gmail(dot)com>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: disk caching for writing log
Date: 2010-12-24 16:47:22
Message-ID: 201012241647.oBOGlMr21323@momjian.us

flyusa2010 fly wrote:
> Thanks for your reply.
> Yes, I mean the disk may lie to the OS.

Our documentation covers this extensively:

http://www.postgresql.org/docs/9.0/static/wal-reliability.html

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +