Lists: | pgsql-hackers |
---|
From: | Mark Wong <markwkm(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | fix use of posix_fadvise in xlog.c |
Date: | 2010-06-10 03:47:26 |
Message-ID: | AANLkTin_PXWiqMOMS5-vRFXKUc6047T5QFboQqKsyHn4@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi all,
I wanted to propose a fix for to xlog.c regarding the use of
posix_fadvise() for 9.1 (unless someone feels it's ok for 9.0).
Currently posix_fadvise() is used right before a log file is closed so
it's effectively not doing anything, when posix_fadvise is to be
called. This patch moves the posix_fadvise() call into 3 other
locations within XLogFileInit() where a file handle is returned. The
first case is where an existing open file handle is returned. The
next case is when a file is to be zeroed out. The third case is
returning a file handle, which may be the file that was just zeroed
out.
Does this look ok?
Regards,
Mark
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-use-of-posix_fadvise-in-xlog.patch | application/octet-stream | 3.0 KB |
From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Mark Wong <markwkm(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: fix use of posix_fadvise in xlog.c |
Date: | 2010-06-10 06:25:08 |
Message-ID: | 4C108544.5060808@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 10/06/10 06:47, Mark Wong wrote:
> I wanted to propose a fix for to xlog.c regarding the use of
> posix_fadvise() for 9.1 (unless someone feels it's ok for 9.0).
> Currently posix_fadvise() is used right before a log file is closed so
> it's effectively not doing anything, when posix_fadvise is to be
> called. This patch moves the posix_fadvise() call into 3 other
> locations within XLogFileInit() where a file handle is returned. The
> first case is where an existing open file handle is returned. The
> next case is when a file is to be zeroed out. The third case is
> returning a file handle, which may be the file that was just zeroed
> out.
I don't think POSIX_FADV_DONTNEED does what you think it does. It tells
the kernel that "you don't need to keep these pages in the cache
anymore, I won't be accessing them anymore". If you call it when you
open the file, before reading/writing, there is nothing in the cache and
the call will do nothing.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From: | Mark Wong <markwkm(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fix use of posix_fadvise in xlog.c |
Date: | 2010-06-10 15:17:19 |
Message-ID: | F5CE985E-CA7B-4723-BBA2-0EC414127B90@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Jun 9, 2010, at 11:25 PM, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com
> wrote:
> On 10/06/10 06:47, Mark Wong wrote:
>> I wanted to propose a fix for to xlog.c regarding the use of
>> posix_fadvise() for 9.1 (unless someone feels it's ok for 9.0).
>> Currently posix_fadvise() is used right before a log file is closed
>> so
>> it's effectively not doing anything, when posix_fadvise is to be
>> called. This patch moves the posix_fadvise() call into 3 other
>> locations within XLogFileInit() where a file handle is returned. The
>> first case is where an existing open file handle is returned. The
>> next case is when a file is to be zeroed out. The third case is
>> returning a file handle, which may be the file that was just zeroed
>> out.
>
> I don't think POSIX_FADV_DONTNEED does what you think it does. It
> tells the kernel that "you don't need to keep these pages in the
> cache anymore, I won't be accessing them anymore". If you call it
> when you open the file, before reading/writing, there is nothing in
> the cache and the call will do nothing.
Oops, my bad. I think I was confused by the short description in the
man page. I didn't read the longer descriptoon. :( Then would it be
worth making the this call after the file is zeroed out?
Regards,
Mark
From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Mark Wong <markwkm(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fix use of posix_fadvise in xlog.c |
Date: | 2010-06-10 15:37:49 |
Message-ID: | 4C1106CD.6030205@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 10/06/10 18:17, Mark Wong wrote:
> On Jun 9, 2010, at 11:25 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> I don't think POSIX_FADV_DONTNEED does what you think it does. It
>> tells the kernel that "you don't need to keep these pages in the cache
>> anymore, I won't be accessing them anymore". If you call it when you
>> open the file, before reading/writing, there is nothing in the cache
>> and the call will do nothing.
>
> Oops, my bad. I think I was confused by the short description in the man
> page. I didn't read the longer descriptoon. :( Then would it be worth
> making the this call after the file is zeroed out?
Not sure. If you're churning through WAL files at a reasonable speed,
the zeroed-out file will soon be written to again. OTOH, we always write
whole pages, so maybe the OS is smart enough to not read the page back
to memory just to overwrite it.
In a steady-state situation new WAL files are not created very often
because we recycle old ones, so it probably doesn't make much difference.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Mark Wong <markwkm(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fix use of posix_fadvise in xlog.c |
Date: | 2010-06-10 15:44:02 |
Message-ID: | 17458.1276184642@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> In a steady-state situation new WAL files are not created very often
> because we recycle old ones, so it probably doesn't make much difference.
Yeah. We really don't worry too much about the performance of the
new-WAL-file-creation code path because of this.
regards, tom lane
From: | Greg Smith <greg(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fix use of posix_fadvise in xlog.c |
Date: | 2010-06-10 21:34:29 |
Message-ID: | 4C115A65.5010300@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>
>> In a steady-state situation new WAL files are not created very often
>> because we recycle old ones, so it probably doesn't make much difference.
>>
>
> Yeah. We really don't worry too much about the performance of the
> new-WAL-file-creation code path because of this.
>
The only situation where the WAL zeroing path turns ugly is if you
launch a bunch of activity against a fresh server that doesn't have any
segments to recycle yet. The last time we talked about improving that,
the best idea I thought came out was to be better about preallocating
segments than the code already is, rather than trying to speed up how
the kernel deals with the situation. See the links for "Be more
aggressive about creating WAL files" at http://wiki.postgresql.org/wiki/Todo
I'm also not very optimistic about adding more posix_fadvise calls
really helping just because the implementations of those are so
unpredictable across operating systems. I'm sure that Mark could figure
out the right magic to speed up this specific case on Linux, but have my
doubts that work would translate very well to many other operating
systems. Whereas a more generic preallocation improvement would help
everywhere.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us