Re: Free WAL caches on switching segments

Lists: pgsql-patches
From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-patches(at)postgresql(dot)org
Subject: Free WAL caches on switching segments
Date: 2005-08-30 07:15:32
Message-ID: 20050830151029.4A59.ITAGAKI.TAKAHIRO@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Hi,

Here is a small patch to prevent undesired WAL file caching by kernel.
posix_fadvise(POSIX_FADV_DONTNEED) attempts to free cached pages and
the kernel will discard them in preference to other data caches.

I think it works just like as O_DIRECT in terms of cache control.
O_DIRECT may be a better solution than posix_fadvise, but
posix_fadvise may be used on platforms where O_DIRECT is not supported.

pgbench results are as follows:
wal_sync_method
- open_sync : 156.0 tps
- fdatasync : 126.3 tps
- fdatasync+fadvise : 161.2 tps
(8.1beta1 on Linux 2.6.8-24)

I'll appreciate any comments and advices,
Takahiro

*** xlog.c Mon Aug 29 11:51:19 2005
--- xlog-with-fadvise.c Tue Aug 30 15:08:24 2005
***************
*** 1352,1357 ****
--- 1352,1358 ----
Assert(npages == 0);
if (openLogFile >= 0)
{
+ posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
if (close(openLogFile))
ereport(PANIC,
(errcode_for_file_access(),
***************
*** 1535,1540 ****
--- 1536,1542 ----
if (openLogFile >= 0 &&
!XLByteInPrevSeg(LogwrtResult.Write, openLogId, openLogSeg))
{
+ posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
if (close(openLogFile))
ereport(PANIC,
(errcode_for_file_access(),


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2005-08-30 07:25:27
Message-ID: 18680.1125386727@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Here is a small patch to prevent undesired WAL file caching by kernel.
> posix_fadvise(POSIX_FADV_DONTNEED) attempts to free cached pages and
> the kernel will discard them in preference to other data caches.

On plenty of platforms, this won't even compile ...

regards, tom lane


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2005-08-30 07:38:53
Message-ID: 20050830162950.4A5E.ITAGAKI.TAKAHIRO@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> > Here is a small patch to prevent undesired WAL file caching by kernel.
> > posix_fadvise(POSIX_FADV_DONTNEED) attempts to free cached pages and
> > the kernel will discard them in preference to other data caches.
>
> On plenty of platforms, this won't even compile ...

Do you mean simply following code? or more pretty way?

#ifdef POSIX_FADV_DONTNEED
posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
#endif

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2005-09-23 17:56:09
Message-ID: 200509231756.j8NHu9U13411@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

ITAGAKI Takahiro wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > > Here is a small patch to prevent undesired WAL file caching by kernel.
> > > posix_fadvise(POSIX_FADV_DONTNEED) attempts to free cached pages and
> > > the kernel will discard them in preference to other data caches.
> >
> > On plenty of platforms, this won't even compile ...
>
>
> Do you mean simply following code? or more pretty way?
>
> #ifdef POSIX_FADV_DONTNEED
> posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
> #endif

Yes, but we probably have to have a configure test to see if
posix_fadvise exists too. I will keep this for 8.2.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-01-12 05:22:41
Message-ID: 20060112135902.4D32.ITAGAKI.TAKAHIRO@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:

> > > > Here is a small patch to prevent undesired WAL file caching by kernel.
> > > > posix_fadvise(POSIX_FADV_DONTNEED) attempts to free cached pages and
> > > > the kernel will discard them in preference to other data caches.
> > >
> > > On plenty of platforms, this won't even compile ...
>
> Yes, but we probably have to have a configure test to see if
> posix_fadvise exists too. I will keep this for 8.2.

I think we can use _POSIX_ADVISORY_INFO to test if posix_fadvise exists.
Also, I added the check on whether WAL archiving is enabled, because
archivers might use the caches to read the WAL segment.

By the way, should we put posix_fadvise on the separated place with renaming
pg_fadvise? If we use posix_fadvise in other purposes, for example,
read-ahead control, the separation would be good to keep codes clean.

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories

Attachment Content-Type Size
xlog-with-fadvise.patch application/octet-stream 3.5 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-12 06:10:14
Message-ID: 200602120610.k1C6AEF17450@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


I looked this over and I am unsure what this does for us that isn't
already accomplished using the wal_sync_method settings. See xlog.c for
a description of O_DIRECT and when it is used.

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
> > > > > Here is a small patch to prevent undesired WAL file caching by kernel.
> > > > > posix_fadvise(POSIX_FADV_DONTNEED) attempts to free cached pages and
> > > > > the kernel will discard them in preference to other data caches.
> > > >
> > > > On plenty of platforms, this won't even compile ...
> >
> > Yes, but we probably have to have a configure test to see if
> > posix_fadvise exists too. I will keep this for 8.2.
>
> I think we can use _POSIX_ADVISORY_INFO to test if posix_fadvise exists.
> Also, I added the check on whether WAL archiving is enabled, because
> archivers might use the caches to read the WAL segment.
>
>
> By the way, should we put posix_fadvise on the separated place with renaming
> pg_fadvise? If we use posix_fadvise in other purposes, for example,
> read-ahead control, the separation would be good to keep codes clean.
>
> ---
> ITAGAKI Takahiro
> NTT Cyber Space Laboratories
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 02:43:17
Message-ID: 20060213104414.49B2.ITAGAKI.TAKAHIRO@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:

> I looked this over and I am unsure what this does for us that isn't
> already accomplished using the wal_sync_method settings. See xlog.c for
> a description of O_DIRECT and when it is used.

I proposed it to supplement the cache control. There are some OSes that
supports posix_fadvise but not O_DIRECT, for example, NetBSD 4.0
(http://www.netbsd.org/Changes/changes-4.0.html).

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 02:49:30
Message-ID: 200602130249.k1D2nUe07718@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

ITAGAKI Takahiro wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
> > I looked this over and I am unsure what this does for us that isn't
> > already accomplished using the wal_sync_method settings. See xlog.c for
> > a description of O_DIRECT and when it is used.
>
> I proposed it to supplement the cache control. There are some OSes that
> supports posix_fadvise but not O_DIRECT, for example, NetBSD 4.0
> (http://www.netbsd.org/Changes/changes-4.0.html).

Oh, that makes sense then. Let me re-add it to the queue.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 16:47:58
Message-ID: 2166.1139849278@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> ITAGAKI Takahiro wrote:
>> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>>> I looked this over and I am unsure what this does for us that isn't
>>> already accomplished using the wal_sync_method settings. See xlog.c for
>>> a description of O_DIRECT and when it is used.
>>
>> I proposed it to supplement the cache control. There are some OSes that
>> supports posix_fadvise but not O_DIRECT, for example, NetBSD 4.0
>> (http://www.netbsd.org/Changes/changes-4.0.html).

> Oh, that makes sense then. Let me re-add it to the queue.

Could we see some performance measurements *from those OSes*? The given
test on Linux certainly does not justify adding another operating
system dependency to the WAL code. For that matter, even if it is a big
win on some versions of NetBSD, I'm not sure I'd want to accept it ...
how many NetBSD users do we have who would care?

Depending on OS features we have never depended on before is a *huge*
ongoing maintenance cost, and I have not seen an argument that I think
justifies this one.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 17:56:01
Message-ID: 200602131756.k1DHu1300615@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > ITAGAKI Takahiro wrote:
> >> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> >>> I looked this over and I am unsure what this does for us that isn't
> >>> already accomplished using the wal_sync_method settings. See xlog.c for
> >>> a description of O_DIRECT and when it is used.
> >>
> >> I proposed it to supplement the cache control. There are some OSes that
> >> supports posix_fadvise but not O_DIRECT, for example, NetBSD 4.0
> >> (http://www.netbsd.org/Changes/changes-4.0.html).
>
> > Oh, that makes sense then. Let me re-add it to the queue.
>
> Could we see some performance measurements *from those OSes*? The given
> test on Linux certainly does not justify adding another operating
> system dependency to the WAL code. For that matter, even if it is a big
> win on some versions of NetBSD, I'm not sure I'd want to accept it ...
> how many NetBSD users do we have who would care?
>
> Depending on OS features we have never depended on before is a *huge*
> ongoing maintenance cost, and I have not seen an argument that I think
> justifies this one.

I disagree. It is a localized change and seems like a win, and it uses
a standard POSIX feature, rather than an OS-specific one.

I fact the patch cleans up our code by centralizing the WAL close() code
and error message.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 18:09:34
Message-ID: 2884.1139854174@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> Depending on OS features we have never depended on before is a *huge*
>> ongoing maintenance cost, and I have not seen an argument that I think
>> justifies this one.

> I disagree. It is a localized change and seems like a win, and it uses
> a standard POSIX feature, rather than an OS-specific one.

It's still gonna need a configure test and so on. "POSIX" does not mean
"exists everywhere". Moreover, the submitter has not even proven that
the code works (or even builds, much less does anything useful) on the
platforms it's supposedly for.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 18:13:58
Message-ID: 200602131813.k1DIDwk03860@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> Depending on OS features we have never depended on before is a *huge*
> >> ongoing maintenance cost, and I have not seen an argument that I think
> >> justifies this one.
>
> > I disagree. It is a localized change and seems like a win, and it uses
> > a standard POSIX feature, rather than an OS-specific one.
>
> It's still gonna need a configure test and so on. "POSIX" does not mean
> "exists everywhere". Moreover, the submitter has not even proven that
> the code works (or even builds, much less does anything useful) on the
> platforms it's supposedly for.

The submitter believes the C macro test is sufficient:

I think we can use _POSIX_ADVISORY_INFO to test if posix_fadvise exists.
Also, I added the check on whether WAL archiving is enabled, because
archivers might use the caches to read the WAL segment.

I assume if it follows the POSIX spec it will work on all platforms that
support this feature.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 18:22:53
Message-ID: 3033.1139854973@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> It's still gonna need a configure test and so on.

> The submitter believes the C macro test is sufficient:

Do I get to revert the patch the moment that fails in buildfarm?

More to the point, the utility of the patch remains unproven.
We are not in the habit of adding OS dependencies on speculation.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 18:47:51
Message-ID: 200602131847.k1DIlpM09109@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> It's still gonna need a configure test and so on.
>
> > The submitter believes the C macro test is sufficient:
>
> Do I get to revert the patch the moment that fails in buildfarm?

You bet. In fact I liked his centralizing the WAL close() irrespective
of the feature itself.

> More to the point, the utility of the patch remains unproven.
> We are not in the habit of adding OS dependencies on speculation.

He ran tests, though it is speculation because non-caching is a pretty
hard thing to find a benefit from except under low memory situations.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 18:49:31
Message-ID: 200602131849.k1DInVS09465@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> It's still gonna need a configure test and so on.
>
> > The submitter believes the C macro test is sufficient:
>
> Do I get to revert the patch the moment that fails in buildfarm?

In fact, I will revert it when it fails in the build farm. :-)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 19:03:24
Message-ID: 3443.1139857404@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> More to the point, the utility of the patch remains unproven.
>> We are not in the habit of adding OS dependencies on speculation.

> He ran tests, though it is speculation because non-caching is a pretty
> hard thing to find a benefit from except under low memory situations.

Well, the tests (a) didn't show any particularly good speedup, and
(b) were not on the platforms that this is speculated to be useful on
(ie, those without O_DIRECT).

I really don't think that an adequate case has been made for adding
a new OS dependency.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 19:59:43
Message-ID: 200602131959.k1DJxhr20072@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> More to the point, the utility of the patch remains unproven.
> >> We are not in the habit of adding OS dependencies on speculation.
>
> > He ran tests, though it is speculation because non-caching is a pretty
> > hard thing to find a benefit from except under low memory situations.
>
> Well, the tests (a) didn't show any particularly good speedup, and
> (b) were not on the platforms that this is speculated to be useful on
> (ie, those without O_DIRECT).
>
> I really don't think that an adequate case has been made for adding
> a new OS dependency.

Well, I think the patch should be applied, and the submitter does too,
so unless I hear other votes, it is going in.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: daveg <daveg(at)sonic(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-13 22:50:36
Message-ID: 20060213225036.GE23771@sonic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Mon, Feb 13, 2006 at 02:59:43PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > Tom Lane wrote:
> > >> More to the point, the utility of the patch remains unproven.
> > >> We are not in the habit of adding OS dependencies on speculation.
> >
> > > He ran tests, though it is speculation because non-caching is a pretty
> > > hard thing to find a benefit from except under low memory situations.
> >
> > Well, the tests (a) didn't show any particularly good speedup, and
> > (b) were not on the platforms that this is speculated to be useful on
> > (ie, those without O_DIRECT).
> >
> > I really don't think that an adequate case has been made for adding
> > a new OS dependency.
>
> Well, I think the patch should be applied, and the submitter does too,
> so unless I hear other votes, it is going in.

I vote no for whatever that is worth. A "performance" change needs to
actually demonstrate improved performance. If the change is really
desireable to clean up some messy code, then add it as a cleanup change
without the extra system calls. Otherwise it just adds one more bit of
mystery for future maintainers who may be decieved into thinking that posix
advise calls are important voodoo.

-dg

--
David Gould daveg(at)sonic(dot)net
If simplicity worked, the world would be overrun with insects.


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: daveg <daveg(at)sonic(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 01:47:22
Message-ID: 200602140147.k1E1lMv14577@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

daveg wrote:
> On Mon, Feb 13, 2006 at 02:59:43PM -0500, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > Tom Lane wrote:
> > > >> More to the point, the utility of the patch remains unproven.
> > > >> We are not in the habit of adding OS dependencies on speculation.
> > >
> > > > He ran tests, though it is speculation because non-caching is a pretty
> > > > hard thing to find a benefit from except under low memory situations.
> > >
> > > Well, the tests (a) didn't show any particularly good speedup, and
> > > (b) were not on the platforms that this is speculated to be useful on
> > > (ie, those without O_DIRECT).
> > >
> > > I really don't think that an adequate case has been made for adding
> > > a new OS dependency.
> >
> > Well, I think the patch should be applied, and the submitter does too,
> > so unless I hear other votes, it is going in.
>
> I vote no for whatever that is worth. A "performance" change needs to
> actually demonstrate improved performance. If the change is really
> desireable to clean up some messy code, then add it as a cleanup change
> without the extra system calls. Otherwise it just adds one more bit of
> mystery for future maintainers who may be decieved into thinking that posix
> advise calls are important voodoo.

Yes, your vote counts very much. What if I apply the patch, but mark
the posix_advise() call in a NOT_USED macro block, so it will be ready
for people to test, but will not be used until we are sure.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 04:33:31
Message-ID: 9855.1139891611@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Yes, your vote counts very much. What if I apply the patch, but mark
> the posix_advise() call in a NOT_USED macro block, so it will be ready
> for people to test, but will not be used until we are sure.

Sounds like a recipe for ensuring it never will be tested. What's
needed here is some actual tests, not preparation...

regards, tom lane


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 05:06:04
Message-ID: 43F1653C.2060101@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
>>Yes, your vote counts very much. What if I apply the patch, but mark
>>the posix_advise() call in a NOT_USED macro block, so it will be ready
>>for people to test, but will not be used until we are sure.
>
>
> Sounds like a recipe for ensuring it never will be tested. What's
> needed here is some actual tests, not preparation...
>

Gotta second that.

Does the OP have a test scenario that those of us with appropriate OS's
could try? Come to think of it, what are the appropriate OS's? (I see
NetBSD mentioned so I suppose all the *BSDs, but what others?).

Cheers

Mark


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 05:17:39
Message-ID: 43F167F3.8010505@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Mark Kirkwood wrote:
> Come to think of it, what are the appropriate OS's? (I see
> NetBSD mentioned so I suppose all the *BSDs, but what others?).
>

FWIW FreeBSD (6.0) does *not* have posix_fadvise, only posix_madvise.

regards

Mark


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 12:54:10
Message-ID: 1139921650.1258.894.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Mon, 2006-02-13 at 23:33 -0500, Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Yes, your vote counts very much. What if I apply the patch, but mark
> > the posix_advise() call in a NOT_USED macro block, so it will be ready
> > for people to test, but will not be used until we are sure.
>
> Sounds like a recipe for ensuring it never will be tested. What's
> needed here is some actual tests, not preparation...

Without discussing this particular patch, IMHO we need a clear checklist
of items that are required before a patch is accepted onto the patches
awaiting application list.

We seem to have had a few patches rejected on grounds that could have
been picked up by other people, not just Tom and Bruce. This wastes
everybody's time because the patch writer might have fixed whatever was
wrong with it a while back if the patch had been rejected earlier. It
will also eventually bring the process into disrepute if one accepts
patches onto the queue and then another raises major objections that
could easily have been rectified earlier.

So let's agree a checklist beforehand. That way patch submitters can be
told to resubmit much earlier by other list watchers, with less wasted
time (elapsed and from core folk) and perhaps a gentler experience for
first-time submitters. Control can and should still lie with committers.

Suggested checklist:
1. has patch been discussed previously? Y/N
- if Y, give direct link to archive of message, and/or bug#
- if N discuss on appropriate list, or expect to be rejected
2. patch target: cvstip or stated branch(es)
3. patch application location: root directory only
4. patch format: diff -c only
5. confirm licence is BSD: Y/N
6. port specific: Y/N, if Y list ports
7. confirm passes make check (on listed ports)
8. provide implementation overview, preferably in code comments
9. if it is a performance patch, provide confirming test results. It is
OK to post patches without these, though the patch will not be applied
until *somebody* has tested the patches and found a valuable performance
effect directly attributable to the patch.
10. If it is a new feature patch, confirm that it has been tested for
all desired scenarios. If it has not, this should be clearly stated as a
request for a particular kind of test to be performed. Note that the
patch will go no further until that test has been performed.
11. if it is a new feature patch, does it break any existing defaults?
Explain why this is *required* or patch will be rejected. New feature
patches should be accompanied by doc patches also.
12. Even if you pass all of the above, the patch may still be rejected
for other technical reasons. You should be prepared to listen to
comments received and perform any agreed rework. Even if you have
received positive comments from some community members, others may spot
problems with your approach, coding style or many other issues.
13. Successful patches will be notified to you by email and you will be
credited for that work in the next set of release notes.

I would also suggest that we have two patch queues:
- patches awaiting performance testing
- patches awaiting application (current one)

That way anybody wanting to test new performance add-ons can do so and
reply to the list with confirmation that the patch can now be added to
the second (main) list of patches.

Of course, this suggestion will be immediately rejected because it
wasn't discussed on -hackers ;-)

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 14:47:43
Message-ID: 13654.1139928463@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:
> Tom Lane wrote:
>> Sounds like a recipe for ensuring it never will be tested. What's
>> needed here is some actual tests, not preparation...

> Does the OP have a test scenario that those of us with appropriate OS's
> could try? Come to think of it, what are the appropriate OS's? (I see
> NetBSD mentioned so I suppose all the *BSDs, but what others?).

The test run by the OP was just pgbench, which is probably not the
greatest scenario for showing the benefits of this patch, but at least
it's neutral ground. You need a situation in which the kernel is under
memory stress, else early free of disk cache buffers isn't going to make
any difference whatever --- so choose a pgbench scale factor that makes
the database noticeably larger than the test machine's RAM. Other than
that, follow the usual guidelines for producing trustworthy pgbench
numbers: number of clients smaller than scale factor, number of
transactions per client at least 1000 or so (to eliminate startup
transients), repeat test a couple times to make sure numbers are
reproducible.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 15:10:35
Message-ID: 1139929835.1258.944.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Tue, 2006-02-14 at 12:54 +0000, Simon Riggs wrote:
> On Mon, 2006-02-13 at 23:33 -0500, Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > Yes, your vote counts very much. What if I apply the patch, but mark
> > > the posix_advise() call in a NOT_USED macro block, so it will be ready
> > > for people to test, but will not be used until we are sure.
> >
> > Sounds like a recipe for ensuring it never will be tested. What's
> > needed here is some actual tests, not preparation...
>
> Without discussing this particular patch, IMHO we need a clear checklist
> of items that are required before a patch is accepted onto the patches
> awaiting application list.

This was supposed to be a serious suggestion, so apologies if this came
across stronger than it was meant.

The onus is of course upon the patch submitter to improve their game,
but there seems only benefit in setting out the (simpler) rules of the
game to show people what is unacceptable, even before they submit.

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 15:47:30
Message-ID: 15219.1139932050@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> This was supposed to be a serious suggestion, so apologies if this came
> across stronger than it was meant.

If you want to have a serious discussion about it, you need to start a
thread on pghackers under a more suitable subject line. The people
reading this thread are going to be a relatively small group.

regards, tom lane


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 21:50:27
Message-ID: 43F250A3.6020304@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:
>
>>Tom Lane wrote:
>>
>>>Sounds like a recipe for ensuring it never will be tested. What's
>>>needed here is some actual tests, not preparation...
>
>
>>Does the OP have a test scenario that those of us with appropriate OS's
>>could try? Come to think of it, what are the appropriate OS's? (I see
>>NetBSD mentioned so I suppose all the *BSDs, but what others?).
>
>
> The test run by the OP was just pgbench,

Ah - right, missed that sorry.

> which is probably not the
> greatest scenario for showing the benefits of this patch, but at least
> it's neutral ground. You need a situation in which the kernel is under
> memory stress, else early free of disk cache buffers isn't going to make
> any difference whatever --- so choose a pgbench scale factor that makes
> the database noticeably larger than the test machine's RAM. Other than
> that, follow the usual guidelines for producing trustworthy pgbench
> numbers: number of clients smaller than scale factor, number of
> transactions per client at least 1000 or so (to eliminate startup
> transients), repeat test a couple times to make sure numbers are
> reproducible.
>

Thinking about this, presumably any write intensive, multi-user
benchmark would seem to be suitable, so would something like OSDL's
DBT-2 actually be better to try?

Cheers

Mark

(P.s - academic in my case, unless I try out the latest NetBSD or Linux
on one of my FreeBSD boxes....)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-02-14 21:58:57
Message-ID: 23092.1139954337@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:
> Thinking about this, presumably any write intensive, multi-user
> benchmark would seem to be suitable, so would something like OSDL's
> DBT-2 actually be better to try?

I'm certainly not wedded to pgbench, give it a try.

BTW, I forgot to mention that it would be useful to try different
wal_sync_methods along with this. The reason why it seems unlikely
the patch is useful on Linux is that the sync methods that use O_DIRECT
probably dominate using the patch anyway. There may or may not be
a similar dependence on sync method on other kernels ...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, daveg <daveg(at)sonic(dot)net>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-03-02 19:26:42
Message-ID: 200603021926.k22JQgb19347@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:
> > Thinking about this, presumably any write intensive, multi-user
> > benchmark would seem to be suitable, so would something like OSDL's
> > DBT-2 actually be better to try?
>
> I'm certainly not wedded to pgbench, give it a try.
>
> BTW, I forgot to mention that it would be useful to try different
> wal_sync_methods along with this. The reason why it seems unlikely
> the patch is useful on Linux is that the sync methods that use O_DIRECT
> probably dominate using the patch anyway. There may or may not be
> a similar dependence on sync method on other kernels ...

I am thinking the only way to test this would be to do one heavy update
session to generate a lot of WAL traffic, and another session that is
doing a sequential scan on a table that fills most of the cache. It
isn't an easy test to make, which was why I was thinking we just add the
patch, but the community disagrees.

--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-06-14 17:39:23
Message-ID: 200606141739.k5EHdO206649@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


I have modified your patch (attached) and will apply soon, unless there
are more community comments. Thanks.

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>
> > > > > Here is a small patch to prevent undesired WAL file caching by kernel.
> > > > > posix_fadvise(POSIX_FADV_DONTNEED) attempts to free cached pages and
> > > > > the kernel will discard them in preference to other data caches.
> > > >
> > > > On plenty of platforms, this won't even compile ...
> >
> > Yes, but we probably have to have a configure test to see if
> > posix_fadvise exists too. I will keep this for 8.2.
>
> I think we can use _POSIX_ADVISORY_INFO to test if posix_fadvise exists.
> Also, I added the check on whether WAL archiving is enabled, because
> archivers might use the caches to read the WAL segment.
>
>
> By the way, should we put posix_fadvise on the separated place with renaming
> pg_fadvise? If we use posix_fadvise in other purposes, for example,
> read-ahead control, the separation would be good to keep codes clean.
>
> ---
> ITAGAKI Takahiro
> NTT Cyber Space Laboratories
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
unknown_filename text/plain 3.9 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Free WAL caches on switching segments
Date: 2006-06-15 19:15:01
Message-ID: 200606151915.k5FJF1116456@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


Patch applied. Thanks.

---------------------------------------------------------------------------

Bruce Momjian wrote:
>
> I have modified your patch (attached) and will apply soon, unless there
> are more community comments. Thanks.
>
> ---------------------------------------------------------------------------
>
> ITAGAKI Takahiro wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> >
> > > > > > Here is a small patch to prevent undesired WAL file caching by kernel.
> > > > > > posix_fadvise(POSIX_FADV_DONTNEED) attempts to free cached pages and
> > > > > > the kernel will discard them in preference to other data caches.
> > > > >
> > > > > On plenty of platforms, this won't even compile ...
> > >
> > > Yes, but we probably have to have a configure test to see if
> > > posix_fadvise exists too. I will keep this for 8.2.
> >
> > I think we can use _POSIX_ADVISORY_INFO to test if posix_fadvise exists.
> > Also, I added the check on whether WAL archiving is enabled, because
> > archivers might use the caches to read the WAL segment.
> >
> >
> > By the way, should we put posix_fadvise on the separated place with renaming
> > pg_fadvise? If we use posix_fadvise in other purposes, for example,
> > read-ahead control, the separation would be good to keep codes clean.
> >
> > ---
> > ITAGAKI Takahiro
> > NTT Cyber Space Laboratories
> >
>
> [ Attachment, skipping... ]
>
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> > choose an index scan if your joining column's datatypes do not
> > match
>
> --
> Bruce Momjian http://candle.pha.pa.us
> EnterpriseDB http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +

> Index: src/backend/access/transam/xlog.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
> retrieving revision 1.237
> diff -c -c -r1.237 xlog.c
> *** src/backend/access/transam/xlog.c 20 Apr 2006 04:07:38 -0000 1.237
> --- src/backend/access/transam/xlog.c 14 Jun 2006 17:38:10 -0000
> ***************
> *** 478,483 ****
> --- 478,484 ----
> bool use_lock);
> static int XLogFileOpen(uint32 log, uint32 seg);
> static int XLogFileRead(uint32 log, uint32 seg, int emode);
> + static void XLogFileClose(void);
> static bool RestoreArchivedFile(char *path, const char *xlogfname,
> const char *recovername, off_t expectedSize);
> static int PreallocXlogFiles(XLogRecPtr endptr);
> ***************
> *** 1384,1397 ****
> */
> Assert(npages == 0);
> if (openLogFile >= 0)
> ! {
> ! if (close(openLogFile))
> ! ereport(PANIC,
> ! (errcode_for_file_access(),
> ! errmsg("could not close log file %u, segment %u: %m",
> ! openLogId, openLogSeg)));
> ! openLogFile = -1;
> ! }
> XLByteToPrevSeg(LogwrtResult.Write, openLogId, openLogSeg);
>
> /* create/use new log file */
> --- 1385,1391 ----
> */
> Assert(npages == 0);
> if (openLogFile >= 0)
> ! XLogFileClose();
> XLByteToPrevSeg(LogwrtResult.Write, openLogId, openLogSeg);
>
> /* create/use new log file */
> ***************
> *** 1567,1580 ****
> {
> if (openLogFile >= 0 &&
> !XLByteInPrevSeg(LogwrtResult.Write, openLogId, openLogSeg))
> ! {
> ! if (close(openLogFile))
> ! ereport(PANIC,
> ! (errcode_for_file_access(),
> ! errmsg("could not close log file %u, segment %u: %m",
> ! openLogId, openLogSeg)));
> ! openLogFile = -1;
> ! }
> if (openLogFile < 0)
> {
> XLByteToPrevSeg(LogwrtResult.Write, openLogId, openLogSeg);
> --- 1561,1567 ----
> {
> if (openLogFile >= 0 &&
> !XLByteInPrevSeg(LogwrtResult.Write, openLogId, openLogSeg))
> ! XLogFileClose();
> if (openLogFile < 0)
> {
> XLByteToPrevSeg(LogwrtResult.Write, openLogId, openLogSeg);
> ***************
> *** 2153,2158 ****
> --- 2140,2173 ----
> }
>
> /*
> + * Close the current logfile segment for writing.
> + */
> + static void
> + XLogFileClose(void)
> + {
> + Assert(openLogFile >= 0);
> +
> + #ifdef _POSIX_ADVISORY_INFO
> + /*
> + * WAL caches will not be accessed in the future, so we advise OS to
> + * free them. But we will not do so if WAL archiving is active,
> + * because archivers might use the caches to read the WAL segment.
> + * While O_DIRECT works for O_SYNC, posix_fadvise() works for fsync()
> + * and O_SYNC, and some platforms only have posix_fadvise().
> + */
> + if (!XLogArchivingActive())
> + posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
> + #endif
> +
> + if (close(openLogFile))
> + ereport(PANIC,
> + (errcode_for_file_access(),
> + errmsg("could not close log file %u, segment %u: %m",
> + openLogId, openLogSeg)));
> + openLogFile = -1;
> + }
> +
> + /*
> * Attempt to retrieve the specified file from off-line archival storage.
> * If successful, fill "path" with its complete path (note that this will be
> * a temp file name that doesn't follow the normal naming convention), and
> ***************
> *** 5609,5622 ****
> errmsg("could not fsync log file %u, segment %u: %m",
> openLogId, openLogSeg)));
> if (open_sync_bit != new_sync_bit)
> ! {
> ! if (close(openLogFile))
> ! ereport(PANIC,
> ! (errcode_for_file_access(),
> ! errmsg("could not close log file %u, segment %u: %m",
> ! openLogId, openLogSeg)));
> ! openLogFile = -1;
> ! }
> }
> sync_method = new_sync_method;
> open_sync_bit = new_sync_bit;
> --- 5624,5630 ----
> errmsg("could not fsync log file %u, segment %u: %m",
> openLogId, openLogSeg)));
> if (open_sync_bit != new_sync_bit)
> ! XLogFileClose();
> }
> sync_method = new_sync_method;
> open_sync_bit = new_sync_bit;

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +