Re: RC2 and open issues

Lists: pgsql-hackers
From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: RC2 and open issues
Date: 2004-12-21 02:12:18
Message-ID: 200412210212.iBL2CIg21789@candle.pha.pa.us

We are now packaging RC2. If nothing comes up after RC2 is released, we
can move to final release.

The open items list is attached. The doc changes can be easily
completed before final. The only code issue left is with bgwriter. We
always knew we needed to find better defaults for its parameters, but we
are only now finding more fundamental issues.

I think the summary I have seen recently pegs it right --- our use of %
of dirty buffers requires a scan of the entire buffer cache, and the
current delay of bgwriter is too high, but we can't lower it because the
buffer cache scan will become too expensive if done too frequently.

I think the ideal solution would be to remove bgwriter_percent or change
it to be a percentage of all buffers, not just dirty buffers, so we
don't have to scan the entire list. If we set the new value to 10% with
a delay of 1 second, and the bgwriter remembers the place it stopped
scanning the buffer cache, you will clean out the buffer cache
completely every 10 seconds.
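
The arithmetic behind the 10-second figure can be sketched as follows.
This is only an illustration, not PostgreSQL code: the helper name
resumable_scan, the cache size of 1000 buffers, and the treatment of the
cache as a fixed-order array are all assumptions made for the example.

```python
def resumable_scan(n_buffers, percent, start):
    """Return the buffer indexes visited in one round and the next start.

    Hypothetical helper: visits `percent` of ALL buffers (not just dirty
    ones), beginning where the previous round stopped and wrapping around.
    """
    per_round = max(1, n_buffers * percent // 100)
    visited = [(start + i) % n_buffers for i in range(per_round)]
    return visited, (start + per_round) % n_buffers


n_buffers = 1000     # assumed cache size, for illustration only
percent = 10         # proposed percentage of all buffers per round
delay_seconds = 1    # proposed delay between rounds

seen = set()
position = 0
rounds = 0
while len(seen) < n_buffers:
    visited, position = resumable_scan(n_buffers, percent, position)
    seen.update(visited)
    rounds += 1

seconds_for_full_pass = rounds * delay_seconds
print(rounds, seconds_for_full_pass)  # prints "10 10"
```

Because each round resumes where the last one stopped, ten rounds of 10%
touch every buffer exactly once, which is where the 10 seconds comes from.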

Right now it seems no one can find proper values. We were clear that
this was an issue but it is bad news that we are only addressing it
during RC.

The 8.1 solution is to have some feedback system so writes by individual
backends cause the bgwriter to work more frequently.

The big question is what to do during RC2. Do we just leave it as
suboptimal, knowing we will revisit it in 8.1, or try an incremental
solution for 8.0 that might work better?

We have to decide now.

---------------------------------------------------------------------------

PostgreSQL 8.0 Open Items
=========================

Current version at http://candle.pha.pa.us/cgi-bin/pgopenitems.

Changes
-------
* change bgwriter buffer scan behavior?
* adjust bgwriter defaults

Documentation
-------------
* synchronize supported encodings and docs
* improve external interfaces documentation section
* manual pages

Fixed Since Last Beta
---------------------

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RC2 and open issues
Date: 2004-12-21 02:35:27
Message-ID: 955.1103596527@sss.pgh.pa.us

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I think the ideal solution would be to remove bgwriter_percent or change
> it to be a percentage of all buffers, not just dirty buffers, so we
> don't have to scan the entire list. If we set the new value to 10% with
> a delay of 1 second, and the bgwriter remembers the place it stopped
> scanning the buffer cache, you will clean out the buffer cache
> completely every 10 seconds.

But we don't *want* it to clean out the buffer cache completely.
There's no point in writing a "hot" page every few seconds. So I don't
think I believe in remembering where we stopped anyway.

I think there's a reasonable case to be made for redefining
bgwriter_percent as the max percent of the total buffer list to scan
(not the max percent of the list to return --- Jan correctly pointed out
that the latter is useless). Then we could modify
StrategyDirtyBufferList so that the percent and maxpages parameters are
passed in, so it can stop as soon as either one is satisfied. This
would be a fairly small/safe code change and I wouldn't have a problem
doing it even at this late stage of the cycle.
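
A rough sketch of the "stop as soon as either one is satisfied" behavior
described above. This is not the real StrategyDirtyBufferList, whose
signature and data structures are not shown in this thread; the function
name dirty_buffer_list, the boolean-list representation of buffers, and
the sample data are all assumptions made for illustration.

```python
def dirty_buffer_list(buffers, percent, maxpages):
    """Collect dirty buffers, stopping at whichever limit is hit first.

    `buffers` is a list of booleans (True = dirty) in scan order.
    `percent` caps how much of the whole list is examined.
    `maxpages` caps how many dirty buffers are returned.
    Returns (indexes of dirty buffers found, number of buffers examined).
    """
    scan_limit = len(buffers) * percent // 100
    found = []
    examined = 0
    for index, is_dirty in enumerate(buffers):
        if examined >= scan_limit or len(found) >= maxpages:
            break
        examined += 1
        if is_dirty:
            found.append(index)
    return found, examined


# 20 buffers, every fourth one dirty (made-up data).
buffers = [i % 4 == 0 for i in range(20)]

# The scan limit ends the walk: 50% of 20 means 10 buffers examined.
by_percent = dirty_buffer_list(buffers, percent=50, maxpages=100)

# The page limit ends the walk: two dirty buffers found after 5 examined.
by_maxpages = dirty_buffer_list(buffers, percent=100, maxpages=2)

print(by_percent)   # ([0, 4, 8], 10)
print(by_maxpages)  # ([0, 4], 5)
```

With the percentage applied to buffers examined rather than buffers
returned, the cost of one round is bounded no matter how few buffers are
dirty.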

However ... we would have to crank up the default bgwriter_percent,
and I don't know if we have any better idea what to set it to after
such a change than we do now ...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RC2 and open issues
Date: 2004-12-21 03:46:40
Message-ID: 200412210346.iBL3keZ01528@candle.pha.pa.us

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > I think the ideal solution would be to remove bgwriter_percent or change
> > it to be a percentage of all buffers, not just dirty buffers, so we
> > don't have to scan the entire list. If we set the new value to 10% with
> > a delay of 1 second, and the bgwriter remembers the place it stopped
> > scanning the buffer cache, you will clean out the buffer cache
> > completely every 10 seconds.
>
> But we don't *want* it to clean out the buffer cache completely.

You are only cleaning it out in pieces over a 10-second period, so it is
getting dirty again in the meantime. You are not scanning the entire
buffer cache at one time.

> There's no point in writing a "hot" page every few seconds. So I don't
> think I believe in remembering where we stopped anyway.

I was thinking that if you are doing this scanning every X milliseconds,
then after a while the front of the buffer cache will be mostly clean and
the end will be dirty, so you will always be going over the same early
ones to get to the later dirty ones. Remembering the location gives the
scan more uniform coverage of the buffer cache.

You need a "clock sweep" like BSD uses (and probably others).
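
The cost difference being argued here can be sketched as follows. The
example assumes a fixed buffer ordering with a clean front and a dirty
tail, which is a simplification; the function name buffers_examined and
the 80/20 split are hypothetical and chosen only to make the point
visible.

```python
def buffers_examined(buffers, start, maxpages):
    """Count buffers examined, starting at `start` and wrapping at most
    once, until `maxpages` dirty buffers (True entries) have been found."""
    n = len(buffers)
    found = 0
    examined = 0
    for step in range(n):
        if found >= maxpages:
            break
        examined += 1
        if buffers[(start + step) % n]:
            found += 1
    return examined


# Assumed state "after a while": front 80 buffers clean, last 20 dirty.
buffers = [False] * 80 + [True] * 20

# Restarting at the front walks over all 80 clean buffers first.
from_front = buffers_examined(buffers, start=0, maxpages=5)

# Resuming at a remembered position (here, index 80) skips them.
from_saved = buffers_examined(buffers, start=80, maxpages=5)

print(from_front, from_saved)  # prints "85 5"
```

The sketch says nothing about which pages are worth writing; it only
shows how many buffers each starting point has to examine.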

> I think there's a reasonable case to be made for redefining
> bgwriter_percent as the max percent of the total buffer list to scan
> (not the max percent of the list to return --- Jan correctly pointed out
> that the latter is useless). Then we could modify
> StrategyDirtyBufferList so that the percent and maxpages parameters are
> passed in, so it can stop as soon as either one is satisfied. This
> would be a fairly small/safe code change and I wouldn't have a problem
> doing it even at this late stage of the cycle.
>
> However ... we would have to crank up the default bgwriter_percent,
> and I don't know if we have any better idea what to set it to after
> such a change than we do now ...

Once we make the change we will have to get our testers working on it.
We need those figures to change over time based on backends doing writes,
but that isn't going to happen for 8.0.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RC2 and open issues
Date: 2004-12-21 04:00:31
Message-ID: 2309.1103601631@sss.pgh.pa.us

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> You need a "clock sweep" like BSD uses (and probably others).

No, that's *fundamentally* wrong.

The reason we are going to the trouble of maintaining a complicated
cache algorithm like ARC is so that we can tell the heavily used pages
from the lesser used ones. To throw away that knowledge in favor of
doing I/O with a plain clock sweep algorithm is just wrong.

What's more, I don't even understand what clock sweep would mean given
that the ordering of the list is constantly changing.

regards, tom lane