double-buffering page writes

Lists: pgsql-hackers
From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: double-buffering page writes
Date: 2008-10-21 16:04:37
Message-ID: 20081021160437.GA4001@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I'm trying to see if it makes sense to do the double-buffering of page
writes before going further ahead with CRC checking. I came up with the
attached patch; it does the double-buffering inconditionally, because as
it was said, it allows releasing the io_in_progress lock (and resetting
BM_IO_IN_PROGRESS) early.

So far I have not managed to convince me that this is a correct change
to make; the io_in_progress bits are pretty convoluted -- for example, I
wonder how does "releasing" the buffer early (before actually sending
the write to the kernel and marking it not dirty) interact with
checkpoint and a possible full-page image.

Basically the change is to take the unsetting of BM_DIRTY out of
TerminateBufferIO and into its own routine; and in FlushBuffer, release
io_in_progress just after copying the buffer contents elsewhere, and
mark the buffer not dirty after actually doing the write.

Thoughts?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment Content-Type Size
dblbuf.patch text/x-diff 6.1 KB

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: double-buffering page writes
Date: 2008-10-23 02:09:43
Message-ID: 20081023105338.B715.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> I'm trying to see if it makes sense to do the double-buffering of page
> writes before going further ahead with CRC checking. I came up with the
> attached patch; it does the double-buffering inconditionally, because as
> it was said, it allows releasing the io_in_progress lock (and resetting
> BM_IO_IN_PROGRESS) early.

I have some comments about the double-buffering:

- Are there any performance degradation because of addtional memcpy?
8kB of memcpy seems not to be free.

- Is it ok to allocale dblbuf[BLCKSZ] as local variable?
It might be unaligned. AFAICS we avoid such usages in other places.

- It is the best if we can delay double-buffering until locks are
conflicted actually. But we might need to allocale shadow buffers
from shared buffers instead of local memory.

- Are there any other modules that can share in the benefits of
double-buffering? For example, we could avoid avoid waiting for
LockBufferForCleanup(). It is cool if the double-buffering can
be used for multiple purposes.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: double-buffering page writes
Date: 2008-10-23 16:02:46
Message-ID: 20081023160246.GD4845@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

ITAGAKI Takahiro wrote:

> I have some comments about the double-buffering:

Since posting this patch I have realized that this implementation is
bogus. I'm now playing with WAL-logging hint bits though. As to your
questions:

> - Are there any performance degradation because of addtional memcpy?
> 8kB of memcpy seems not to be free.

Of course, it is not free. However it comes with the benefit that we
can release the io_in_progress lock earlier for the block -- we lock,
copy, unlock; whereas the old code did lock, write(), unlock. Avoding a
system call in the locked area could be a win. Whether this is a net
benefit is something that I have not measured.

> - Is it ok to allocale dblbuf[BLCKSZ] as local variable?
> It might be unaligned. AFAICS we avoid such usages in other places.

I thought about that too. I admit I am not sure if this really works
portably; however I don't want to add a palloc() to that routine.

> - It is the best if we can delay double-buffering until locks are
> conflicted actually. But we might need to allocale shadow buffers
> from shared buffers instead of local memory.

The point of double-buffering is that the potential writer (a process
doing concurrent hint-bit setting) is not going to grab any locks.

> - Are there any other modules that can share in the benefits of
> double-buffering? For example, we could avoid avoid waiting for
> LockBufferForCleanup(). It is cool if the double-buffering can
> be used for multiple purposes.

Not sure on this.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: double-buffering page writes
Date: 2008-10-23 16:41:44
Message-ID: 4900A948.1000606@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> ITAGAKI Takahiro wrote:
>
>> I have some comments about the double-buffering:
>
> Since posting this patch I have realized that this implementation is
> bogus. I'm now playing with WAL-logging hint bits though.

Yeah, the torn page + hint bit updates problem is the tough question.

>> - Is it ok to allocale dblbuf[BLCKSZ] as local variable?
>> It might be unaligned. AFAICS we avoid such usages in other places.
>
> I thought about that too. I admit I am not sure if this really works
> portably; however I don't want to add a palloc() to that routine.

It should work, AFAIK, but unaligned memcpy()s and write()s can be a
significantly slower. There can be only one write() happening at any
time, so you could just palloc() a single 8k buffer in TopMemoryContext
in backend startup, and always use that.

>> - Are there any other modules that can share in the benefits of
>> double-buffering? For example, we could avoid avoid waiting for
>> LockBufferForCleanup(). It is cool if the double-buffering can
>> be used for multiple purposes.
>
> Not sure on this.

You'd need to keep both versions of the buffer simultaneously in the
buffer cache for that. When we talked about the various designs for HOT,
that was one of the ideas I had to enable more aggressive pruning: if
you can't immediately get a vacuum lock, allocate a new buffer in the
buffer cache for the same block, copy the page to the new buffer, and do
the pruning, including moving tuples around, there. Any new ReadBuffer
calls would return the new page version, but old readers would keep
referencing the old one. The intrusive part of that approach, in
addition to the obvious changes required in the buffer manager to keep
around multiple copies of the same block, is that all modifications must
be done on the new version, so anyone who needs to lock the page for
modification would need to switch to the new page version at the
LockBuffer call.

As discussed in the other thread with Simon, we also use vacuum locks in
b-tree for waiting out index scans, so avoiding the waiting there would
be just wrong.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: double-buffering page writes
Date: 2008-10-23 16:54:22
Message-ID: 10674.1224780862@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Alvaro Herrera wrote:
>> I thought about that too. I admit I am not sure if this really works
>> portably; however I don't want to add a palloc() to that routine.

> It should work, AFAIK, but unaligned memcpy()s and write()s can be a
> significantly slower. There can be only one write() happening at any
> time, so you could just palloc() a single 8k buffer in TopMemoryContext
> in backend startup, and always use that.

Some time ago, we arranged for shared buffers to be aligned on *more*
than maxalign boundaries (cf BUFFERALIGN) because on at least some
platforms this makes a very significant difference in the speed of
copying to/from kernel space. If you are going to double-buffer it
is going to be important to have the intermediate buffer just as well
aligned. A local char array won't be acceptable, and even for a
palloc'd one you'll need to take some extra steps (like wasting
ALIGNOF_BUFFER extra bytes so you can align the pointer palloc gives).

regards, tom lane