Re: simplify register_dirty_segment()
- From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
- To: pgsql-hackers(at)postgresql(dot)org
- Subject: Re: simplify register_dirty_segment()
- Date: Tue, 26 Apr 2005 11:27:55 +0800
- Message-id: <d4kcjl$1f8n$1(at)news(dot)hub(dot)org>
"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes
> On platforms that I'm familiar with, an fsync call causes the kernel
> to spend a significant amount of time groveling through its buffers
> to see if any are dirty. We shouldn't incur that cost to buy marginal
> speedups at the application level. (In other words, "it's only an
> open/close" is wrong.)
>
I ran some tests on SunOS, Linux and Windows. Basically, I create 100 files
and close them, then reopen them, write (dirty case) or read (clean case)
8192*100 bytes per file, and then fsync() them. I measured the fsync() time.
SunOS 5.8 + NFS + SCSI
Fsync dirty files: duration: 2404.573 ms
Fsync clean files: duration: 598.037 ms
Linux 2.4 + Ext3 + IDE
Fsync dirty files: duration: 6951.793 ms
Fsync clean files: duration: 18.132 ms
Windows 2000 + NTFS + IDE
Fsync dirty files: duration: 3005.000 ms
Fsync clean files: duration: 1101.000 ms
I can't figure out why it takes so long on Windows and SunOS for clean
files. A possible reason is that they have to fsync some inode information,
such as last access time, even for clean files. Linux is quite smart in this
respect.
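For reference, the test described above can be sketched roughly as follows.
This is a minimal POSIX sketch, not the exact program behind the numbers:
the helper names (elapsed_ms, time_fsync), the file naming, and the choice to
pre-populate and flush each file so that the "clean" case has data to read
are all assumptions.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define NFILES 100   /* number of files, as in the test description */
#define NBLOCKS 100  /* 100 blocks of 8192 bytes per file */

/* Hypothetical helper: milliseconds elapsed between two timestamps. */
static double elapsed_ms(struct timeval a, struct timeval b)
{
    return (b.tv_sec - a.tv_sec) * 1000.0 + (b.tv_usec - a.tv_usec) / 1000.0;
}

/*
 * Create nfiles files under dir, close them, reopen them, then write
 * (dirty != 0) or read (dirty == 0) 8192*100 bytes per file, and time only
 * the fsync() calls. Returns elapsed milliseconds, or -1.0 on error.
 * Error paths do not clean up; this is only a sketch.
 */
static double time_fsync(const char *dir, int nfiles, int dirty)
{
    static char buf[8192];
    char path[1024];
    int fds[NFILES];
    struct timeval start, stop;
    int i, j;

    if (nfiles < 0 || nfiles > NFILES)
        return -1.0;
    memset(buf, 'x', sizeof(buf));

    /* Phase 1: create, populate, flush and close (assumed setup step). */
    for (i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "%s/fsync_test_%d", dir, i);
        int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
        if (fd < 0)
            return -1.0;
        for (j = 0; j < NBLOCKS; j++)
            if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
                return -1.0;
        fsync(fd);
        close(fd);
    }

    /* Phase 2: reopen, then dirty the files or just read them. */
    for (i = 0; i < nfiles; i++) {
        snprintf(path, sizeof(path), "%s/fsync_test_%d", dir, i);
        fds[i] = open(path, O_RDWR);
        if (fds[i] < 0)
            return -1.0;
        for (j = 0; j < NBLOCKS; j++) {
            ssize_t n = dirty ? write(fds[i], buf, sizeof(buf))
                              : read(fds[i], buf, sizeof(buf));
            if (n != (ssize_t) sizeof(buf))
                return -1.0;
        }
    }

    /* Phase 3: the timed part, fsync() on every open file. */
    gettimeofday(&start, NULL);
    for (i = 0; i < nfiles; i++)
        fsync(fds[i]);
    gettimeofday(&stop, NULL);

    /* Clean up the test files. */
    for (i = 0; i < nfiles; i++) {
        close(fds[i]);
        snprintf(path, sizeof(path), "%s/fsync_test_%d", dir, i);
        unlink(path);
    }
    return elapsed_ms(start, stop);
}
```

Calling time_fsync(dir, 100, 1) and then time_fsync(dir, 100, 0) on the
same filesystem gives the two durations to compare. Absolute numbers will
depend heavily on the filesystem, the disk and its write cache.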
> Also, it's not clear to me how this idea works at all, if a backend holds
> a relation open across more than one checkpoint. What will re-register
> the segment for the next cycle?
>
You are right. A possible (but not clean) solution is this: the bgwriter
maintains a refcount for each file. When the file is opened, refcount++;
when the file is closed, refcount--. Once the refcount drops to zero, the
bgwriter can safely remove the file from its PendingOpsTable after the
checkpoint.
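To make the idea concrete, here is a rough, self-contained sketch of the
bookkeeping. It is not PostgreSQL code: the fixed-size array stands in for
the PendingOpsTable, the names (PendingEntry, segment_opened, segment_closed,
checkpoint) are hypothetical, and the fsync itself is stubbed out with a
counter.

```c
#include <string.h>

#define MAX_PENDING 64  /* hypothetical table size */
#define MAX_NAME 64     /* hypothetical limit on a segment name */

/* One slot of the stand-in pending-ops table. */
typedef struct {
    char name[MAX_NAME]; /* segment identifier */
    int refcount;        /* number of current openers */
    int used;            /* slot occupied? */
} PendingEntry;

static PendingEntry pending_ops[MAX_PENDING];

static PendingEntry *find_entry(const char *name)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (pending_ops[i].used && strcmp(pending_ops[i].name, name) == 0)
            return &pending_ops[i];
    return NULL;
}

/* File opened: register it if needed, then refcount++.
 * Returns 0 on success, -1 if the table is full. */
int segment_opened(const char *name)
{
    PendingEntry *e = find_entry(name);

    if (e == NULL) {
        for (int i = 0; i < MAX_PENDING && e == NULL; i++)
            if (!pending_ops[i].used)
                e = &pending_ops[i];
        if (e == NULL)
            return -1;
        strncpy(e->name, name, MAX_NAME - 1);
        e->name[MAX_NAME - 1] = '\0';
        e->refcount = 0;
        e->used = 1;
    }
    e->refcount++;
    return 0;
}

/* File closed: refcount--. The entry stays until the next checkpoint. */
void segment_closed(const char *name)
{
    PendingEntry *e = find_entry(name);

    if (e != NULL && e->refcount > 0)
        e->refcount--;
}

/* Checkpoint: sync every registered segment, then drop the entries that
 * nobody holds open any more. Returns how many segments were visited. */
int checkpoint(void)
{
    int synced = 0;

    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending_ops[i].used)
            continue;
        synced++;                    /* real code would fsync() here */
        if (pending_ops[i].refcount == 0)
            pending_ops[i].used = 0; /* safe to forget after this sync */
    }
    return synced;
}
```

With this scheme, a segment that a backend holds open across several
checkpoints stays in the table and is synced at each of them, with no need
to re-register it. The cost is an fsync on files that may be clean, which is
why the clean-file timings above matter.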
Regards,
Qingqing