Checkpoint gets stuck in mdsync

Lists:	pgsql-hackers

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Checkpoint gets stuck in mdsync
Date:	2007-04-05 09:11:34
Message-ID:	4614BD46.2070800@enterprisedb.com

Now that the CheckpointStartLock starvation has been taken care of, I'm
seeing another problem with checkpoints in my test run: mdsync never
finishes.

Here's what's happening:
1. checkpoint calls mdsync
2. mdsync starts processing pending fsyncs from pendingOpsTable
(at this point, normal backends have to start doing writes themselves,
because bgwriter is busy checkpointing and isn't keeping buffers clean)
3. after fsyncing 10 files, it calls AbsorbFsyncRequests
4. AbsorbFsyncRequests puts back entries into pendingOpsTable for those
files that were already fsynced.
5. mdsync starts over, goto 2.

The loop doesn't end until the test run is over; mdsync keeps fsyncing
the same files over and over again.
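
The steps above can be modelled with a small toy program. This is only a sketch under stated assumptions, not the md.c code: the boolean array stands in for pendingOpsTable, and NFILES, backends_busy, toy_mdsync and the max_fsyncs cap are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NFILES       20   /* hypothetical number of dirty files */
#define ABSORB_EVERY 10   /* absorb after this many fsyncs, as in step 3 */

static bool pending[NFILES];        /* stand-in for pendingOpsTable */
static bool backends_busy = true;   /* assumption: backends keep dirtying files */

/* Mark every file as needing an fsync. */
static void mark_all_pending(void)
{
    for (size_t i = 0; i < NFILES; i++)
        pending[i] = true;
}

/* Stand-in for AbsorbFsyncRequests: queued requests go back into the table. */
static void absorb_requests(void)
{
    if (backends_busy)
        mark_all_pending();
}

/*
 * Toy version of the loop in steps 2-5. Returns true if the table was
 * drained, false if we gave up after max_fsyncs simulated fsync calls.
 */
static bool toy_mdsync(long max_fsyncs, long *fsyncs_done)
{
    long done = 0;
    bool restart;

    assert(fsyncs_done != NULL);
    do {
        restart = false;
        for (size_t i = 0; i < NFILES; i++) {
            if (!pending[i])
                continue;
            if (done >= max_fsyncs) {       /* safety cap for the simulation */
                *fsyncs_done = done;
                return false;
            }
            pending[i] = false;             /* "fsync" the file */
            done++;
            if (done % ABSORB_EVERY == 0) {
                absorb_requests();          /* step 4: entries come back */
                restart = true;             /* step 5: start over */
                break;
            }
        }
    } while (restart);

    *fsyncs_done = done;
    return true;
}
```

With backends_busy set, the loop only stops when it hits the cap; with it cleared, the same loop drains the table after one fsync per file.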

My proposed fix is to make a copy of pendingOpsTable before entering the
loop. AbsorbFsyncRequests will put new requests into a fresh
pendingOpsTable, while the mdsync loop drains the copy. I'll write a
patch along those lines if there are no better ideas.
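
A minimal sketch of that idea, under the same toy assumptions (a boolean array in place of the real hash table; NFILES and mdsync_with_copy are hypothetical names, and this is not the patch itself):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NFILES       20   /* hypothetical number of dirty files */
#define ABSORB_EVERY 10   /* absorb after this many fsyncs */

static bool pending[NFILES];   /* live table: new requests land here */

/* Stand-in for AbsorbFsyncRequests: busy backends re-request every file. */
static void absorb_requests(void)
{
    for (int i = 0; i < NFILES; i++)
        pending[i] = true;
}

/* Drain a private snapshot of the table; returns the number of fsyncs. */
static int mdsync_with_copy(void)
{
    bool snapshot[NFILES];
    int  done = 0;

    /* Copy the table, then start a fresh, empty one for new requests. */
    memcpy(snapshot, pending, sizeof(snapshot));
    memset(pending, 0, sizeof(pending));

    for (int i = 0; i < NFILES; i++) {
        if (!snapshot[i])
            continue;
        snapshot[i] = false;        /* "fsync" the file */
        done++;
        if (done % ABSORB_EVERY == 0)
            absorb_requests();      /* touches only `pending`, not `snapshot` */
    }

    assert(done <= NFILES);         /* work is bounded by the snapshot size */
    return done;
}
```

Requests absorbed during the loop stay in the live table and are left for the next checkpoint, so each file is fsynced at most once per call.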

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Checkpoint gets stuck in mdsync
Date:	2007-04-05 09:31:15
Message-ID:	20070405181527.04BD.ITAGAKI.TAKAHIRO@oss.ntt.co.jp

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:

> Now that the CheckpointStartLock starvation has been taken care of, I'm
> seeing another problem with checkpoints in my test run: mdsync never
> finishes.
>
> My proposed fix is to make a copy of pendingOpsTable before entering the
> loop. AbsorbFsyncRequests will put new requests into a fresh
> pendingOpsTable, while the mdsync loop drains the copy. I'll write a
> patch along those lines if there are no better ideas.

Yeah, I'm also worried about it getting stuck. I wrote a fix that uses a
copy of pendingOpsTable, as you suggested, when I implemented the Load
distributed checkpoint patch. (http://momjian.us/mhonarc/patches/msg00025.html)
I would be very happy if you could review my patch and check whether my
fix is correct.

There was another reason to fix it in my patch: I wanted to fsync each
file only once, because bgwriter sleeps once per file in my patch.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Checkpoint gets stuck in mdsync
Date:	2007-04-05 09:35:33
Message-ID:	4614C2E5.1030304@enterprisedb.com

ITAGAKI Takahiro wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
>> Now that the CheckpointStartLock starvation has been taken care of, I'm
>> seeing another problem with checkpoints in my test run: mdsync never
>> finishes.
>>
>> My proposed fix is to make a copy of pendingOpsTable before entering the
>> loop. AbsorbFsyncRequests will put new requests into a fresh
>> pendingOpsTable, while the mdsync loop drains the copy. I'll write a
>> patch along those lines if there are no better ideas.
>
> Yeah, I'm also worried about it getting stuck. I wrote a fix that uses a
> copy of pendingOpsTable, as you suggested, when I implemented the Load
> distributed checkpoint patch. (http://momjian.us/mhonarc/patches/msg00025.html)
> I would be very happy if you could review my patch and check whether my
> fix is correct.
>
> There was another reason to fix it in my patch: I wanted to fsync each
> file only once, because bgwriter sleeps once per file in my patch.

Ah, I see. I looked at the patch briefly a few days ago, and wondered
why there were so many changes to mdsync. I didn't realize there was a
fix to the "getting stuck" problem in there as well.

I'll take a closer look, and try to write a patch to just fix the
"getting stuck" problem, but in a way that anticipates the load
distributed checkpoint patch so that it doesn't need to be rewritten again.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Checkpoint gets stuck in mdsync
Date:	2007-04-05 10:58:40
Message-ID:	4614D660.5080503@enterprisedb.com

I just posted a patch to pgsql-patches that fixes the issue along the
lines of your Load distributed checkpoint patch. The Load distributed
checkpoint patch now only needs to add the "calculate total file length"
step and the nap delay to mdsync.

Thanks for the patch!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Checkpoint gets stuck in mdsync
Date:	2007-04-05 14:39:38
Message-ID:	27041.1175783978@sss.pgh.pa.us

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> My proposed fix is to make a copy of pendingOpsTable before entering the
> loop. AbsorbFsyncRequests will put new requests into a fresh
> pendingOpsTable, while the mdsync loop drains the copy. I'll write a
> patch along those lines if there are no better ideas.

That sounds pretty ugly. Perhaps better is a "cycle ID" field added to
the table entries, assigned from a counter that's bumped before entering
the fsync loop. Then you could distinguish entries made before starting
the loop from those made after. One fine point is to not let
AbsorbFsyncRequests change the cycle ID on a pre-existing entry ...
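
A rough sketch of the cycle ID idea, assuming a toy array in place of the real hash table. PendingEntry, sync_cycle, remember_request and mdsync_with_cycle_id are hypothetical names for illustration, and counter wraparound is ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFILES       20   /* hypothetical number of dirty files */
#define ABSORB_EVERY 10   /* absorb after this many fsyncs */

typedef struct {
    bool     pending;
    uint16_t cycle_id;    /* counter value when the entry was created */
} PendingEntry;

static PendingEntry table[NFILES];   /* stand-in for pendingOpsTable */
static uint16_t     sync_cycle = 0;  /* bumped before each fsync loop */

/* Record a request; never restamp an entry that already exists. */
static void remember_request(int file)
{
    if (!table[file].pending) {
        table[file].pending  = true;
        table[file].cycle_id = sync_cycle;
    }
    /* else: keep the old cycle_id so this loop still fsyncs the file */
}

/* Stand-in for AbsorbFsyncRequests: busy backends re-request every file. */
static void absorb_requests(void)
{
    for (int i = 0; i < NFILES; i++)
        remember_request(i);
}

/* Fsync only entries made before this cycle started; returns the count. */
static int mdsync_with_cycle_id(void)
{
    int  done = 0;
    bool restart;

    sync_cycle++;   /* entries created from now on carry the new ID */
    do {
        restart = false;
        for (int i = 0; i < NFILES; i++) {
            if (!table[i].pending || table[i].cycle_id == sync_cycle)
                continue;               /* absent, or added during this loop */
            table[i].pending = false;   /* "fsync" the file */
            done++;
            if (done % ABSORB_EVERY == 0) {
                absorb_requests();
                restart = true;         /* rescan, as the original loop does */
                break;
            }
        }
    } while (restart);

    assert(done <= NFILES);   /* each pre-existing entry is handled once */
    return done;
}
```

If remember_request overwrote cycle_id on an entry that was already pending, files not yet reached by the loop would be stamped with the new ID and skipped, which is the fine point raised above.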

regards, tom lane