Re: Reviewing freeze map code

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-07-07 07:04:24
Message-ID: CAD21AoCQQ3aywTOuz7CbmiLEiAToua8UXc7Wf4N7aNbjqJLjtg@mail.gmail.com
Lists: pgsql-hackers

Thank you for reviewing!

On Thu, Jul 7, 2016 at 7:06 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-07-05 23:37:59 +0900, Masahiko Sawada wrote:
>> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
>> index 57da57a..fd66527 100644
>> --- a/src/backend/access/heap/heapam.c
>> +++ b/src/backend/access/heap/heapam.c
>> @@ -3923,6 +3923,17 @@ l2:
>>
>> if (need_toast || newtupsize > pagefree)
>> {
>> + /*
>> + * To prevent data corruption due to updating old tuple by
>> + * other backends after released buffer
>
> That's not really the reason, is it? The prime problem is crash safety /
> replication. The row-lock we're faking (by setting xmax to our xid),
> prevents concurrent updates until this backend died.

Fixed.

>> , we need to emit that
>> + * xmax of old tuple is set and clear visibility map bits if
>> + * needed before releasing buffer. We can reuse xl_heap_lock
>> + * for this purpose. It should be fine even if we crash midway
>> + * from this section and the actual updating one later, since
>> + * the xmax will appear to come from an aborted xid.
>> + */
>> + START_CRIT_SECTION();
>> +
>> /* Clear obsolete visibility flags ... */
>> oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
>> oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
>> @@ -3936,6 +3947,46 @@ l2:
>> /* temporarily make it look not-updated */
>> oldtup.t_data->t_ctid = oldtup.t_self;
>> already_marked = true;
>> +
>> + /* Clear PD_ALL_VISIBLE flags */
>> + if (PageIsAllVisible(BufferGetPage(buffer)))
>> + {
>> + all_visible_cleared = true;
>> + PageClearAllVisible(BufferGetPage(buffer));
>> + visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
>> + vmbuffer);
>> + }
>> +
>> + MarkBufferDirty(buffer);
>> +
>> + if (RelationNeedsWAL(relation))
>> + {
>> + xl_heap_lock xlrec;
>> + XLogRecPtr recptr;
>> +
>> + /*
>> + * For logical decoding we need combocids to properly decode the
>> + * catalog.
>> + */
>> + if (RelationIsAccessibleInLogicalDecoding(relation))
>> + log_heap_new_cid(relation, &oldtup);
>
> Hm, I don't see that being necessary here. Row locks aren't logically
> decoded, so there's no need to emit this here.

Fixed.

>
>> + /* Clear PD_ALL_VISIBLE flags */
>> + if (PageIsAllVisible(page))
>> + {
>> + Buffer vmbuffer = InvalidBuffer;
>> + BlockNumber block = BufferGetBlockNumber(*buffer);
>> +
>> + all_visible_cleared = true;
>> + PageClearAllVisible(page);
>> + visibilitymap_pin(relation, block, &vmbuffer);
>> + visibilitymap_clear(relation, block, vmbuffer);
>> + }
>> +
>
> That clears all-visible unnecessarily, we only need to clear all-frozen.
>

Fixed.

>
>> @@ -8694,6 +8761,23 @@ heap_xlog_lock(XLogReaderState *record)
>> }
>> HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
>> HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
>> +
>> + /* The visibility map need to be cleared */
>> + if ((xlrec->infobits_set & XLHL_ALL_VISIBLE_CLEARED) != 0)
>> + {
>> + RelFileNode rnode;
>> + Buffer vmbuffer = InvalidBuffer;
>> + BlockNumber blkno;
>> + Relation reln;
>> +
>> + XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
>> + reln = CreateFakeRelcacheEntry(rnode);
>> +
>> + visibilitymap_pin(reln, blkno, &vmbuffer);
>> + visibilitymap_clear(reln, blkno, vmbuffer);
>> + PageClearAllVisible(page);
>> + }
>> +
>
>
>> PageSetLSN(page, lsn);
>> MarkBufferDirty(buffer);
>> }
>> diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
>> index a822d0b..41b3c54 100644
>> --- a/src/include/access/heapam_xlog.h
>> +++ b/src/include/access/heapam_xlog.h
>> @@ -242,6 +242,7 @@ typedef struct xl_heap_cleanup_info
>> #define XLHL_XMAX_EXCL_LOCK 0x04
>> #define XLHL_XMAX_KEYSHR_LOCK 0x08
>> #define XLHL_KEYS_UPDATED 0x10
>> +#define XLHL_ALL_VISIBLE_CLEARED 0x20
>
> Hm. We can't easily do that in the back-patched version; because a
> standby won't know to check for the flag . That's kinda ok, since we
> don't yet need to clear all-visible yet at that point of
> heap_update. But that better means we don't do so on the master either.
>

Attached is the latest version of the patch.
I changed the visibilitymap_clear function so that callers can specify
which bits to clear.
Code that needs to clear only the all-frozen bit in the visibility map
calls the new visibilitymap_clear_extended function to clear that
particular bit, while other callers can keep using visibilitymap_clear
to clear all bits for a page (a rough sketch of the intended usage is
below).

Instead of adding XLHL_ALL_VISIBLE_CLEARED, we look up the visibility
map directly in the back branches.
To avoid unnecessary visibility map lookups, I changed it so that we
first check PD_ALL_VISIBLE on the heap page, and only look up the
all-frozen bit in the visibility map when necessary (sketched below).

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
emit_wal_already_marked_true_case_v4.patch text/x-diff 6.1 KB
