WIP patch for hint bit i/o mitigation

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: WIP patch for hint bit i/o mitigation
Date: 2012-11-06 23:55:54
Message-ID: CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Following the sig is a first cut at a patch (written by Atri) that
attempts to mitigate hint bit i/o penalty when many pages worth of
tuples are sequentially written out with the same transaction id.
There have been other attempts to deal with this problem that fit
niche cases (especially those that create the table in the same
transaction as the one inserting) that work but don't really solve the
problem generally.

I previously attacked this problem ([1], [2]) and came up with a patch
that cached hint bits inside tqual.c. The patch was pulled for a few
reasons:

1) a lot of complexity without proper justification
2) sketchy cache replacement algorithm
3) I manged to misspell 'committed' just about everywhere
4) invalidation?

Issues 1-3 could have been worked out but #4 was making me think the
problem was a nonstarter, or at least, 'too much too soon'. The tuple
visibility routines are in a very tight code path and having to deal
with various things in the backend that could cause the xid to become
stale were making me nervous. A smaller, simpler patch might be the
ticket.

What's happening here is that we're putting in 'last xid' cache in the
same manner and style as in transam.c. Unlike transam.c though, we're
caching the infomask bit only and only if we're safe per
XLogNeedsFlush(). While useful in its own right, the transam.c 'last
xid' cache is not sufficient because:

1) While it does guard against clog fetch, it does nothing to mitigate
hint bit penalties which are just as bad if not worse since you get a
page write+page header lock during a broad scan
2) In the very tight loops that crank HeapTupleSatisfiesMVCC, removing
the minimal non-clog based processing (TransactionIdIsInProgress, etc)
helps a little bit in cpu terms when scanning the table for the very
first time.

So given that -- the patch simple adds an extra check when/where hint
bit status is checked in the visibility routines (currently, only
HeapTupleSatisfiesMVCC is done but all the applicable visibility
routines should be done). Basically, the way it works is like this:

*) is hint bit set?
*) if not? does the examined xid match the last examined one?
*) if so, and the cached hint bit matches the one want, proceeed as if
hint bit was set

Basically, this adds two tests to visibility check in the worst case
(zero if hint bit was set, one if xid does not match). Pretty cheap.
If you fault through the cached xid and dirty the page, then you're
performance shouldn't be worse than the status quo. Also, dealing
with 'last xid' is immune from invalidation issues [3] which is a nice
property.

The most logic place to *set* the cached xid/bit seemed to be in
SetHintBits(). Possible improvements might be to:
*) set the hint bit if the page is dirty
and/or
*) set the hint bit but do not dirty the page upon cache (although
this might have bad interplay with 'all hint bits must be WAL logged'
page checksum proposal)

Atri benched the patch and came up with the following:

atri(at)atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench
atri -n -f bench.sql -c 8 -T 120
transaction type: Custom query
number of transactions actually processed: 1433
tps = 11.900653 (including connections establishing)
tps = 11.903127 (excluding connections establishing)

atri(at)atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench
atri -n -f bench.sql -c 8 -T 120
transaction type: Custom query
number of transactions actually processed: 1769
tps = 14.669422 (including connections establishing)
tps = 14.673167 (excluding connections establishing)

bench.sql:
create temp table foo as select v from generate_series(1,100000) v;
select * from foo;
drop table foo;

merlin

notes:
[1] http://archives.postgresql.org/pgsql-hackers/2011-03/msg01765.php
[2] http://archives.postgresql.org/message-id/BANLkTinr1h1+rKX1UOukn-rAONBTEHQvew@mail.gmail.com)
[3] http://postgresql.1045698.n5.nabble.com/Is-cachedFetchXidStatus-provably-valid-td5712454.html

patch:

*** tqual1.c 2012-09-20 03:17:58.000000000 +0530
--- tqual.c 2012-11-06 00:50:39.769409077 +0530
***************
*** 72,81 ****
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};

/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

-
/*
* SetHintBits()
*
--- 72,83 ----
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};

+ static TransactionId cachedVisibilityXid;
+ static uint16 cachedVisibilityXidStatus;
+
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

/*
* SetHintBits()
*
***************
*** 117,122 ****
--- 119,127 ----

if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
return; /* not flushed yet, so don't set hint */
+
+ cachedVisibilityXid = xid;
+ cachedVisibilityXidStatus = infomask;
}

tuple->t_infomask |= infomask;
***************
*** 904,910 ****
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 909,917 ----
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 1008,1014 ****
return true;
}

! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
--- 1015,1023 ----
return true;
}

! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-11-07 00:37:00 Re: Update obsolete text in indexam.sgml
Previous Message Tom Lane 2012-11-06 23:32:36 Re: RFC: Timing Events