Re: WIP patch for hint bit i/o mitigation

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: WIP patch for hint bit i/o mitigation
Date: 2012-11-06 23:55:54
Message-ID: CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.com
Lists: pgsql-hackers

Following the sig is a first cut at a patch (written by Atri) that
attempts to mitigate hint bit i/o penalty when many pages worth of
tuples are sequentially written out with the same transaction id.
There have been other attempts to deal with this problem that fit
niche cases (especially those that create the table in the same
transaction as the one inserting) that work but don't really solve the
problem generally.

I previously attacked this problem ([1], [2]) and came up with a patch
that cached hint bits inside tqual.c. The patch was pulled for a few
reasons:

1) a lot of complexity without proper justification
2) sketchy cache replacement algorithm
3) I managed to misspell 'committed' just about everywhere
4) invalidation?

Issues 1-3 could have been worked out but #4 was making me think the
problem was a nonstarter, or at least, 'too much too soon'. The tuple
visibility routines are in a very tight code path and having to deal
with various things in the backend that could cause the xid to become
stale was making me nervous. A smaller, simpler patch might be the
ticket.

What's happening here is that we're putting in a 'last xid' cache in the
same manner and style as in transam.c. Unlike transam.c though, we're
caching only the infomask bit, and only when we're safe per
XLogNeedsFlush(). While useful in its own right, the transam.c 'last
xid' cache is not sufficient because:

1) While it does guard against clog fetches, it does nothing to mitigate
hint bit penalties, which are just as bad if not worse, since you get a
page write + page header lock during a broad scan
2) In the very tight loops that crank HeapTupleSatisfiesMVCC, removing
the minimal non-clog based processing (TransactionIdIsInProgress, etc.)
helps a little bit in cpu terms when scanning the table for the very
first time.

So given that -- the patch simply adds an extra check when/where hint
bit status is checked in the visibility routines (currently, only
HeapTupleSatisfiesMVCC is done, but all the applicable visibility
routines should be done). Basically, the way it works is like this:

*) is the hint bit set?
*) if not, does the examined xid match the last examined one?
*) if so, and the cached hint bit matches the one we want, proceed as if
the hint bit was set

Basically, this adds two tests to the visibility check in the worst case
(zero if the hint bit was set, one if the xid does not match). Pretty cheap.
If you fault through the cached xid and dirty the page, then your
performance shouldn't be worse than the status quo. Also, dealing
with 'last xid' is immune to invalidation issues [3], which is a nice
property.
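
In rough code, the check amounts to something like this -- a sketch only;
the helper name is invented here for illustration, since the patch
open-codes the test in each visibility routine:

/* hypothetical consolidation of the patch's open-coded test: true only
 * when the examined xid is the cached one and its cached infomask bit
 * is the one the caller is looking for */
static bool
XidCommittedByCache(TransactionId xid, uint16 committed_bit)
{
	return cachedVisibilityXid == xid &&
		   cachedVisibilityXidStatus == committed_bit;
}

So the xmin test in HeapTupleSatisfiesMVCC falls through to the slow path
only when the hint bit is unset and
XidCommittedByCache(HeapTupleHeaderGetXmin(tuple), HEAP_XMIN_COMMITTED)
is also false.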

The most logical place to *set* the cached xid/bit seemed to be in
SetHintBits(). Possible improvements might be to:
*) set the hint bit if the page is already dirty
and/or
*) set the hint bit but do not dirty the page when serving from the cache
(although this might have bad interplay with the 'all hint bits must be
WAL logged' page checksum proposal)
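
As a sketch of the second idea (hypothetical, not in the patch;
BufferIsDirtyHint() is an invented predicate standing in for "the page is
already dirty"):

/* hypothetical SetHintBits() variant: keep the bit on the in-memory
 * tuple so it survives if the page is written out for some other
 * reason, but never force a write just for the hint */
tuple->t_infomask |= infomask;
if (BufferIsDirtyHint(buffer))		/* invented predicate */
	SetBufferCommitInfoNeedsSave(buffer);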

Atri benched the patch and came up with the following:

atri(at)atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri -n -f bench.sql -c 8 -T 120
transaction type: Custom query
number of transactions actually processed: 1433
tps = 11.900653 (including connections establishing)
tps = 11.903127 (excluding connections establishing)

atri(at)atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri -n -f bench.sql -c 8 -T 120
transaction type: Custom query
number of transactions actually processed: 1769
tps = 14.669422 (including connections establishing)
tps = 14.673167 (excluding connections establishing)

bench.sql:
create temp table foo as select v from generate_series(1,100000) v;
select * from foo;
drop table foo;

merlin

notes:
[1] http://archives.postgresql.org/pgsql-hackers/2011-03/msg01765.php
[2] http://archives.postgresql.org/message-id/BANLkTinr1h1+rKX1UOukn-rAONBTEHQvew@mail.gmail.com
[3] http://postgresql.1045698.n5.nabble.com/Is-cachedFetchXidStatus-provably-valid-td5712454.html

patch:

*** tqual1.c 2012-09-20 03:17:58.000000000 +0530
--- tqual.c 2012-11-06 00:50:39.769409077 +0530
***************
*** 72,81 ****
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};

/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

-
/*
* SetHintBits()
*
--- 72,83 ----
SnapshotData SnapshotAnyData = {HeapTupleSatisfiesAny};
SnapshotData SnapshotToastData = {HeapTupleSatisfiesToast};

+ static TransactionId cachedVisibilityXid;
+ static uint16 cachedVisibilityXidStatus;
+
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

/*
* SetHintBits()
*
***************
*** 117,122 ****
--- 119,127 ----

if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
return; /* not flushed yet, so don't set hint */
+
+ cachedVisibilityXid = xid;
+ cachedVisibilityXidStatus = infomask;
}

tuple->t_infomask |= infomask;
***************
*** 904,910 ****
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 909,917 ----
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 1008,1014 ****
return true;
}

! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
--- 1015,1023 ----
return true;
}

! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Cc: "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-07 10:16:46
Message-ID: 006901cdbcd0$fed84c20$fc88e460$@kapila@huawei.com
Lists: pgsql-hackers

> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Merlin Moncure
> Sent: Wednesday, November 07, 2012 5:26 AM
> To: PostgreSQL-development
> Cc: Atri Sharma
> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
>
> Following the sig is a first cut at a patch (written by Atri) that
> attempts to mitigate hint bit i/o penalty when many pages worth of
> tuples are sequentially written out with the same transaction id.
> There have been other attempts to deal with this problem that fit
> niche cases (especially those that create the table in the same
> transaction as the one inserting) that work but don't really solve the
> problem generally.
>
> I previously attacked this problem ([1], [2]) and came up with a patch
> that cached hint bits inside tqual.c. The patch was pulled for a few
> reasons:
>
> 1) a lot of complexity without proper justification
> 2) sketchy cache replacement algorithm
> 3) I managed to misspell 'committed' just about everywhere
> 4) invalidation?
>
> Issues 1-3 could have been worked out but #4 was making me think the
> problem was a nonstarter, or at least, 'too much too soon'. The tuple
> visibility routines are in a very tight code path and having to deal
> with various things in the backend that could cause the xid to become
> stale was making me nervous. A smaller, simpler patch might be the
> ticket.

About invalidation, I think the cached xid can become invalid due to xid
wraparound. One way to handle that could be to invalidate the cache through
VACUUM.

I am not sure what else can make the cached xid invalid, but once we can
enumerate the ways it can become invalid, we can see if there is a solution
to address them.
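
A minimal sketch of what such an invalidation could look like (hypothetical;
neither this function nor a VACUUM call site exists in the posted patch):

/* hypothetical: reset the one-entry cache so a recycled xid can never
 * match a stale cached value */
static void
InvalidateVisibilityCache(void)
{
	cachedVisibilityXid = InvalidTransactionId;
	cachedVisibilityXidStatus = 0;
}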

With Regards,
Amit Kapila.


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-07 10:31:37
Message-ID: 8E053B80-CA01-44F8-A1BD-350598EB21FF@gmail.com
Lists: pgsql-hackers

On 07-Nov-2012, at 15:46, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:

>> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
>> owner(at)postgresql(dot)org] On Behalf Of Merlin Moncure
>> Sent: Wednesday, November 07, 2012 5:26 AM
>> To: PostgreSQL-development
>> Cc: Atri Sharma
>> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
>>
>> Following the sig is a first cut at a patch (written by Atri) that
>> attempts to mitigate hint bit i/o penalty when many pages worth of
>> tuples are sequentially written out with the same transaction id.
>> There have been other attempts to deal with this problem that fit
>> niche cases (especially those that create the table in the same
>> transaction as the one inserting) that work but don't really solve the
>> problem generally.
>>
>> I previously attacked this problem ([1], [2]) and came up with a patch
>> that cached hint bits inside tqual.c. The patch was pulled for a few
>> reasons:
>>
>> 1) a lot of complexity without proper justification
>> 2) sketchy cache replacement algorithm
>> 3) I managed to misspell 'committed' just about everywhere
>> 4) invalidation?
>>
>> Issues 1-3 could have been worked out but #4 was making me think the
>> problem was a nonstarter, or at least, 'too much too soon'. The tuple
>> visibility routines are in a very tight code path and having to deal
>> with various things in the backend that could cause the xid to become
>> stale was making me nervous. A smaller, simpler patch might be the
>> ticket.
>
> About invalidation, I think the cached xid can become invalid due to xid
> wraparound.
> So for that one way could be to invalidate it through Vacuum.
>
> Though I am not sure what all other things can make cached id as invalid,
> but I think once we
> can think what other ways can make cached id invalid, then we can see if
> there is a solution to address
> them.
>
>
> With Regards,
> Amit Kapila.
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

As we are now dealing with only the last xid (please refer to the details of the patch attached), the invalidation issues are not significant any more.

Regards,

Atri


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>
Cc: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-07 12:01:27
Message-ID: 006a01cdbcdf$9e3429c0$da9c7d40$@kapila@huawei.com
Lists: pgsql-hackers

> -----Original Message-----
> From: Atri Sharma [mailto:atri(dot)jiit(at)gmail(dot)com]
> Sent: Wednesday, November 07, 2012 4:02 PM
> To: Amit Kapila
> Cc: Merlin Moncure; PostgreSQL-development
> Subject: Re: [HACKERS] WIP patch for hint bit i/o mitigation
>
> On 07-Nov-2012, at 15:46, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>
> >> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> >> owner(at)postgresql(dot)org] On Behalf Of Merlin Moncure
> >> Sent: Wednesday, November 07, 2012 5:26 AM
> >> To: PostgreSQL-development
> >> Cc: Atri Sharma
> >> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
> >>
> >> Following the sig is a first cut at a patch (written by Atri) that
> >> attempts to mitigate hint bit i/o penalty when many pages worth of
> >> tuples are sequentially written out with the same transaction id.
> >> There have been other attempts to deal with this problem that fit
> >> niche cases (especially those that create the table in the same
> >> transaction as the one inserting) that work but don't really solve
> the
> >> problem generally.
>
> As we are now dealing with only the last xid (please refer to the details
> of the patch attached), the invalidation issues are not significant any
> more.

I think you are right, but please check the scenario below that I have in
mind, due to which I got confused:

Session 1
1. Let's say for tup-1 on a page, cachedVisibilityXid is not set; it goes
inside SetHintBits and sets it to the xid of the tuple, which is, let's
say, 708.
2. Now, for all consecutive tuples which have the same xmin (708), it can
directly refer to the cached xid and cached status, even if the hint bit
is not set.

Other Sessions
3. Now, from other sessions, large operations happen on all other tables
except this table.
4. The situation can be reached where the xid wraps around.

Session 1
5. It again queries the same table; at this point it will compare the xid
in the tuple with cachedVisibilityXid.

Now, if all tuples' xids for that particular table are frozen in all cases
then it seems to be okay; otherwise it might be a problem.
I am not fully aware of the wraparound and frozen xid concepts; that's why
I had doubts that it might create a problem. On further thought, it
appears that I was wrong.

With Regards,
Amit Kapila.


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-07 13:10:10
Message-ID: CAOeZVifXP7UQPh-PNbCyVxa_OLYuLr1Gvs7Z-4baDCHdCZBG4w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 7, 2012 at 5:31 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:

>
>
> > -----Original Message-----
> > From: Atri Sharma [mailto:atri(dot)jiit(at)gmail(dot)com]
> > Sent: Wednesday, November 07, 2012 4:02 PM
> > To: Amit Kapila
> > Cc: Merlin Moncure; PostgreSQL-development
> > Subject: Re: [HACKERS] WIP patch for hint bit i/o mitigation
> >
> > On 07-Nov-2012, at 15:46, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> >
> > >> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> > >> owner(at)postgresql(dot)org] On Behalf Of Merlin Moncure
> > >> Sent: Wednesday, November 07, 2012 5:26 AM
> > >> To: PostgreSQL-development
> > >> Cc: Atri Sharma
> > >> Subject: [HACKERS] WIP patch for hint bit i/o mitigation
> > >>
> > >> Following the sig is a first cut at a patch (written by Atri) that
> > >> attempts to mitigate hint bit i/o penalty when many pages worth of
> > >> tuples are sequentially written out with the same transaction id.
> > >> There have been other attempts to deal with this problem that fit
> > >> niche cases (especially those that create the table in the same
> > >> transaction as the one inserting) that work but don't really solve
> > the
> > >> problem generally.
> >
> > As we are now dealing with only the last xid (please refer to the details
> > of the patch attached), the invalidation issues are not significant any
> > more.
>
> I think you are right, but please check the scenario below that I have in
> mind, due to which I got confused:
>
> Session 1
> 1. Let's say for tup-1 on a page, cachedVisibilityXid is not set; it goes
> inside SetHintBits and sets it to the xid of the tuple, which is, let's
> say, 708.
> 2. Now, for all consecutive tuples which have the same xmin (708), it can
> directly refer to the cached xid and cached status, even if the hint bit
> is not set.
>
> Other Sessions
> 3. Now, from other sessions, large operations happen on all other tables
> except this table.
> 4. The situation can be reached where the xid wraps around.
>
> Session 1
> 5. It again queries the same table; at this point it will compare the xid
> in the tuple with cachedVisibilityXid.
>
> Now, if all tuples' xids for that particular table are frozen in all cases
> then it seems to be okay; otherwise it might be a problem.
> I am not fully aware of the wraparound and frozen xid concepts; that's why
> I had doubts that it might create a problem. On further thought, it
> appears that I was wrong.
>
> With Regards,
> Amit Kapila.
>
>
AFAIK, xids are managed relative to reference xids, which have a range of
+/- 2 billion xids. Once this limit is reached, the reference xids are
moved forward, and the xids that do not fall within the reference xid
+/- 2 billion are frozen. Hence, in the given scenario, I believe that once
the wraparound happens, since the xmin is the same for all the tuples in
session 1's table, there should be no problem, and all tuples' xids for
that particular table will be frozen.

Atri

--
Regards,

Atri
*l'apprenant*


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-07 16:17:23
Message-ID: CAHyXU0xk-eFA62jV4rqGC6cT3U_PrnFueC69kYU2vKetV0mOHw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>> >> Following the sig is a first cut at a patch (written by Atri) that
>> >> attempts to mitigate hint bit i/o penalty when many pages worth of
>> >> tuples are sequentially written out with the same transaction id.
>> >> There have been other attempts to deal with this problem that fit
>> >> niche cases (especially those that create the table in the same
>> >> transaction as the one inserting) that work but don't really solve
>> the
>> >> problem generally.
>>
>> As we are now dealing with only the last xid (please refer to the details
>> of the patch attached), the invalidation issues are not significant any
>> more.
>
> I think you are right, but please check the scenario below that I have in
> mind, due to which I got confused:
>
> Session 1
> 1. Let's say for tup-1 on a page, cachedVisibilityXid is not set; it goes
> inside SetHintBits and sets it to the xid of the tuple, which is, let's
> say, 708.
> 2. Now, for all consecutive tuples which have the same xmin (708), it can
> directly refer to the cached xid and cached status, even if the hint bit
> is not set.
>
> Other Sessions
> 3. Now, from other sessions, large operations happen on all other tables
> except this table.
> 4. The situation can be reached where the xid wraps around.
>
> Session 1
> 5. It again queries the same table; at this point it will compare the xid
> in the tuple with cachedVisibilityXid.
>
> Now, if all tuples' xids for that particular table are frozen in all cases
> then it seems to be okay; otherwise it might be a problem.
> I am not fully aware of the wraparound and frozen xid concepts; that's why
> I had doubts that it might create a problem. On further thought, it
> appears that I was wrong.

Well there's that. But more to the point, for the cache to fail you'd
have to have a scenario where a table didn't scan any records for 1
billion+ transactions. See note [3] above for reasoning why this is
implausible. Also we're already relying on this effect in transam.c.

merlin


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-14 19:12:27
Message-ID: CAOeZViey8vMcZasnvGVMLNQMcz-xAj-=QzkTYV4h==EgYw1ZrQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Wed, Nov 7, 2012 at 6:01 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> wrote:
> > [...]
>
> Well there's that. But more to the point, for the cache to fail you'd
> have to have a scenario where a table didn't scan any records for 1
> billion+ transactions. See note [3] above for reasoning why this is
> implausible. Also we're already relying on this effect in transam.c.
>
> merlin
>

PFA below the sig the updated patch for the same. It maintains a cache,
cachedVisibilityXid, which holds the result of the clog visibility check.
cachedVisibilityXidStatus holds the commit status of the XID in
cachedVisibilityXid.

In each visibility function (except HeapTupleSatisfiesVacuum()), an
additional check has been added to see if the commit status of the Xmin or
Xmax of a tuple can be retrieved from the cache.

So, in place of

if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))

the condition is now

if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
&& !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
&& cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))

This checks if the commit status can be known from the cache or not before
proceeding.

I will be posting the patch to the commitfest.

Thoughts/Feedback?

Atri

--
Regards,

Atri
*l'apprenant*

patch:

*** tqual--unchanged.c 2012-09-20 03:17:58.000000000 +0530
--- tqual.c 2012-11-14 23:27:30.470499857 +0530
***************
*** 75,80 ****
--- 75,88 ----
/* local functions */
static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);

+ /*
+ * Single-item cache for results of clog visibility check. It's worth having
+ * such a cache to help reduce the amount of hint bit traffic when
+ * many sequentially touched tuples have the same XID.
+ */
+ static TransactionId cachedVisibilityXid;
+ /* Visibility status cache stores the commit status of the XID in cachedVisibilityXid */
+ static uint16 cachedVisibilityXidStatus;

/*
* SetHintBits()
***************
*** 117,122 ****
--- 125,133 ----

if (XLogNeedsFlush(commitLSN) && BufferIsPermanent(buffer))
return; /* not flushed yet, so don't set hint */
+
+ cachedVisibilityXid = xid;
+ cachedVisibilityXidStatus = infomask;
}

tuple->t_infomask |= infomask;
***************
*** 164,170 ****
bool
HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 175,183 ----
bool
HeapTupleSatisfiesSelf(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 247,253 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;

! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 260,268 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;

! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
! || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
***************
*** 327,333 ****
bool
HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 342,350 ----
bool
HeapTupleSatisfiesNow(HeapTupleHeader tuple, Snapshot snapshot, Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 416,422 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;

! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 433,441 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;

! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
! || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
***************
*** 493,499 ****
HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 512,520 ----
HeapTupleSatisfiesToast(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 574,580 ****
HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return HeapTupleInvisible;
--- 595,603 ----
HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return HeapTupleInvisible;
***************
*** 663,669 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or
aborted */
return HeapTupleMayBeUpdated;

! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return HeapTupleMayBeUpdated;
--- 686,694 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return HeapTupleMayBeUpdated;

! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
! || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return HeapTupleMayBeUpdated;
***************
*** 743,749 ****
{
snapshot->xmin = snapshot->xmax = InvalidTransactionId;

! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 768,776 ----
{
snapshot->xmin = snapshot->xmax = InvalidTransactionId;

! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 830,836 ****
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;

! if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
--- 857,865 ----
if (tuple->t_infomask & HEAP_XMAX_INVALID) /* xid invalid or aborted */
return true;

! if ((tuple->t_infomask & HEAP_XMAX_COMMITTED)
! || (cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (tuple->t_infomask & HEAP_IS_LOCKED)
return true;
***************
*** 904,910 ****
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
--- 933,941 ----
HeapTupleSatisfiesMVCC(HeapTupleHeader tuple, Snapshot snapshot,
Buffer buffer)
{
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
{
if (tuple->t_infomask & HEAP_XMIN_INVALID)
return false;
***************
*** 1008,1014 ****
return true;
}

! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
--- 1039,1047 ----
return true;
}

! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
{
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
{
***************
*** 1240,1246 ****
* invalid, then we assume it's still alive (since the presumption is
* that all relevant hint bits were just set moments ago).
*/
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;

/*
--- 1273,1281 ----
* invalid, then we assume it's still alive (since the presumption is
* that all relevant hint bits were just set moments ago).
*/
! if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
! && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
return (tuple->t_infomask & HEAP_XMIN_INVALID) != 0 ? true : false;

/*
***************
*** 1253,1259 ****
return false;

/* If deleter isn't known to have committed, assume it's still running. */
! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
return false;

/* Deleter committed, so tuple is dead if the XID is old enough. */
--- 1288,1296 ----
return false;

/* If deleter isn't known to have committed, assume it's still running. */
! if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED)
! && !(cachedVisibilityXid == HeapTupleHeaderGetXmax(tuple)
! && cachedVisibilityXidStatus == HEAP_XMAX_COMMITTED))
return false;

/* Deleter committed, so tuple is dead if the XID is old enough. */


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-14 20:31:53
Message-ID: CAOeZVid7zd9V-9WsEtf9wRCd0H4yw_1OgCLm7=CdvGPCxZcJJw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 15, 2012 at 12:42 AM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:

>
>
>
> On Wed, Nov 7, 2012 at 9:47 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>> [...]
>
> PFA below the sig the updated patch for the same. It maintains a cache,
> cachedVisibilityXid, which holds the result of the clog visibility check.
> cachedVisibilityXidStatus holds the commit status of the XID in
> cachedVisibilityXid.
>
> In each visibility function (except HeapTupleSatisfiesVacuum()), an
> additional check has been added to see if the commit status of the Xmin or
> Xmax of a tuple can be retrieved from the cache.
>
> So, in place of
>
> if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
>
> the condition is now
>
>
> if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED)
> && !(cachedVisibilityXid == HeapTupleHeaderGetXmin(tuple)
> && cachedVisibilityXidStatus == HEAP_XMIN_COMMITTED))
>
> This checks if the commit status can be known from the cache or not before
> proceeding.
>
> I will be posting the patch to the commitfest.
>
> Thoughts/Feedback?
>
> Atri
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>
> patch: [snipped; quoted in full in the previous message]
Attached is the patch for review.

Atri

--
Regards,

Atri
*l'apprenant*

Attachment Content-Type Size
tqual_xid_cache_v3_1.patch application/octet-stream 8.7 KB

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>, "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>
Cc: "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-15 10:39:42
Message-ID: 00ac01cdc31d$85f92ce0$91eb86a0$@kapila@huawei.com
Lists: pgsql-hackers

On Thursday, November 15, 2012 2:02 AM Atri Sharma wrote:

> PFA below the sig the updated patch for the same. It maintains a cache,
> cachedVisibilityXid, which holds the result of the clog visibility check.
> cachedVisibilityXidStatus holds the commit status of the XID in
> cachedVisibilityXid.
>
> In each visibility function (except HeapTupleSatisfiesVacuum()), an
> additional check has been added to see if the commit status of the Xmin
> or Xmax of a tuple can be retrieved from the cache.

1. From your explanation and code, it is quite clear that it will
certainly give performance benefits in the scenarios mentioned by you.

I can validate the performance numbers and do the code review for this
patch during CF-3.

However, I am just not very sure whether the use case is sufficient,
so I would like to ask the opinion of other people as well.

2. After this patch, tuple hint bits are not set by SELECT operations on
data populated by one transaction.

This appears to be good, as it will save many ops (page dirty followed
by flush, clog inquiry).

Though it is not apparent, we should see whether it can cause any other
impact:

a. It may be that now VACUUM needs to set the hint bits, which may cause
more I/O during VACUUM.

Hackers, any opinion/suggestions about the use case?

With Regards,

Amit Kapila.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-15 15:57:22
Message-ID: CAHyXU0zL8B0n2hF+4iaGtRRDsjruPpDOcHYLx6RHBxonUAeKBw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>> In each visibility function (except HeapTupleSatisfiesVacuum()), an
>> additional check has been added to see if the commit status of the Xmin
>> or Xmax of a tuple can be retrieved from the cache.
>
>
>
> 1. From your explanation and code, it is quite clear that it will
> certainly give performance benefits in the scenarios mentioned by you.
>
> I can validate the performance numbers and do the code review for this
> patch during CF-3.
>
> However, I am just not very sure whether the use case is sufficient,
> so I would like to ask the opinion of other people as well.

Sure. I'd like to note though that hint bit i/o is a somewhat common
complaint. It tends to most affect OLAP-style workloads. In
pathological workloads, it can really burn you -- it's not fun when
you are i/o starved via sequential scan. This can still happen when
sweeping dead records (which this patch doesn't deal with, though
maybe it should).

> 2. After this patch, tuple hint bits are not set by SELECT operations on
> data populated by one transaction.
>
> This appears to be good, as it will save many ops (page dirty followed
> by flush, clog inquiry).

Technically it does not save the clog fetch, as transam.c has a very
similar cache mechanism. However, it does save a page write i/o and a
lock on the page header, as well as a couple of other minor things.
In the best case, the page write is completely masked as the page gets
dirty for other reasons. I think this is going to become more
important in checksum-enabled scenarios.
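
For reference, the transam.c mechanism being mirrored is roughly this shape
(paraphrased, not an exact copy of the source):

/* one-entry cache in transam.c: remember the last xid whose status was
 * fetched from clog, so repeated lookups of the same xid skip the clog
 * read entirely */
static TransactionId cachedFetchXid = InvalidTransactionId;
static XidStatus cachedFetchXidStatus;

	if (TransactionIdEquals(xid, cachedFetchXid))
		return cachedFetchXidStatus;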

> Though it is not apparent, we should see whether it can cause any other
> impact:
>
> a. It may be that now VACUUM needs to set the hint bits, which may cause
> more I/O during VACUUM.

IMNSHO, deferring non-critical i/o from a foreground process to a
background process is generally good. VACUUM has nice features like
i/o throttling and in-place cancel, so latent management of internal
page optimization flags really belongs there ideally. Also, the longer
you defer such I/O, the more opportunity there is for it to be masked
off by some other page dirtying operation (again, this is more
important in the face of having to log hint bit changes).

There could be some good rebuttal analysis though.

merlin


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>
Cc: "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-15 16:25:05
Message-ID: 00eb01cdc34d$c608b550$521a1ff0$@kapila@huawei.com
Lists: pgsql-hackers

On Thursday, November 15, 2012 9:27 PM Merlin Moncure wrote:
> On Thu, Nov 15, 2012 at 4:39 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> wrote:
> >> In each visibility function (except HeapTupleSatisfiesVacuum()), an
> >> additional check has been added to see if the commit status of the
> >> Xmin or Xmax of a tuple can be retrieved from the cache.
> >
> > 1. From your explanation and code, it is quite clear that it will
> > certainly give performance benefits in the scenarios mentioned by you.
> >
> > I can validate the performance numbers and do the code review for this
> > patch during CF-3.
> >
> > However, I am just not very sure whether the use case is sufficient,
> > so I would like to ask the opinion of other people as well.
>
> Sure. I'd like to note though that hint bit i/o is a somewhat common
> complaint. It tends to most affect OLAP-style workloads. In
> pathological workloads, it can really burn you -- it's not fun when
> you are i/o starved via sequential scan. This can still happen when
> sweeping dead records (which this patch doesn't deal with, though
> maybe it should).
>
> > 2. After this patch, tuple hint bits are not set by SELECT operations
> > on data populated by one transaction.
> >
> > This appears to be good, as it will save many ops (page dirty followed
> > by flush, clog inquiry).
>
> Technically it does not save the clog fetch, as transam.c has a very
> similar cache mechanism. However, it does save a page write i/o and a
> lock on the page header, as well as a couple of other minor things.
> In the best case, the page write is completely masked as the page gets
> dirty for other reasons. I think this is going to become more
> important in checksum-enabled scenarios.
>
> > Though it is not apparent, we should see whether it can cause any
> > other impact:
> >
> > a. It may be that now VACUUM needs to set the hint bits, which may
> > cause more I/O during VACUUM.
>
> IMNSHO, deferring non-critical i/o from a foreground process to a
> background process is generally good.

Yes, in regard to deferring, you are right.
However, in this case, when the foreground process has to mark the page
dirty due to a hint bit, the page may already be dirty, so there is no extra
I/O; but when it is done by VACUUM, the page may not be dirty.

Also, due to the points below, doing it in VACUUM may cost more:
a. VACUUM has a ring buffer of fixed size, and if there are many such pages
then VACUUM needs to write out pages to replace existing pages in the ring.
b. Considering that sometimes people want VACUUM to run when the system is
not busy, the chances of generating more overall I/O in the system can be
higher.

Why can't we avoid setting hint bits in VACUUM?
Is it because it has to be done somewhere, so that hint bits are properly
set? Or maybe I am missing something trivial?

Though the VACUUM case I am talking about occurs rarely in practice, the
point came to my mind, so I thought of sharing it with you.

> VACUUM has nice features like
> i/o throttling and in-place cancel, so latent management of internal
> page optimization flags really belongs there ideally. Also, the longer
> you defer such I/O, the more opportunity there is for it to be masked
> off by some other page dirtying operation (again, this is more
> important in the face of having to log hint bit changes).
>
> There could be some good rebuttal analysis though.

With Regards,
Amit Kapila.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-15 16:48:51
Message-ID: CAHyXU0z9+x7Y-0w9-Wut501Rmiw6S2mEyCCckANecf0WWRvRYw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>> IMNSHO, deferring non-critical i/o from a foreground process to a
>> background process is generally good.
>
> Yes, in regard to deferring, you are right.
> However, in this case, when the foreground process has to mark the page
> dirty due to a hint bit, the page may already be dirty, so there is no
> extra I/O; but when it is done by VACUUM, the page may not be dirty.

Yeah. We can try to be smart and set the hint bits in that case. Not
sure that will work out with checksums having to WAL-log hint bits,
though (which, by reading the checksum threads, seems likely to happen).

> Also, due to the points below, doing it in VACUUM may cost more:
> a. VACUUM has a ring buffer of fixed size, and if there are many such
> pages then VACUUM needs to write out pages to replace existing pages
> in the ring.

Sure, although in scans we are using a ring buffer as well, so in a
practical sense the results are pretty close.

> b. Considering that sometimes people want VACUUM to run when the system
> is not busy, the chances of generating more overall I/O in the system
> can be higher.

It's hard to imagine overall i/o load increasing. However, longer
vacuum times should be considered. A bigger issue is that we are
missing an early opportunity to set the all-visible bit. Vacuum is
doing:

if (all_visible)
{
    TransactionId xmin;

    if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
    {
        all_visible = false;
        break;
    }

assuming the cache is working and vacuum rolls around after a scan,
you lost the opportunity to set the all_visible flag where previously it
would have been set, thereby dismantling the positive effect of an
index-only scan. AFAICT, this is the only case where vacuum is
directly interfacing with hint bits. This could be construed as a
violation of the heapam API? Maybe if that's an issue we could proxy that
check to a heaptuple/tqual.c-maintained function (in the same manner
as HeapTupleSetHintBits) so that the cache bit could be uniformly
checked.
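
For instance, the check above might become something like this
(hypothetical; HeapTupleHeaderXminCommitted() is an invented tqual.c
helper that would test the hint bit first and fall back to the one-entry
cache):

if (all_visible)
{
	/* ask tqual.c instead of reading t_infomask directly, so the
	 * cached commit status is honored even when the bit isn't set */
	if (!HeapTupleHeaderXminCommitted(tuple.t_data))
	{
		all_visible = false;
		break;
	}
}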

All other *setting* of hint bits is running through SetHintBits(), so
I think we are safe from vacuum point of view. That's another thing
to test for though.

merlin


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-16 00:42:15
Message-ID: 1353026535.14335.89.camel@sussancws0025
Lists: pgsql-hackers

On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote:
> So given that -- the patch simply adds an extra check when/where hint
> bit status is checked in the visibility routines (currently, only
> HeapTupleSatisfiesMVCC is done, but all the applicable visibility
> routines should be done). Basically, the way it works is like this:
>
> *) is the hint bit set?
> *) if not, does the examined xid match the last examined one?
> *) if so, and the cached hint bit matches the one we want, proceed as if
> the hint bit was set

Can you clarify the difference between this and
cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your
patch is accepted?

Regards,
Jeff Davis


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>
Cc: "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-16 10:32:31
Message-ID: 00a001cdc3e5$afc43150$0f4c93f0$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
> On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> wrote:

>
> Sure, although in scans we are using ring buffer as well so in
> practical sense the results are pretty close.
>
> > b. Considering sometimes people want VACUUM to run when system is not
> busy,
> > the chances of generating more overall I/O in system can be
> > more.
>
> It's hard to imagine overall i/o load increasing. However, longer
> vacuum times should be considered. A bigger issue is that we are
> missing an early opportunity to set the all visible bit. vacuum is
> doing:
>
> if (all_visible)
> {
> TransactionId xmin;
>
> if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
> {
> all_visible = false;
> break;
> }
>
> assuming the cache is working and vacuum rolls around after a scan,
> you lost the opportunity to set all_visible flag where previously it
> would have been set, thereby dismantling the positive effect of an
> index-only scan. AFAICT, this is the only case where vacuum is
> directly interfacing with hint bits. This could be construed as a
> violation of heapam API? Maybe if that's an issue we could proxy that
> check to a heaptuple/tqual.c maintained function (in the same manner
> as HeapTupleSetHintBits) so that the cache bit could be uniformly
> checked.

I think we need to devise some tests to check that no impact on Vacuum or
anything else has been created by this change.
I will devise tests during review of this patch.
However, if you have more ideas, please share them; that will make the
testing of this patch stronger.
For functional/performance testing of this patch, one of my colleagues,
Hari Babu, will also work along with me on it.

With Regards,
Amit Kapila.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-16 14:03:04
Message-ID: CAHyXU0y=Mvm2XzdLcjy=4=Hfw1O1UN2Z4qdWYFaEM2zk7ZOO+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
>> On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
>> wrote:
>
>>
>> Sure, although in scans we are using ring buffer as well so in
>> practical sense the results are pretty close.
>>
>> > b. Considering sometimes people want VACUUM to run when system is not
>> busy,
>> > the chances of generating more overall I/O in system can be
>> > more.
>>
>> It's hard to imagine overall i/o load increasing. However, longer
>> vacuum times should be considered. A bigger issue is that we are
>> missing an early opportunity to set the all visible bit. vacuum is
>> doing:
>>
>> if (all_visible)
>> {
>> TransactionId xmin;
>>
>> if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
>> {
>> all_visible = false;
>> break;
>> }
>>
>> assuming the cache is working and vacuum rolls around after a scan,
>> you lost the opportunity to set all_visible flag where previously it
>> would have been set, thereby dismantling the positive effect of an
>> index-only scan. AFAICT, this is the only case where vacuum is
>> directly interfacing with hint bits. This could be construed as a
>> violation of heapam API? Maybe if that's an issue we could proxy that
>> check to a heaptuple/tqual.c maintained function (in the same manner
>> as HeapTupleSetHintBits) so that the cache bit could be uniformly
>> checked.
>
> I think we need to devise some tests to check that no impact on Vacuum or
> anything else has been created by this change.
> I will devise tests during review of this patch.
> However, if you have more ideas, please share them; that will make the
> testing of this patch stronger.
> For functional/performance testing of this patch, one of my colleagues,
> Hari Babu, will also work along with me on it.

Thanks. So far, Atri has run some quick-and-dirty tests to see if there
were any regressions. He benched a large scan followed by vacuum.
Results so far are inconclusive, so better testing methodologies will
definitely be greatly appreciated. One of the challenges with working
in this part of the code is that it's quite difficult to make changes
without impacting at least some workloads.

merlin


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-16 14:26:06
Message-ID: CAHyXU0xM1_8TQnYx9avy_OPtcLhXzcNY_LbJU+n8gmVwEcDqxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 15, 2012 at 6:42 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2012-11-06 at 17:55 -0600, Merlin Moncure wrote:
>> So given that -- the patch simply adds an extra check when/where hint
>> bit status is checked in the visibility routines (currently, only
>> HeapTupleSatisfiesMVCC is done but all the applicable visibility
>> routines should be done). Basically, the way it works is like this:
>>
>> *) is hint bit set?
>> *) if not, does the examined xid match the last examined one?
>> *) if so, and the cached hint bit matches the one we want, proceed as if
>> hint bit was set
>
> Can you clarify the difference between this and
> cachedFetchXid/cachedFetchXidStatus? Do we need to keep those if your
> patch is accepted?

Very little, except:
*) the transam.c-managed cache is unable to influence the specific code
path through the visibility routines, at least not without significant
refactoring -- everything that happens in tqual.c should be inlined.
I played with doing it all in transam.c a while back and didn't much
like how it turned out. That doesn't mean it can't work, though.

*) There are a couple of important-looking code paths that communicate
directly with transam.c -- for example, ProcArrayApplyRecoveryInfo()
in procarray.c. Removing the transam.c cache could turn into a fairly
significant regression if that code is performance-sensitive, so that
would have to be studied first.

Maybe abstracting 'last xid cache' along with hint bit management out
of both transam.c and tqual.c into something like 'hints.c' is
appropriate, but that's a more invasive change.
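(For anyone following along: the transam.c cache Jeff mentions is a
single-entry memo inside TransactionLogFetch(), roughly like the following.
Paraphrased from memory, not a verbatim excerpt:

static TransactionId cachedFetchXid = InvalidTransactionId;
static XidStatus cachedFetchXidStatus;

/* inside TransactionLogFetch(xid): */
if (TransactionIdEquals(xid, cachedFetchXid))
    return cachedFetchXidStatus;        /* skip the clog lookup entirely */

xidstatus = TransactionIdGetStatus(xid, &xidlsn);
if (xidstatus != TRANSACTION_STATUS_IN_PROGRESS)
{
    /* cache only settled verdicts; in-progress can still change */
    cachedFetchXid = xid;
    cachedFetchXidStatus = xidstatus;
}

Note that it caches the commit status, whereas the tqual.c cache discussed
here caches the infomask bit.)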

merlin


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-16 19:42:26
Message-ID: CAOeZVifze2OxHfG+_JaSTQ4F=4uND5CopKknrns=Qi3U4iCBwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 16, 2012 at 7:33 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Fri, Nov 16, 2012 at 4:32 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> wrote:
> > On Thursday, November 15, 2012 10:19 PM Merlin Moncure wrote:
> >> On Thu, Nov 15, 2012 at 10:25 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> >> wrote:
> >
> >>
> >> Sure, although in scans we are using ring buffer as well so in
> >> practical sense the results are pretty close.
> >>
> >> > b. Considering sometimes people want VACUUM to run when system is not
> >> busy,
> >> > the chances of generating more overall I/O in system can be
> >> > more.
> >>
> >> It's hard to imagine overall i/o load increasing. However, longer
> >> vacuum times should be considered. A bigger issue is that we are
> >> missing an early opportunity to set the all visible bit. vacuum is
> >> doing:
> >>
> >> if (all_visible)
> >> {
> >> TransactionId xmin;
> >>
> >> if (!(tuple.t_data->t_infomask & HEAP_XMIN_COMMITTED))
> >> {
> >> all_visible = false;
> >> break;
> >> }
> >>
> >> assuming the cache is working and vacuum rolls around after a scan,
> >> you lost the opportunity to set all_visible flag where previously it
> >> would have been set, thereby dismantling the positive effect of an
> >> index-only scan. AFAICT, this is the only case where vacuum is
> >> directly interfacing with hint bits. This could be construed as a
> >> violation of heapam API? Maybe if that's an issue we could proxy that
> >> check to a heaptuple/tqual.c maintained function (in the same manner
> >> as HeapTupleSetHintBits) so that the cache bit could be uniformly
> >> checked.
> >
> > I think we need to devise some tests to check that no impact on Vacuum or
> > anything else has been created by this change.
> > I will devise tests during review of this patch.
> > However, if you have more ideas, please share them; that will make the
> > testing of this patch stronger.
> > For functional/performance testing of this patch, one of my colleagues,
> > Hari Babu, will also work along with me on it.
>
> Thanks. So far, Atri ran some quick n dirty tests to see if there
> were any regressions. He benched a large scan followed by vacuum. So
> far, results are inconclusive so better testing methodologies will
> definitely be greatly appreciated. One of the challenges with working
> in this part of the code is that it's quite difficult to make changes
> without impacting at least some workloads.
>
> merlin
>

Thanks a ton to Amit and his colleague Hari for volunteering to review the
patch.

I ran some benchmarks and came up with the following results:

With our code

atri(at)atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1
-n -f bench2.sql -c 8 -T 300
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 300 s
number of transactions actually processed: 412
tps = 1.366142 (including connections establishing)
tps = 1.366227 (excluding connections establishing)

Without our code

atri(at)atri-Aspire-4740:~/postgresql-9.2.1/contrib/pgbench$ ./pgbench atri1
-n -f bench2.sql -c 8 -T 300
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 1
duration: 300 s
number of transactions actually processed: 378
tps = 1.244333 (including connections establishing)
tps = 1.244447 (excluding connections establishing)

The SQL file is attached.

Please let us know if you need any more details.

Atri
--
Regards,

Atri
*l'apprenant*

Attachment Content-Type Size
bench2.sql application/octet-stream 124 bytes

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-21 21:30:05
Message-ID: 50AD47DD.10006@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/16/12 9:03 AM, Merlin Moncure wrote:
> Atri ran some quick n dirty tests to see if there
> were any regressions. He benched a large scan followed by vacuum. So
> far, results are inconclusive so better testing methodologies will
> definitely be greatly appreciated. One of the challenges with working
> in this part of the code is that it's quite difficult to make changes
> without impacting at least some workloads.

I'm adding something to pgbench-tools to test for some types of vacuum
regressions that I've seen before. And the checksum benchmarking I've
already signed up for overlaps with this one quite a bit. I'd suggest
reviewers here focus on code quality, correctness, and targeted
optimization testing. I'm working heavily on a larger benchmarking
framework that both this and checksums will fit into right now.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-11-28 21:07:50
Message-ID: 20121128210749.GE4333@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure wrote:

> Maybe abstracting 'last xid cache' along with hint bit management out
> of both transam.c and tqual.c into something like 'hints.c' is
> appropriate, but that's a more invasive change.

It would be good to have such a patch to measure/compare performance of
both approaches. It does seem like the more invasive change might be
more expensive for both the transam.c and the tqual.c uses, though; and
it's not clear that there's anything to be gained by having them be
together in a single module.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Greg Smith'" <greg(at)2ndQuadrant(dot)com>, "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>
Cc: "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>, <haribabu(dot)kommi(at)huawei(dot)com>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-12-06 09:59:48
Message-ID: 00d301cdd398$6e3fff30$4abffd90$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
> On 11/16/12 9:03 AM, Merlin Moncure wrote:
> > Atri ran some quick n dirty tests to see if there
> > were any regressions. He benched a large scan followed by vacuum. So
> > far, results are inconclusive so better testing methodologies will
> > definitely be greatly appreciated. One of the challenges with working
> > in this part of the code is that it's quite difficult to make changes
> > without impacting at least some workloads.
>
> I'm adding something to pgbench-tools to test for some types of vacuum
> regressions that I've seen before. And the checksum benchmarking I've
> already signed up for overlaps with this one quite a bit. I'd suggest
> reviewers here focus on code quality, correctness, and targeted
> optimization testing. I'm working heavily on a larger benchmarking
> framework that both this and checksums will fit into right now.

We are planning the below performance tests for the hint-bit I/O mitigation
patch:

Test case 1: Select performance in sequential scan, plus vacuum operation,
with I/O statistics. Bulk operations happen in multiple transactions.
        1. Stop autovacuum.
        2. Create a table.
        3. Insert 10000 records in one transaction, 1000 times.
        4A. Use pgbench to select all the records using a sequential scan
            for 5 min - 8 threads.
        4B. Record the I/O statistics.
        5. After completion of the test case, check VACUUM performance.

Test case 2: Select performance in index scan, plus vacuum operation, with
I/O statistics.
        Same as test case 1, with 4A changed as follows:
        4A. Use pgbench with the -F option to select random records using
            an index scan for 5 min - 8 threads.

Test case 3: Select performance in sequential scan, plus vacuum operation,
with I/O statistics. Bulk operations happen in one transaction.
        1. Stop autovacuum.
        2. Create a table.
        3. Insert 10,000,000 records in one transaction.
        4A. Use pgbench to select all the records using a sequential scan
            for 5 min - 8 threads.
        4B. Record the I/O statistics.
        5. After completion of the test case, check VACUUM performance.

Test case 4: Select performance in index scan, plus vacuum operation, with
I/O statistics.
        Same as test case 3, with 4A changed as follows:
        4A. Use pgbench to select random records using an index scan for
            5 min - 8 threads.

Test case 5: Check original pgbench performance & vacuum operation.
        1. For the select-only and tpc_b performance suites, with scale
           factors of 75 & 150, threads 8 & 16.

Test case 6: (Vacuum I/O may increase if vacuum needs to make a page dirty
only to set the hint bits.)
        1. Session 1: Open a long transaction.
        2. Session 2: Insert some records & commit.
                      Select all the tuples.
                      Checkpoint;
                      Vacuum the table.
                      Checkpoint;
        3. Record the I/O statistics & time taken for the Vacuum & the 2nd
           Checkpoint.

Test case 7: (This is also to check Vacuum I/O.)
        1. Have a replication setup.
        2. Insert some records & commit.
        3. Vacuum the table.
        4. Promote the standby to be the new master.
        5. Select all the tuples on the new master.
        6. Vacuum the table on the new master.
        6B. Record the I/O statistics & time taken for the Vacuum on the
            new master.

Suggestions/feedback welcome.

With Regards,
Amit Kapila.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, haribabu(dot)kommi(at)huawei(dot)com
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-12-06 15:22:08
Message-ID: CAHyXU0wU69c2D8Y_bE0EgMvAYgwSmqMpErihkCUa++FB9oQHvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 6, 2012 at 3:59 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Thursday, November 22, 2012 3:00 AM Greg Smith wrote:
>> On 11/16/12 9:03 AM, Merlin Moncure wrote:
>> > Atri ran some quick n dirty tests to see if there
>> > were any regressions. He benched a large scan followed by vacuum. So
>> > far, results are inconclusive so better testing methodologies will
>> > definitely be greatly appreciated. One of the challenges with working
>> > in this part of the code is that it's quite difficult to make changes
>> > without impacting at least some workloads.
>>
>> I'm adding something to pgbench-tools to test for some types of vacuum
>> regressions that I've seen before. And the checksum benchmarking I've
>> already signed up for overlaps with this one quite a bit. I'd suggest
>> reviewers here focus on code quality, correctness, and targeted
>> optimization testing. I'm working heavily on a larger benchmarking
>> framework that both this and checksums will fit into right now.
>
> We are planning the below performance tests for the hint-bit I/O mitigation
> patch:
>
> Test case 1: Select performance in sequential scan, plus vacuum operation,
> with I/O statistics. Bulk operations happen in multiple transactions.
>         1. Stop autovacuum.
>         2. Create a table.
>         3. Insert 10000 records in one transaction, 1000 times.
>         4A. Use pgbench to select all the records using a sequential scan
>             for 5 min - 8 threads.
>         4B. Record the I/O statistics.
>         5. After completion of the test case, check VACUUM performance.
>
> Test case 2: Select performance in index scan, plus vacuum operation, with
> I/O statistics.
>         Same as test case 1, with 4A changed as follows:
>         4A. Use pgbench with the -F option to select random records using
>             an index scan for 5 min - 8 threads.
>
> Test case 3: Select performance in sequential scan, plus vacuum operation,
> with I/O statistics. Bulk operations happen in one transaction.
>         1. Stop autovacuum.
>         2. Create a table.
>         3. Insert 10,000,000 records in one transaction.
>         4A. Use pgbench to select all the records using a sequential scan
>             for 5 min - 8 threads.
>         4B. Record the I/O statistics.
>         5. After completion of the test case, check VACUUM performance.
>
> Test case 4: Select performance in index scan, plus vacuum operation, with
> I/O statistics.
>         Same as test case 3, with 4A changed as follows:
>         4A. Use pgbench to select random records using an index scan for
>             5 min - 8 threads.
>
> Test case 5: Check original pgbench performance & vacuum operation.
>         1. For the select-only and tpc_b performance suites, with scale
>            factors of 75 & 150, threads 8 & 16.
>
> Test case 6: (Vacuum I/O may increase if vacuum needs to make a page dirty
> only to set the hint bits.)
>         1. Session 1: Open a long transaction.
>         2. Session 2: Insert some records & commit.
>                       Select all the tuples.
>                       Checkpoint;
>                       Vacuum the table.
>                       Checkpoint;
>         3. Record the I/O statistics & time taken for the Vacuum & the 2nd
>            Checkpoint.
>
> Test case 7: (This is also to check Vacuum I/O.)
>         1. Have a replication setup.
>         2. Insert some records & commit.
>         3. Vacuum the table.
>         4. Promote the standby to be the new master.
>         5. Select all the tuples on the new master.
>         6. Vacuum the table on the new master.
>         6B. Record the I/O statistics & time taken for the Vacuum on the
>             new master.

Thanks for that -- that's fairly comprehensive I'd say. I'm quite
interested in that benchmarking framework as well. Do you need help
setting up the scripts?

merlin


From: Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>
To: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'Amit Kapila'" <amit(dot)kapila(at)huawei(dot)com>
Cc: "'Greg Smith'" <greg(at)2ndquadrant(dot)com>, "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-12-07 14:26:12
Message-ID: 002501cdd486$cf5d5940$6e180bc0$@kommi@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>Thanks for that -- that's fairly comprehensive I'd say. I'm quite
>interested in that benchmarking framework as well. Do you need help
>setting up the scripts?

Presently I am testing with the pgbench custom query option and collecting
I/O and VM statistics in parallel.

I have written the test script for case 1 as follows:

./psql -f bench_test_1_init.sql postgres
iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
killall -s SIGINT iostat
killall -s SIGINT vmstat

Where the sql files are as follows:

-- bench_test_1.sql
select count(*) from bench where f1 is not null;

--bench_test_1_init.sql

drop table if exists bench;
create table bench(f0 int primary key, f1 char(50));
insert into bench values (generate_series(1, 100000), 'a');
insert into bench values (generate_series(100001, 200000), 'a');
...
insert into bench values (generate_series(9800001, 9900000), 'a');
insert into bench values (generate_series(9900001, 10000000), 'a');
checkpoint;

I will provide the test results later.

Any suggestions/comments?

Regards,
Hari babu.


From: Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>
To: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>
Cc: "'Greg Smith'" <greg(at)2ndquadrant(dot)com>, "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>, "'Amit Kapila'" <amit(dot)kapila(at)huawei(dot)com>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-12-13 13:06:58
Message-ID: 000301cdd932$bd48f600$37dae200$@kommi@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 7, 2012 at 7:56 PM, Hari babu
<haribabu(dot)kommi(at)Huawei(dot)com> wrote:

>>On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>Thanks for that -- that's fairly comprehensive I'd say. I'm quite
>>interested in that benchmarking framework as well. Do you need help
>>setting up the scripts?

>Presently I am testing with the pgbench custom query option and collecting
>I/O and VM statistics in parallel.

>I have written the test script for case 1 as follows:

>./psql -f bench_test_1_init.sql postgres
>iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
>vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
>./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
>killall -s SIGINT iostat
>killall -s SIGINT vmstat

>Where the sql files are as follows:

>-- bench_test_1.sql
>select count(*) from bench where f1 is not null;

>--bench_test_1_init.sql
>drop table if exists bench;
>create table bench(f0 int primary key, f1 char(50));
>insert into bench values (generate_series(1, 100000), 'a');
>insert into bench values (generate_series(100001, 200000), 'a');
>...
>insert into bench values (generate_series(9800001, 9900000), 'a');
>insert into bench values (generate_series(9900001, 10000000), 'a');
>checkpoint;

>I will provide the test results later.

Please find the review of the patch.

Basic stuff:
------------
- Patch applies with offsets.
- Compiles cleanly with no warnings
- Regression tests pass.

Code Review:
-------------
1. It would be better to set the hint bits for the tuples in a page if the
   page is already dirty.

Test cases:
----------
Test cases are already described in the following link.

http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kapila(at)huawei(dot)com

Test Results:
------------
Machine details:
CPU cores : 4
RAM : 24GB
OS : Suse Linux 10 SP2

Configuration:
shared_buffers = 500MB
autovacuum = off
checkpoint_segments = 256
checkpoint_timeout = 10min

The following results are averages of 3 runs; each run is 5 min through a
pgbench custom query.

Test case - 1:
Init:
        None
Run:
        create temp table atri1 as select v from
                generate_series(1,1000000) v;
        select * from atri1;
        VACUUM atri1;
        DROP TABLE atri1;

Test case - 2:
Init:
        create table atri1 as select v from
                generate_series(1,1000000) v;
Run:
        select * from atri1;

Test case - 3: (without pgbench)
connect two sessions s1, s2
s1 : start transaction;
s2 :
        create table atri1 as select v from
                generate_series(1,1000000) v;
        \timing
        select * from atri1;
        VACUUM atri1;

Results:
                        9.3devel (tps)   HintbitIO (tps)
---------------------------------------------------
Test case - 1:          1.762946         1.922219
Test case - 2:          4.038673         4.044356

Pgbench without vacuum, scale factor 75, 8 threads & 8 clients (refer reports
attached):
Default tables select : 64980.736149     64550.118693
Unlogged tables select: 64874.974334     64550.118693

Multiple transaction bulk inserts: select performance (refer scripts 1 & 2
attached):
Sequential scan:        6.492680         6.382014
Index scan:             1.386851         1.36234

Single transaction bulk inserts: select performance (refer scripts 3 & 4
attached):
Sequential scan:        6.49319          6.3800147
Index scan:             1.384121         1.3615277

Long transaction open, then vacuum & select performance in milliseconds
(refer reports output):
Testcase - 3:
Single vacuum perf :    128.302 ms       181.292 ms
Single select perf :    214.107 ms       177.057 ms
Total :                 342.409 ms       358.349 ms

I was not able to find the reason why, in some cases, the results are low,
so please use the attached scripts to validate the same.

I was not able to provide the I/O statistics, as iostat & vmstat give the
current snapshot rather than cumulative totals from start to end of the test
execution.
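One possible workaround (not something tried here) is to snapshot the
kernel's cumulative counters before and after the run and diff them, e.g.:

cat /proc/diskstats > io.before
./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
cat /proc/diskstats > io.after
diff io.before io.after    # per-device deltas give cumulative reads/writes

/proc/diskstats accumulates since boot, so the deltas cover exactly the test
window.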

Documentation:
-------------
No user-visible changes, so no documentation needs to be updated.

Regards,

Hari babu.

Attachment Content-Type Size
results.tar.gz application/octet-stream 1.6 KB
scripts.tar.gz application/octet-stream 8.0 KB

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-12-13 13:21:45
Message-ID: CAOeZVidNvqiVLrdN+NXfwMsRdgMN_FOo3JtrYMH9uL7N0yz4MA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 13, 2012 at 6:36 PM, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com> wrote:

> On Thu, Dec 7, 2012 at 7:56 PM, Hari babu
> <haribabu(dot)kommi(at)Huawei(dot)com> wrote:
>
> >>On Thu, Dec 6, 2012 at 8:52 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> >>Thanks for that -- that's fairly comprehensive I'd say. I'm quite
> >>interested in that benchmarking framework as well. Do you need help
> >>setting up the scripts?
>
> >Presently I am testing with the pgbench custom query option and collecting
> >I/O and VM statistics in parallel.
>
> >I have written the test script for case 1 as follows:
>
> >./psql -f bench_test_1_init.sql postgres
> >iostat -t 1 -d > hint.test1.iostat.reading_3.txt &
> >vmstat -n 1 > hint.test1.vmstat.reading_3.txt &
> >./pgbench -f bench_test_1.sql -T 300 -c 8 -j 8 -n postgres
> >killall -s SIGINT iostat
> >killall -s SIGINT vmstat
>
> >Where the sql files are as follows:
>
> >-- bench_test_1.sql
> >select count(*) from bench where f1 is not null;
>
> >--bench_test_1_init.sql
> >drop table if exists bench;
> >create table bench(f0 int primary key, f1 char(50));
> >insert into bench values (generate_series(1, 100000), 'a');
> >insert into bench values (generate_series(100001, 200000), 'a');
> >...
> >insert into bench values (generate_series(9800001, 9900000), 'a');
> >insert into bench values (generate_series(9900001, 10000000), 'a');
> >checkpoint;
>
> >I will provide the test results later.
>
> Please find the review of the patch.
>
> Basic stuff:
> ------------
> - Patch applies with offsets.
> - Compiles cleanly with no warnings
> - Regression tests pass.
>
> Code Review:
> -------------
> 1. It would be better to set the hint bits for the tuples in a page if the
>    page is already dirty.
>
> Test cases:
> ----------
> Test cases are already described in the following link.
>
> http://archives.postgresql.org/message-id/00d301cdd398$6e3fff30$4abffd90$@kapila(at)huawei(dot)com
>
> Test Results:
> ------------
> Machine details:
> CPU cores : 4
> RAM : 24GB
> OS : Suse Linux 10 SP2
>
> Configuration:
> shared_buffers = 500MB
> autovacuum = off
> checkpoint_segments = 256
> checkpoint_timeout = 10min
>
> The following results are averages of 3 runs; each run is 5 min through a
> pgbench custom query.
>
> Test case - 1:
> Init:
>         None
> Run:
>         create temp table atri1 as select v from
>                 generate_series(1,1000000) v;
>         select * from atri1;
>         VACUUM atri1;
>         DROP TABLE atri1;
>
> Test case - 2:
> Init:
>         create table atri1 as select v from
>                 generate_series(1,1000000) v;
> Run:
>         select * from atri1;
>
> Test case - 3: (without pgbench)
> connect two sessions s1, s2
> s1 : start transaction;
> s2 :
>         create table atri1 as select v from
>                 generate_series(1,1000000) v;
>         \timing
>         select * from atri1;
>         VACUUM atri1;
>
> Results:
>                         9.3devel (tps)   HintbitIO (tps)
> ---------------------------------------------------
> Test case - 1:          1.762946         1.922219
> Test case - 2:          4.038673         4.044356
>
> Pgbench without vacuum, scale factor 75, 8 threads & 8 clients (refer
> reports attached):
> Default tables select : 64980.736149     64550.118693
> Unlogged tables select: 64874.974334     64550.118693
>
> Multiple transaction bulk inserts: select performance (refer scripts 1 & 2
> attached):
> Sequential scan:        6.492680         6.382014
> Index scan:             1.386851         1.36234
>
> Single transaction bulk inserts: select performance (refer scripts 3 & 4
> attached):
> Sequential scan:        6.49319          6.3800147
> Index scan:             1.384121         1.3615277
>
> Long transaction open, then vacuum & select performance in milliseconds
> (refer reports output):
> Testcase - 3:
> Single vacuum perf :    128.302 ms       181.292 ms
> Single select perf :    214.107 ms       177.057 ms
> Total :                 342.409 ms       358.349 ms
>
> I was not able to find the reason why, in some cases, the results are low,
> so please use the attached scripts to validate the same.
>
> I was not able to provide the I/O statistics, as iostat & vmstat give the
> current snapshot rather than cumulative totals from start to end of the
> test execution.
>
> Documentation:
> -------------
> No user-visible changes, so no documentation needs to be updated.
>
> Regards,
>
> Hari babu.

Thanks for the review and tests.

The remarkable difference between 9.3devel and hint bit I/O is present only
in test 3, right? I have a feeling that the original case we discussed
(vacuum setting the hint bits) is taking place here, and hence the decrease
in performance.

Atri

--
Regards,

Atri
*l'apprenant*


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-12-13 14:32:21
Message-ID: CAHyXU0zQtDsAzG-1D10p5cgQ6rx0Gf8RsZ8NSU55Yw-60cssbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com> wrote:
> Please find the review of the patch.

Thanks for the detailed review!

> Basic stuff:
> ------------
> - Patch applies with offsets.
> - Compiles cleanly with no warnings
> - Regression tests pass.
>
> Code Review:
> -------------
> 1. It would be better to set the hint bits for the tuples in a page if the
>    page is already dirty.

This is true today but likely less true if/when page checksums come
out. Also it complicates the code a little bit.

> Default tables select : 64980.736149 64550.118693
> Unlogged tables select: 64874.974334 64550.118693

So it looks like the extra tests in the visibility routines are causing a
0.7% performance hit.
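(Working from the numbers above: (64980.74 - 64550.12) / 64980.74 comes out
to roughly 0.66%.)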

> Multiple transaction bulk inserts: select performance (refer scripts 1 & 2
> attached):
> Sequential scan:        6.492680         6.382014
> Index scan:             1.386851         1.36234
>
> Single transaction bulk inserts: select performance (refer scripts 3 & 4
> attached):
> Sequential scan:        6.49319          6.3800147
> Index scan:             1.384121         1.3615277

The performance hit is higher here. Almost 2%. This is troubling.
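(For instance, the sequential scan case: (6.492680 - 6.382014) / 6.492680 is
about 1.7%; the index scan cases come out similarly.)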

> Long transaction open, then vacuum & select performance in milliseconds
> (refer reports output):
> Testcase - 3:
> Single vacuum perf :    128.302 ms       181.292 ms
> Single select perf :    214.107 ms       177.057 ms
> Total :                 342.409 ms       358.349 ms
>
> I was not able to find the reason why, in some cases, the results are low,
> so please use the attached scripts to validate the same.

I need to validate the vacuum results. It's possible that this is
solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
the question stands: do the results justify the change? I'd argue
'maybe' -- I'd like to see the bulk insert performance hit reduced if
possible. Let's see what we can do in the short term (and, if no
improvement can be found, I think this patch should be marked
'returned with feedback').

merlin


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'Hari Babu'" <haribabu(dot)kommi(at)huawei(dot)com>
Cc: "'Greg Smith'" <greg(at)2ndquadrant(dot)com>, "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2012-12-14 13:57:53
Message-ID: 003101cdda03$04437820$0cca6860$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday, December 13, 2012 8:02 PM Merlin Moncure wrote:
> On Thu, Dec 13, 2012 at 7:06 AM, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>
> wrote:
> > Please find the review of the patch.
>
> Thanks for the detailed review!
>
> > Basic stuff:
> > ------------
> > - Patch applies with offsets.
> > - Compiles cleanly with no warnings
> > - Regression tests pass.
> >
> > Code Review:
> > -------------
> > 1. It would be better to set the hint bits for the tuples in a page if
> >    the page is already dirty.
>
> This is true today but likely less true if/when page checksums come
> out. Also it complicates the code a little bit.
>
> > Default tables select : 64980.736149     64550.118693
> > Unlogged tables select: 64874.974334     64550.118693
>
> So it looks like the extra tests in the visibility routines are causing a
> 0.7% performance hit.
>
> > Multiple transaction bulk inserts: select performance (refer scripts 1 &
> > 2 attached):
> > Sequential scan:        6.492680         6.382014
> > Index scan:             1.386851         1.36234
> >
> > Single transaction bulk inserts: select performance (refer scripts 3 &
> > 4 attached):
> > Sequential scan:        6.49319          6.3800147
> > Index scan:             1.384121         1.3615277
>
> The performance hit is higher here. Almost 2%. This is troubling.
>
> > Long transaction open, then vacuum & select performance in milliseconds
> > (refer reports output):
> > Testcase - 3:
> > Single vacuum perf :    128.302 ms       181.292 ms
> > Single select perf :    214.107 ms       177.057 ms
> > Total :                 342.409 ms       358.349 ms
> >
> > I was not able to find the reason why, in some cases, the results are
> > low, so please use the attached scripts to validate the same.
>
> I need to validate the vacuum results. It's possible that this is
> solvable by tweaking the xmin check inside vacuum. Assuming that's fixed,
> the question stands: do the results justify the change? I'd argue
> 'maybe'

We can try the change (assuming the change is small) and see if the
performance gain is good, then discuss whether it really justifies itself.
I think the main reason for the Vacuum performance hit is that, in the test,
pages are getting dirty only due to the setting of hint bits by Vacuum.

> -- I'd like to see the bulk insert performance hit reduced if
> possible.

I think if we can improve performance for the bulk-insert case, then this
patch has much more value.

With Regards,
Amit Kapila.


From: Craig Ringer <craig(at)2ndQuadrant(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'Hari Babu'" <haribabu(dot)kommi(at)huawei(dot)com>, "'Greg Smith'" <greg(at)2ndquadrant(dot)com>, "'Atri Sharma'" <atri(dot)jiit(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2013-01-18 11:34:55
Message-ID: 50F9335F.7040003@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/14/2012 09:57 PM, Amit Kapila wrote:
>>
>> I need to validate the vacuum results. It's possible that this is
>> solvable by tweaking xmin check inside vacuum. Assuming that's fixed,
>> the question stands: do the results justify the change? I'd argue
>> 'maybe'
> We can try with change (assuming change is small) and see if the performance
> gain is good, then discuss whether it really justifies.
> I think the main reason for Vacuum performance hit is that in the test pages
> are getting dirty only due to setting of hint bit
> by Vacuum.
>
>> -- I'd like to see the bulk insert performance hit reduced if
>> possible.
> I think if we can improve performance for bulk-insert case, then this patch
> has much more value.
Has there been any movement on this - more benchmarks and data showing
that it really does improve performance, or that it really isn't helpful?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2013-01-18 14:16:05
Message-ID: CAHyXU0y0+q2OPWGuN54t+3k7i5Bq4Zq0eBx-ju7GEv=N2iqJvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 18, 2013 at 5:34 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 12/14/2012 09:57 PM, Amit Kapila wrote:
>>>
>>> I need to validate the vacuum results. It's possible that this is
>>> solvable by tweaking xmin check inside vacuum. Assuming that's fixed,
>>> the question stands: do the results justify the change? I'd argue
>>> 'maybe'
>> We can try with change (assuming change is small) and see if the performance
>> gain is good, then discuss whether it really justifies.
>> I think the main reason for Vacuum performance hit is that in the test pages
>> are getting dirty only due to setting of hint bit
>> by Vacuum.
>>
>>> -- I'd like to see the bulk insert performance hit reduced if
>>> possible.
>> I think if we can improve performance for bulk-insert case, then this patch
>> has much more value.
> Has there been any movement in this - more benchmarks and data showing
> that it really does improve performance, or that it really isn't helpful?

Atri is working on that. I don't know if he's going to pull it out in
time, though, so we may have to pull the patch from this 'fest. My
take on the current patch is that the upside case is pretty clear, but
the bulk insert performance impact needs to be figured out and
mitigated (that's what Atri is working on).

merlin


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndQuadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2013-01-18 14:57:04
Message-ID: 0CE06BEF-DDB8-4DE4-A666-5E40240E7894@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18-Jan-2013, at 17:04, Craig Ringer <craig(at)2ndQuadrant(dot)com> wrote:

> On 12/14/2012 09:57 PM, Amit Kapila wrote:
>>>
>>> I need to validate the vacuum results. It's possible that this is
>>> solvable by tweaking xmin check inside vacuum. Assuming that's fixed,
>>> the question stands: do the results justify the change? I'd argue
>>> 'maybe'
>> We can try with change (assuming change is small) and see if the performance
>> gain is good, then discuss whether it really justifies.
>> I think the main reason for Vacuum performance hit is that in the test pages
>> are getting dirty only due to setting of hint bit
>> by Vacuum.
>>
>>> -- I'd like to see the bulk insert performance hit reduced if
>>> possible.
>> I think if we can improve performance for bulk-insert case, then this patch
>> has much more value.
> Has there been any movement in this - more benchmarks and data showing
> that it really does improve performance, or that it really isn't helpful?
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

Hello all,

Sorry for the delay in updating the hackers list with the current status.

I recently did some profiling using perf on PostgreSQL 9.2 with and without our patch.

I noticed that the maximum time is being spent in the heapgettup function. Further investigation, and a bit of a hunch, leads me to believe that we may be adversely affecting the visibility map optimisation, which directly interacts with the visibility functions that our patch affects.

If this is the case, we may really need to get back down to the design of our patch, see which visibility function(s) we are affecting, and see if we can mitigate the effect.
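For anyone who wants to reproduce the profile, the invocation was along
these lines (exact flags assumed here; they were not in the original mail):

# sample call stacks of one backend while the test query runs
perf record -g -p <backend_pid> -- sleep 60
perf report --sort symbol    # heapgettup dominates the report in this run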

Please let me know your inputs on this.

Regards,

Atri



From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2013-01-18 15:36:03
Message-ID: CAHyXU0z67dQr-SbQqBpvhPrfF9vZj8O4qwDZKG_QGSB2SGy7WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 18, 2013 at 8:57 AM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:
>
> Hello all,
>
> Sorry for the delay in updating the hackers list with the current status.
>
> I recently did some profiling using perf on PostgreSQL 9.2 with and without our patch.
>
> I noticed that maximum time is being spent on heapgettup function. Further investigation and a bit of a hunch leads me to believe that we may be adversely affecting the visibility map optimisation that directly interact with the visibility functions, that our patch straight away affects.
>
> If this is the case, we may really need to get down to the design of our patch, and maybe see which visibility function/functions we are affecting, and see if we can mitigate the affect.
>
> Please let me know your inputs on this.

Any scenario that involves a non-trivial amount of investigation or
development should result in us pulling the patch for rework and
resubmission in a later 'fest... it's closing time, as they say :-).

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2013-01-18 15:50:27
Message-ID: CA+TgmoYmWcLZz+dXMY-+v3=AkC2ft1BSTe-vs6DPP5E6vE+P9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> Any scenario that involves non-trivial amount of investigation or
> development should result in us pulling the patch for rework and
> resubmission in later 'fest....it's closing time as they say :-).

Amen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Craig Ringer <craig(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for hint bit i/o mitigation
Date: 2013-01-21 01:13:03
Message-ID: 50FC961F.8020109@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/18/2013 11:50 PM, Robert Haas wrote:
> On Fri, Jan 18, 2013 at 10:36 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> Any scenario that involves non-trivial amount of investigation or
>> development should result in us pulling the patch for rework and
>> resubmission in later 'fest....it's closing time as they say :-).
> Amen.
>
OK, bumped to the next CF.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services