Do we need so many hint bits?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Do we need so many hint bits?
Date: 2012-11-16 00:42:57
Message-ID: 1353026577.14335.91.camel@sussancws0025
Lists: pgsql-hackers

Related to discussion here:
http://archives.postgresql.org/message-id/CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.com

It occurred to me recently that many of the hint bits aren't terribly
important (or at least their importance isn't obvious to me).
HEAP_XMIN_COMMITTED clearly has a purpose, and we'd expect it to be used
many times following the initial CLOG lookup.

But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then xmax
should just be changed to InvalidTransactionId.

Removing those 3 hints would give us 3 more flag bits (eventually, once
we are sure no leftover bits are still set on existing pages), and it
would also reduce the chance that a page is dirtied for no other reason
than to set them. It
might even take a few cycles out of the tqual.c routines, or at least
reduce the code size. Not a huge win, but I don't see much downside
either.
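
To make the mechanism concrete, here is a toy sketch of how a hint bit
caches the result of a CLOG lookup. It is not PostgreSQL's actual code:
ToyTuple, toy_clog_committed and toy_xmin_committed are hypothetical
names, the CLOG is faked (even xids count as committed), and in-progress
transactions are ignored. The flag values are the ones I recall from the
heap tuple header and should be treated as illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hint-bit flags; values as recalled from the heap tuple header (illustrative). */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMAX_COMMITTED 0x0400
#define HEAP_XMAX_INVALID   0x0800

/* Hypothetical, stripped-down tuple header. */
typedef struct {
    uint32_t xmin;      /* inserting transaction id */
    uint32_t xmax;      /* deleting transaction id, if any */
    uint16_t infomask;  /* holds the hint bits */
} ToyTuple;

/* Counts how often the (fake) CLOG is consulted. */
static int clog_lookups = 0;

/* Hypothetical stand-in for a CLOG lookup: even xids committed, odd xids aborted. */
static bool toy_clog_committed(uint32_t xid)
{
    clog_lookups++;
    return (xid % 2) == 0;
}

/*
 * Reports whether the inserting transaction committed. The first call
 * consults the fake CLOG and records the answer in a hint bit; later calls
 * answer from the hint bit alone. *dirtied reports whether this call
 * modified the tuple, i.e. whether the page would have to be dirtied.
 */
static bool toy_xmin_committed(ToyTuple *tup, bool *dirtied)
{
    *dirtied = false;

    if (tup->infomask & HEAP_XMIN_COMMITTED)
        return true;
    if (tup->infomask & HEAP_XMIN_INVALID)
        return false;

    bool committed = toy_clog_committed(tup->xmin);
    tup->infomask |= committed ? HEAP_XMIN_COMMITTED : HEAP_XMIN_INVALID;
    *dirtied = true;
    return committed;
}
```

The point of the sketch is the trade-off: each hint saves later lookups,
but setting it is a write to the page. For HEAP_XMIN_COMMITTED the saved
lookups are many; for the other three the tuple is expected to be short
lived, so the write may never pay for itself.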

Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibility map directly from those places? It can be used for the scan
because it is crash safe now (not possible before). And since it's only
one lookup per scanned page, I don't think it would be a measurable
performance loss there. Inserts/updates/deletes also do a significant
amount of work, so again, I doubt it's a big drop in performance there
-- maybe under a lot of concurrency or something.
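
As a rough illustration of "one lookup per scanned page", here is a toy
visibility map kept as a plain bitmap with one bit per heap page. This is
only a sketch under that assumption: ToyVisibilityMap and the toy_vm_*
functions are hypothetical names, and the real visibility map lives in
its own relation fork and is reached through the buffer manager, which is
where any extra cost would come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory visibility map: one bit per heap page. */
typedef struct {
    uint8_t *bits;    /* at least (npages + 7) / 8 bytes */
    size_t   npages;  /* number of heap pages covered */
} ToyVisibilityMap;

/* True if the page is marked all-visible; out-of-range pages are not. */
static bool toy_vm_test(const ToyVisibilityMap *vm, size_t blkno)
{
    if (blkno >= vm->npages)
        return false;
    return (vm->bits[blkno / 8] >> (blkno % 8)) & 1;
}

/* Mark a page all-visible (what VACUUM would do in this toy model). */
static void toy_vm_set(ToyVisibilityMap *vm, size_t blkno)
{
    if (blkno < vm->npages)
        vm->bits[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
}

/* Clear the bit (what an insert, update or delete on the page would do). */
static void toy_vm_clear(ToyVisibilityMap *vm, size_t blkno)
{
    if (blkno < vm->npages)
        vm->bits[blkno / 8] &= (uint8_t) ~(1u << (blkno % 8));
}

/*
 * Models a scan: exactly one map lookup per page, counting the pages on
 * which per-tuple visibility checks could be skipped.
 */
static size_t toy_count_all_visible(const ToyVisibilityMap *vm)
{
    size_t count = 0;

    for (size_t blkno = 0; blkno < vm->npages; blkno++)
    {
        if (toy_vm_test(vm, blkno))
            count++;
    }
    return count;
}
```

In this model the map is the only source of truth, so there is no second
per-page flag to keep consistent with it.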

The benefit of removing PD_ALL_VISIBLE would be significantly higher
than that of removing the tuple hint bits.
It's quite common to load a lot of data, and then do some reads for a
while (setting hint bits and flushing them to disk), and then do a
VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the
pages again. Also, if I remember correctly, Robert went to significant
effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM
bits consistent. Maybe this was all discussed before?

All of these hint bits will have a bit more of a performance impact
after checksums are introduced (for those who use them in conjunction
with large data loads), so I'm looking for some simple ways to mitigate
those effects. What kind of worst-case tests could I construct to see if
there are worrying performance effects of removing these hint bits?

Regards,
Jeff Davis
