Re: btreecheck extension

Lists: pgsql-hackers
From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: btreecheck extension
Date: 2014-06-17 01:47:30
Message-ID: CAM3SWZRtV+xmRWLWq6c-x7czvwavFdwFi4St1zz4dDgFH4yN4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

As discussed at the developer meeting at pgCon, I think that there is
a lot to be said for a tool that checks nbtree index invariants on
live systems.

Attached prototype patch adds contrib extension, btreecheck. This
extension provides SQL-callable functions for checking these
conditions on nbtree indexes on live systems. This code generally
works by using the same insertion ScanKey infrastructure that the
B-Tree code itself uses in servicing all kinds of index scans.

There are two major goals for this patch. In order of their importance
in my mind:

1. Improve testing around the B-Tree code. Give 9.4 beta testers the
tools needed to detect subtle errors in a semi-automated way. I'm
quite encouraged by recent efforts along these lines by Jeff Janes,
Heikki, and others. Beyond 9.4, I think that stress testing is likely
to become indispensable as more complex performance optimizations are
pursued. Personally, I see some good opportunities to improve the
performance of the B-Tree code, and if those ideas are ever to be put
into action, there is clearly a big need for automated stress-testing.
I see this tool as a start. I'm not just talking about my recent work
on sort support for text, and the idea of generalizing that; I think
that there are many techniques that we could pursue for decreasing the
I/O and CPU overhead of using B-Tree indexes that are just too risky
to pursue today because of the subtle interactions of many moving
parts. I'd like to lower the risk.

2. Allow users to sanity check indexes in production. This is not my
immediate goal here, though; I will not add this patch to the next
CommitFest. That's something to think about in more detail a bit
later. Clearly there is a big demand for this kind of thing, though.

Since I believe the B-Tree code changes of 9.4 are generally thought
to be the most plausible source of serious bugs in 9.4 purely because
of their subtlety and fundamental importance, it seemed valuable to
get this prototype into the hands of beta testers as soon as possible.
For that reason, I have not given some aspects of the design as much
thought as I would have preferred. I haven't really come up with a
plausible plan of attack for finding bugs in the B-Tree code. Rather,
I've more or less just considered what invariant conditions exist that
can be correctly verified without too much effort, and then
implemented checks for each. I've also stuck to a design that only has
the tool share lock a single B-Tree buffer at a time, and copy the
contents into local memory before operating on that (relation level
heavyweight locks are used in various different ways too), in a
similar manner to contrib/pageinspect. Copying the entire page into
local memory is probably the right thing to do even from a performance
perspective, since in each case the entire page is inspected, which
takes much longer than, say, a binary search. Even though it's
supposed to be possible to run this on a live system without causing
too much disruption, subtle buffer locking protocols were avoided
(although there is a rationale for why things are okay in the module,
that considers current buffer locking protocols particular to the
B-Tree code). This is a principle that I think I can stick to, and
would certainly prefer to stick to for obvious reasons.

The checks performed (the entire set of invariant conditions checked
by all SQL-callable functions in the patch) are currently:

* That data items on each B-Tree page are in logical order.

* That non-rightmost pages (which are the only pages that have a high
key) have only data items that respect that high key as an upper
bound.

* That a down-link data item in a parent page points to a child page
where the parent down-link data item is respected as a lower bound.

* That a down-link data item in a parent page points to a child pages
whose high key (if any) is equal to the next data item on the parent
page (if any), while taking into account incomplete page splits and
right-most-ness.

* That a down-link data item in a parent page points to a child page
whose right link (if any) points to the same block/page as pointed to
by the next data item on the parent page (if any). In other words, the
children and their parent are in agreement about the children being
siblings, while taking into account incomplete page splits and
right-most-ness. In other other words, the second phase of a page
split where a downlink is inserted into the parent was sane (so this
is similar to the previous invariant check listed).

The two prior checks may not be sufficient, though, if we want to
operate with weak, non-disruptive locks -- what about cousins and
grandparents, and even great-grandparents and second cousins? It's a
good start, though.

* That on each level, starting at the true root level and working down
the the leaf level (level 0), there is mutual agreement between
siblings that they are siblings (that is, their right and left links
comport).

I won't go into locking protocols and their correctness here. That's
something that I would like others to look over, though - there are
fairly detailed comments. I would also like to coordinate with anyone
who thinks they can come up with a good stress testing workload that
introduces a lot of variability for the B-Tree code to deal with. I
think it's appropriate that that general effort should mostly drive
the development of this tool, and not the other way around.

When you consider the difficulty of isolating the kind of bugs we're
looking for here, a lower lock strength is not just a nice-to-have. I
think it might actually be an important contribution to the
effectiveness of the tool generally. I envisage tests running with a
high amount of concurrency on big databases over days or even longer.

Thoughts?
--
Peter Geoghegan

Attachment Content-Type Size
btreecheck-prototype.2014_06_16.patch text/x-patch 28.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: btreecheck extension
Date: 2014-06-17 20:16:28
Message-ID: CA+TgmoZi06Mac504ftA8E6f0_x0kjx2ybz0Es_Kp90=GAXG-YA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 16, 2014 at 9:47 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> As discussed at the developer meeting at pgCon, I think that there is
> a lot to be said for a tool that checks nbtree index invariants on
> live systems.

Me too.

> Attached prototype patch adds contrib extension, btreecheck.

I don't feel qualified to comment on any of the substantive issues you
raise, so instead I'd like to bikeshed the name. I suggest that we
create one extension to be a repository for index-checking machinery
(and perhaps also heap-checking machinery) and view this as the first
of possibly several checkers to live there. Otherwise, we may
eventually end up with separate extensions for btreecheck, hashcheck,
gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck,
toastcheck, etc. which seems like excessive namespace pollution.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: btreecheck extension
Date: 2014-06-17 21:10:15
Message-ID: CAM3SWZQ_69d+D+P1j=bafy3d8OpnmpgCbkWoma09OiTiP1exuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 17, 2014 at 1:16 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't feel qualified to comment on any of the substantive issues you
> raise, so instead I'd like to bikeshed the name. I suggest that we
> create one extension to be a repository for index-checking machinery
> (and perhaps also heap-checking machinery) and view this as the first
> of possibly several checkers to live there. Otherwise, we may
> eventually end up with separate extensions for btreecheck, hashcheck,
> gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck,
> toastcheck, etc. which seems like excessive namespace pollution.

I agree.

I hope that we'll eventually be able to move code like this into each
AM, with something like an amverifyintegrity pg_am entry optionally
provided. There'd also be a utility statement that would perform this
kind of verification. It seems natural to do this, as the patch I've
posted arguably adds a big modularity violation. Besides, it seems
worthwhile to pepper the regular regression tests with calls like
these, at least in some places, and putting something in core is the
most convenient way to do that.

--
Peter Geoghegan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: btreecheck extension
Date: 2014-06-18 00:11:38
Message-ID: CA+TgmoYsGfLCTTWx7+LA825fad_M69t1eigvkq-vgkEzA1piCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 17, 2014 at 5:10 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Tue, Jun 17, 2014 at 1:16 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I don't feel qualified to comment on any of the substantive issues you
>> raise, so instead I'd like to bikeshed the name. I suggest that we
>> create one extension to be a repository for index-checking machinery
>> (and perhaps also heap-checking machinery) and view this as the first
>> of possibly several checkers to live there. Otherwise, we may
>> eventually end up with separate extensions for btreecheck, hashcheck,
>> gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck,
>> toastcheck, etc. which seems like excessive namespace pollution.
>
> I agree.
>
> I hope that we'll eventually be able to move code like this into each
> AM, with something like an amverifyintegrity pg_am entry optionally
> provided. There'd also be a utility statement that would perform this
> kind of verification. It seems natural to do this, as the patch I've
> posted arguably adds a big modularity violation. Besides, it seems
> worthwhile to pepper the regular regression tests with calls like
> these, at least in some places, and putting something in core is the
> most convenient way to do that.

I think there's something to be said for that, but I think at the
moment I like the idea of a functional interface better. The reason
is that I'm not sure we can predict all of the checks we're going to
want to add. For example, maybe somebody will come up with another
btree checker that's different from your btree checker, and maybe
there will be good reasons not to merge the two - e.g. different
locking requirements, or different runtimes, or whatever. Even more
likely, I think there will be things we want to do that fall under the
broad umbrella of integrity checking that we just can't predict now:
scan the table and check whether all the keys are present in every
index, scan the TOAST table and make sure that all the chunks are the
right size, etc. If we have a bunch of functions for this sort of
thing, it's easy and relatively uncontroversial to add more. If we
put it in core and give it real grammar support, then we've got to
fight with keywords and contemplate grammar bloat and, basically, I
think every change will be likely to get a higher level of scrutiny.
I'd rather not go there.

Now, we could. We could come up with an extensible syntax, like this:

CHECK relation [ USING { checktype [ '(' arg [, ...] '}' [, ...] ];

But frankly I'm kind of uncompelled by that. This isn't a feature
that seems to me to really need to be in core. It doesn't
particularly need grammar support, and it doesn't need WAL logging,
and not everyone needs it at all, and especially if we eventually end
up with a robust suite of tools in this area, not everyone may even
want it, if it means a bigger installed footprint or more security
concerns to worry about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: btreecheck extension
Date: 2014-06-18 00:57:23
Message-ID: CAM3SWZQeH0O==c+UXOTQh5n0JBOzdjpf4WLaNYnVKBxrMfxJ6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 17, 2014 at 5:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think there's something to be said for that, but I think at the
> moment I like the idea of a functional interface better. The reason
> is that I'm not sure we can predict all of the checks we're going to
> want to add.

That's true. Clearly this is very hand-wavy, and as I said I think
it's appropriate that the tool evolve in response to our testing
requirements, but I would like to figure out a way to have
verification tooling available for some kind of semi-standard tests,
possibly even including the standard regression tests. It's probably
too early to discuss, though.

> Now, we could. We could come up with an extensible syntax, like this:
>
> CHECK relation [ USING { checktype [ '(' arg [, ...] '}' [, ...] ];

That's what I had in mind. Using the same trick that you came up with
for EXPLAIN, to avoid grammar bloat and let the am figure out for
itself what to name the various check types, with a generic default
check.

--
Peter Geoghegan


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: btreecheck extension
Date: 2014-06-18 11:48:56
Message-ID: 20140618114856.GN16098@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Peter Geoghegan (pg(at)heroku(dot)com) wrote:
> On Tue, Jun 17, 2014 at 5:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Now, we could. We could come up with an extensible syntax, like this:
> >
> > CHECK relation [ USING { checktype [ '(' arg [, ...] '}' [, ...] ];
>
> That's what I had in mind. Using the same trick that you came up with
> for EXPLAIN, to avoid grammar bloat and let the am figure out for
> itself what to name the various check types, with a generic default
> check.

I'm fine with having these start out as external tools which are doing
checks, but I've been specifically asked about (and have desired myself
from time-to-time...) an in-core capability to check index/heap/etc
validity. Folks coming from certain other RDBMS's find it amazing that
we don't have any support for that when what they really want is a
background worker which is just automatically going around doing these
checks.

Now, perhaps we could have the background worker without the syntax for
running these by hand, but I don't particularly like that idea. Being
able to run these checks by hand is extremely useful and I'd much prefer
to be able to force that than to have some mechanism where I have to
submit a request for a check to another process through a queue or
something along those lines.

Thanks,

Stephen


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: btreecheck extension
Date: 2014-06-18 22:08:01
Message-ID: CAM3SWZTN2yZWuv1j_5XdK9CVu4q3h6t+DNedpzd6mUDVTPisLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 18, 2014 at 4:48 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> I'm fine with having these start out as external tools which are doing
> checks, but I've been specifically asked about (and have desired myself
> from time-to-time...) an in-core capability to check index/heap/etc
> validity. Folks coming from certain other RDBMS's find it amazing that
> we don't have any support for that when what they really want is a
> background worker which is just automatically going around doing these
> checks.

Yes, I think that being able to verify the integrity of index and heap
relations is an important feature, and I think it's notable that we're
the only major RDBMS that lacks this support. I'm not quite sure what
that should entail just yet, but clearly it should be somewhat like
what I have here. I think the big open questions to make something
like this work are mostly around the cost/benefit ratio of each of the
checks I've outlined. Certainly, for that use case minimizing
disruption on a live system becomes even more important. I'll probably
look at a buffer access strategy, just to give an example of that off
the top of my head.

--
Peter Geoghegan


From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: btreecheck extension
Date: 2015-10-07 09:59:28
Message-ID: EB4A54B66956E59757F64E37@eje.credativ.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

--On 16. Juni 2014 18:47:30 -0700 Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> Attached prototype patch adds contrib extension, btreecheck. This
> extension provides SQL-callable functions for checking these
> conditions on nbtree indexes on live systems.

What's the current state of this module? I see you are interested in stress
testing, but i'm not sure how far this all is gone?

This tool actually served a very good job during identifying index
corruption due to collation issues[1].

[1]
<http://www.postgresql.org/message-id/CB4D1C6BAA80CF146CB0D4F2@eje.credativ.lan>

--
Thanks

Bernd


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Bernd Helmle <mailings(at)oopsware(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: btreecheck extension
Date: 2015-10-07 10:06:35
Message-ID: CAM3SWZTiTBH3QW=mkjXf=MzLYGL2pFA88VFewfJcdUEx+3ZGdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 7, 2015 at 2:59 AM, Bernd Helmle <mailings(at)oopsware(dot)de> wrote:
> What's the current state of this module? I see you are interested in stress
> testing, but i'm not sure how far this all is gone?
>
> This tool actually served a very good job during identifying index
> corruption due to collation issues[1].

I tweaked it a bit. I actually don't really know if I'll pick it up
anytime soon. I'm glad that it helped you, but the goals for the tool
need to be thrashed out a bit more. We should discuss it again.

--
Peter Geoghegan