Re: WIP: Access method extendability

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: WIP: Access method extendability
Date: 2014-11-24 10:52:45
Message-ID: 54730DFD.2060703@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/10/2014 10:30 PM, Alexander Korotkov wrote:
> Don't allowing CREATE ACCESS METHOD command seems problematic for me. How
> could it work with pg_upgrade? pg_dump wouldn't dump extra pg_am records.
> So, pg_upgrade would break at creating operator classes on new cluster. So,
> I agree with dropping create am command only if we let pg_dump to dump
> extra pg_am records...

pg_dump would dump the CREATE EXTENSION command, and the extension's
installation script inserts the row to pg_am. pg_dump doesn't dump
objects that are part of an extension, so that's how this would work
with the CREATE ACCESS METHOD command, too.

Backtracking a bit, to summarize the discussion so far:

* It would be nice to have indexes that are not WAL-logged, but are
automatically re-created after a crash. But it's a completely different
and orthogonal feature, so there's no need to discuss that further in
this thread.

* If an extension is buggy, it can crash and corrupt the whole database.
There isn't really anything we can do about that, and this patch doesn't
make that any better or worse.

* CREATE ACCESS METHOD command isn't worth it.

Looking at the patch itself:

* It has bitrotted, thanks to my WAL format changes.

* The memcpy/memmove based API seems difficult to use correctly. Looking
at the bloom filter example, it seems that you need a lot more code to
implement WAL-logging with this, than you would with a rmgr-specific
redo function. I was hoping this to make it simpler.

I think the API has to be more high-level. Or at least also provide a
higher-level API, in addition to the low level one. Looking at the
indexams we have, almost all pages use the standard page layout, with
PageAddItem etc., plus a metapage, plus some indexam-specific metadata
in the special area. The proposed API makes it pretty complicated to
deal with that common case. After calling PageAddItem, you need intimate
knowledge of what PageAddItem did, to create a correct WAL record for
it. For PageIndexMultiDelete, it's next to impossible.

I think we'll have to accept that the WAL records with this generic API
are going to be much bulkier than ones tailored for the AM. We could
provide a compact representation for common operations like PageAddItem,
but for any more complicated operations, just WAL-log a full page image,
as it's too fiddly to track the changes at a finer level. For any
serious performance critical stuff, you'll just have to write an
old-fashioned rmgr.

(marking this as "returned with feedback" in the commitfest)

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-11-24 11:16:24 Re: Sequence Access Method WIP
Previous Message Jirayut Nimsaeng 2014-11-24 09:55:50 BDR duplicate key value violates unique constraint error