Re: GSoC proposal - "make an unlogged table logged"

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSoC proposal - "make an unlogged table logged"
Date: 2014-03-04 17:54:02
Message-ID: CA+TgmoZPer6CXmujPx2YU4y6rn4JXE1rVWFHYm089fJ9Jvs33Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 4, 2014 at 9:50 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-03-04 09:47:08 -0500, Robert Haas wrote:
>> On Mon, Mar 3, 2014 at 12:08 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> > * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
>> >> On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
>> >> <fabriziomello(at)gmail(dot)com> wrote:
>> >> > Is the TODO item "make an unlogged table logged" [1] a good GSoC project?
>> >>
>> >> I'm pretty sure we found some problems in that design that we couldn't
>> >> figure out how to solve. I don't have a pointer to the relevant
>> >> -hackers discussion off-hand, but I think there was one.
>> >
>> > ISTR the discussion going something along the lines of "we'd have to WAL
>> > log the entire table to do that, and if we have to do that, what's the
>> > point?".
>>
>> No, not really. The issue is more around what happens if we crash
>> part way through. At crash recovery time, the system catalogs are not
>> available, because the database isn't consistent yet and, anyway, the
>> startup process can't be bound to a database, let alone every database
>> that might contain unlogged tables. So the sentinel that's used to
>> decide whether to flush the contents of a table or index is the
>> presence or absence of an _init fork, which the startup process
>> obviously can see just fine. The _init fork also tells us what to
>> stick in the relation when we reset it; for a table, we can just reset
>> to an empty file, but that's not legal for indexes, so the _init fork
>> contains a pre-initialized empty index that we can just copy over.
>>
>> Now, to make an unlogged table logged, you've got to at some stage
>> remove those _init forks. But this is not a transactional operation.
>> If you remove the _init forks and then the transaction rolls back,
>> you've left the system in an inconsistent state. If you postpone the
>> removal until commit time, then you have a problem if it fails,
>> particularly if it works for the first file but fails for the second.
>> And if you crash at any point before you've fsync'd the containing
>> directory, you have no idea which files will still be on disk after a
>> hard reboot.
>
> Can't that be solved by just creating the permanent relation in a new
> relfilenode? That's equivalent to a rewrite, yes, but we need to do that
> for anything but wal_level=minimal anyway.

Yes, that would work. I've tended to view optimizing away the
relfilenode copy as an indispensable part of this work, but that might
be wrongheaded. It would certainly be a lot easier to make this
happen if we didn't insist on that.
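
The hazard described in the quoted messages can be illustrated with a small Python toy model. This is a sketch under stated assumptions, not PostgreSQL code. A relation is reduced to a main-fork file named after its relfilenode (e.g. `16384`) plus an optional `_init` fork (`16384_init`). The free space map, visibility map and segment files are ignored, and `reset_unlogged`, `drop_init_forks_at_commit` and the other helpers are hypothetical names. The recovery pass decides what to reset purely from file names, as the startup process must. As a result, a commit-time removal that fails after the first file leaves a table and its index out of step. The `fsync_dir` helper is POSIX-only.

```python
import os
import shutil
import tempfile


def write(path, data):
    with open(path, "wb") as f:
        f.write(data)


def read(path):
    with open(path, "rb") as f:
        return f.read()


def fsync_dir(path):
    # POSIX only: make directory-entry changes (such as unlinks) durable.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def reset_unlogged(datadir):
    """Hypothetical recovery pass: any relation that still has an _init fork
    gets its main fork overwritten with a copy of that fork. Only file names
    are consulted, so no catalog access is needed."""
    for name in os.listdir(datadir):
        if name.endswith("_init"):
            main = name[: -len("_init")]
            shutil.copyfile(os.path.join(datadir, name),
                            os.path.join(datadir, main))


def drop_init_forks_at_commit(datadir, relfilenodes, fail_after=None):
    """Hypothetical commit-time step: unlink each _init fork in turn.
    fail_after simulates an error or crash after that many unlinks."""
    for i, relfilenode in enumerate(relfilenodes):
        if fail_after is not None and i == fail_after:
            raise OSError("simulated failure while removing init forks")
        os.unlink(os.path.join(datadir, relfilenode + "_init"))
    fsync_dir(datadir)  # before this returns, a hard reboot may undo unlinks


with tempfile.TemporaryDirectory() as datadir:
    # Unlogged table 16384 and its index 16390. The table's init fork is an
    # empty file; the index's stands in for a pre-built empty index.
    write(os.path.join(datadir, "16384"), b"table rows")
    write(os.path.join(datadir, "16384_init"), b"")
    write(os.path.join(datadir, "16390"), b"index entries")
    write(os.path.join(datadir, "16390_init"), b"empty index")

    try:
        drop_init_forks_at_commit(datadir, ["16384", "16390"], fail_after=1)
    except OSError:
        pass  # the table's _init fork is gone, the index's is not

    reset_unlogged(datadir)  # what crash recovery would do next
    table = read(os.path.join(datadir, "16384"))
    index = read(os.path.join(datadir, "16390"))
```

After the recovery pass the table still holds its rows while its index has been reset to an empty index. That is exactly the inconsistent state that makes in-place removal of the `_init` forks unsafe.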

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
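
Andres's alternative, which Robert accepts, avoids touching the unlogged relation's files at all. It copies the data into a new relfilenode that never had an `_init` fork and switches the catalog entry at commit. The sketch below models this with a toy catalog dict whose `persistence` values borrow the `'u'`/`'p'` letters of `pg_class.relpersistence`. The dict, the `set_logged` helper and the relfilenode numbers are all hypothetical. A real implementation would WAL-log each copied page rather than just copy the file.

```python
import copy
import os
import shutil
import tempfile


def write(path, data):
    with open(path, "wb") as f:
        f.write(data)


def set_logged(catalog, datadir, relname, new_relfilenode, commit=True):
    """Hypothetical SET LOGGED via a rewrite: copy the data into a brand-new
    relfilenode that never gets an _init fork, then switch the catalog entry.
    The copy is where a real system would WAL-log every page, which Andres
    notes is required anyway unless wal_level=minimal."""
    old = catalog[relname]["relfilenode"]
    new_path = os.path.join(datadir, new_relfilenode)
    shutil.copyfile(os.path.join(datadir, old), new_path)

    if not commit:
        # Abort: discard the new file. The unlogged relation and its _init
        # fork were never touched, so recovery behaves exactly as before.
        os.unlink(new_path)
        return catalog

    # Commit: the catalog update is the single atomic switch-over point.
    updated = copy.deepcopy(catalog)
    updated[relname] = {"relfilenode": new_relfilenode, "persistence": "p"}
    # The old files, _init fork included, are now unreferenced; failing to
    # unlink them only leaks files and cannot corrupt the relation.
    for suffix in ("", "_init"):
        os.unlink(os.path.join(datadir, old + suffix))
    return updated


with tempfile.TemporaryDirectory() as datadir:
    write(os.path.join(datadir, "16384"), b"table rows")
    write(os.path.join(datadir, "16384_init"), b"")
    catalog = {"t": {"relfilenode": "16384", "persistence": "u"}}

    aborted = set_logged(catalog, datadir, "t", "16400", commit=False)
    after_abort = sorted(os.listdir(datadir))

    committed = set_logged(catalog, datadir, "t", "16400", commit=True)
    after_commit = sorted(os.listdir(datadir))
    with open(os.path.join(datadir, "16400"), "rb") as f:
        new_data = f.read()
```

The price is a full copy of the table, the relfilenode copy Robert had hoped to optimize away. In exchange, every failure point leaves either the old unlogged relation (with its `_init` fork intact) or the new permanent one referenced by the catalog. At worst, a crash leaks an unreferenced file.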
