Re: Hard limit on WAL space used (because PANIC sucks)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Date: 2014-01-22 00:23:57
Message-ID: 21402.1390350237@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-01-21 18:59:13 -0500, Tom Lane wrote:
>> Another thing to think about is whether we couldn't put a hard limit on
>> WAL record size somehow. Multi-megabyte WAL records are an abuse of the
>> design anyway, when you get right down to it. So for example maybe we
>> could split up commit records, with most of the bulky information dumped
>> into separate records that appear before the "real commit". This would
>> complicate replay --- in particular, if we abort the transaction after
>> writing a few such records, how does the replayer realize that it can
>> forget about those records? But that sounds probably surmountable.

> I think removing the list of subtransactions from commit records would
> essentially require not truncating pg_subtrans after a restart
> anymore.

I'm not suggesting that we stop providing that information! I'm just
saying that we perhaps don't need to store it all in one WAL record,
if instead we put the onus on WAL replay to be able to reconstruct what
it needs from a series of WAL records.

> We could relatively easily split off logging the dropped files from
> commit records and log them in groups afterwards; we already have
> several races that allow files to be leaked.

I was thinking the other way around: emit the subsidiary records before the
atomic commit or abort record, indeed before we've actually committed.
Part of the point is to reduce the risk that lack of WAL space would
prevent us from fully committing. Also, writing those records afterwards
increases the risk of a post-commit failure, which is a bad thing.
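[Illustrative sketch, not part of the original message.] The write-side
ordering described above could look roughly like the following. This is a
toy model, not PostgreSQL code: the record kinds, the tuple layout, the
function name emit_commit, and the max_per_record cap are all hypothetical.
It only shows bulky subtransaction data being split into bounded-size
subsidiary records that precede a small commit record.

```python
def emit_commit(xid, subxacts, max_per_record=3):
    """Return the records for one commit, in the order they would be written.

    Hypothetical model: each record is a (kind, xid, payload) tuple.
    The bulky subtransaction list is split into chunks of at most
    max_per_record entries, and every chunk is emitted BEFORE the
    commit record, so the commit record itself stays small.
    """
    records = []
    for start in range(0, len(subxacts), max_per_record):
        chunk = subxacts[start:start + max_per_record]
        records.append(("subxacts", xid, chunk))
    # The "real commit": carries no bulky payload in this sketch.
    records.append(("commit", xid, []))
    return records
```

With this ordering, running out of space while writing a chunk happens
before the transaction has committed, so it can still be aborted cleanly.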

Replay would then involve either accumulating the subsidiary records in
memory, or being willing to go back and re-read them when the real commit
or abort record is seen.
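
[Illustrative sketch, not part of the original message.] The first replay
option, accumulating subsidiary records in memory, might be modeled as
below. Everything here is an assumption made for illustration: the Record
and Replayer classes, the record kinds, and in particular the
end_of_recovery step, which is one possible answer to the question of how
replay forgets records whose transaction never wrote a commit or abort
record (for example because of a crash).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str   # hypothetical kinds: "subxacts", "dropped_files", "commit", "abort"
    xid: int
    payload: list = field(default_factory=list)


def _empty_info():
    return {"subxacts": [], "dropped_files": []}


class Replayer:
    def __init__(self):
        # Subsidiary data seen so far for transactions not yet resolved.
        self.pending = defaultdict(_empty_info)
        # Fully reconstructed commit information, keyed by xid.
        self.committed = {}

    def replay(self, rec):
        if rec.kind in ("subxacts", "dropped_files"):
            # Accumulate in memory until the commit or abort is seen.
            self.pending[rec.xid][rec.kind].extend(rec.payload)
        elif rec.kind == "commit":
            # The commit record makes the accumulated data take effect.
            self.committed[rec.xid] = self.pending.pop(rec.xid, _empty_info())
        elif rec.kind == "abort":
            # An abort lets replay forget the subsidiary records.
            self.pending.pop(rec.xid, None)
        else:
            raise ValueError(f"unknown record kind: {rec.kind}")

    def end_of_recovery(self):
        # Assumption: anything still pending belongs to a transaction
        # that never committed, so it is discarded.
        self.pending.clear()
```

The alternative of going back to re-read the subsidiary records would
trade this memory use for extra WAL reads at commit time.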

regards, tom lane
