Re: dynamic shared memory

From: Jim Nasby <jim(at)nasby(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dynamic shared memory
Date: 2013-09-04 22:38:53
Message-ID: 5227B67D.8090108@nasby.net
Lists: pgsql-hackers

On 8/31/13 7:17 AM, Robert Haas wrote:
> On Thu, Aug 29, 2013 at 8:12 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> On 8/13/13 8:09 PM, Robert Haas wrote:
>>> is removed, the segment automatically goes away (we could allow for
>>> server-lifespan segments as well with only trivial changes, but I'm
>>> not sure whether there are compelling use cases for that).
>>
>> To clarify... you're talking something that would intentionally survive
>> postmaster restart? I don't see use for that either...
>
> No, I meant something that would live as long as the postmaster and
> die when it dies.

ISTM that at some point we'll want to look at putting top-level shared memory into this system (i.e., allowing dynamic resizing of GUCs that affect shared memory size).

But as you said, it'd be trivial to add that later.

>> Other comments...
>>
>> + * If the state file is empty or the contents are garbled, it probably means
>> + * that the operating system rebooted before the data written by the previous
>> + * postmaster made it to disk. In that case, we can just ignore it; any shared
>> + * memory from before the reboot should be gone anyway.
>>
>> I'm a bit concerned about this; I know it was possible in older versions for
>> the global shared memory context to be left behind after a crash and needing
>> to clean it up by hand. Dynamic shared mem potentially multiplies that by
>> 100 or more. I think it'd be worth changing dsm_write_state_file so it
>> always writes a new file and then does an atomic mv (or something similar).
>
> I agree that the possibilities for leftover shared memory segments are
> multiplied with this new facility, and I've done my best to address
> that. However, I don't agree that writing the state file in a
> different way would improve anything.

Wouldn't it protect against a crash while writing the file? I realize the odds of that are pretty remote, but AFAIK it wouldn't cost that much to write a new file and do an atomic mv...
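
Something like this is all I have in mind (purely illustrative, not the actual dsm_write_state_file code; the function name, temp-file suffix, and buffer handling are made up):

/*
 * Sketch of the write-then-rename idea: write the new contents to a
 * temporary file, fsync it, then rename() it over the old file.
 * rename() is atomic on POSIX filesystems, so a crash mid-write leaves
 * the old state file intact rather than a torn one.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int
write_state_file_atomically(const char *path, const char *buf, size_t len)
{
    char    tmppath[1024];
    int     fd;

    snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);

    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    close(fd);

    /* The atomic step: replace the old state file in one shot. */
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return -1;
    }
    return 0;
}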

>> + * If some other backend exited uncleanly, it might have corrupted the
>> + * control segment while it was dying. In that case, we warn and ignore
>> + * the contents of the control segment. This may end up leaving behind
>> + * stray shared memory segments, but there's not much we can do about
>> + * that if the metadata is gone.
>>
>> Similar concern... in this case, would it be possible to always write
>> updates to an un-used slot and then atomically update a pointer? This would
>> be more work than what I suggested above, so maybe just a TODO for now...
>>
>> Though... is there anything a dying backend could do that would corrupt the
>> control segment to the point that it would screw up segments allocated by
>> other backends and not related to the dead backend? Like marking a slot as
>> not used when it is still in use and isn't associated with the dead backend?
>
> Sure. A messed-up backend can clobber the control segment just as it
> can clobber anything else in shared memory. There's really no way
> around that problem. If the control segment has been overwritten by a
> memory stomp, we can't use it to clean up. There's no way around that
> problem except to not use the control segment, which wouldn't be better.

Are we trying to protect against "memory stomps" when we restart after a backend dies? I thought we were just trying to ensure that all shared data structures stay correct and consistent. If that's the case, then by using a pointer that can be updated in a CPU-atomic fashion we'd never end up with a corrupted entry that's still in use; a partial write would land in a slot that nothing points at, so it could be safely reused.
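
Roughly this sort of thing is what I had in mind (a sketch under my own assumptions, not the control segment layout the patch actually uses; in real shared memory you'd publish an offset or slot index rather than a raw pointer, since mapping addresses can differ between backends):

/*
 * Build the updated entry in a slot that nothing points at, then
 * publish it with a single atomic pointer store.  A crash mid-update
 * leaves the old pointer intact, so the half-written slot is just
 * reusable garbage.
 */
#include <stdatomic.h>
#include <stdint.h>

typedef struct ControlEntry
{
    uint64_t    handle;     /* hypothetical segment identifier */
    uint64_t    size;
} ControlEntry;

typedef struct ControlArea
{
    ControlEntry slots[2];              /* double-buffered entry */
    _Atomic(ControlEntry *) current;    /* readers follow this pointer */
} ControlArea;

static void
publish_update(ControlArea *area, const ControlEntry *newval)
{
    ControlEntry *cur = atomic_load(&area->current);
    /* Pick whichever slot is not currently published. */
    ControlEntry *spare = (cur == &area->slots[0]) ? &area->slots[1]
                                                   : &area->slots[0];

    *spare = *newval;       /* a partial write here is harmless */

    /* A single atomic pointer store makes the new entry visible. */
    atomic_store(&area->current, spare);
}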

Like I said before though, it may not be worth worrying about this case right now.

>> Should dsm_impl_op sanity check the arguments after op? I didn't notice
>> checks in the type-specific code but I also didn't read all of it... are we
>> just depending on the OS to sanity-check?
>
> Sanity-check for what?

Presumably there are limits to what the arguments can rationally be set to. IIRC there's nothing downstream in our code that checks them, so I'm guessing we're just depending on the kernel to sanity-check.
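
Something along these lines is all I mean (purely hypothetical; the limits and page-alignment assumption are mine, not anything in the patch):

/*
 * Validate the requested size before handing it to the kernel, rather
 * than relying on shmget()/mmap() to fail.
 */
#include <errno.h>
#include <stddef.h>

#define MIN_SEGMENT_SIZE    ((size_t) 4096)
#define MAX_SEGMENT_SIZE    ((size_t) 1 << 30)      /* arbitrary cap */

static int
check_segment_size(size_t request_size)
{
    if (request_size < MIN_SEGMENT_SIZE ||
        request_size > MAX_SEGMENT_SIZE ||
        request_size % 4096 != 0)       /* assume page alignment is wanted */
    {
        errno = EINVAL;
        return -1;
    }
    return 0;
}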
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
