Re: Dynamic Shared Memory stuff

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dynamic Shared Memory stuff
Date: 2013-12-10 17:50:20
Message-ID: 52A7545C.4030405@vmware.com
Lists: pgsql-hackers

On 12/10/2013 07:27 PM, Noah Misch wrote:
> On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
>> On 11/20/2013 09:58 PM, Robert Haas wrote:
>>> On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>>>> * As discussed in the "Something fishy happening on frogmouth" thread, I
>>>> don't like the fact that the dynamic shared memory segments will be
>>>> permanently leaked if you kill -9 postmaster and destroy the data directory.
>>>
>>> Your test elicited different behavior for the dsm code vs. the main
>>> shared memory segment because it involved running a new postmaster
>>> with a different data directory but the same port number on the same
>>> machine, and expecting that that new - and completely unrelated -
>>> postmaster would clean up the resources left behind by the old,
>>> now-destroyed cluster. I tend to view that as a defect in your test
>>> case more than anything else, but as I suggested previously, we could
>>> potentially change the code to use something like 1000000 + (port *
>>> 100) with a forward search for the control segment identifier, instead
>>> of using a state file, mimicking the behavior of the main shared
>>> memory segment. I'm not sure we ever reached consensus on whether
>>> that was overall better than what we have now.
>>
>> I really think we need to do something about it. To use your earlier
>> example of parallel sort, it's not acceptable to permanently leak a 512
>> GB segment on a system with 1 TB of RAM.
>
> I don't. Erasing your data directory after an unclean shutdown voids any
> expectations for a thorough, automatic release of system resources. Don't do
> that. The next time some new use of a persistent resource violates your hope
> for this scenario, there may be no remedy.

Well, the point of erasing the data directory is to release system
resources. I would normally expect "killall -9 <process>; rm -rf <data
dir>" to thoroughly get rid of the running program and all the resources.
It's surprising enough that the regular shared memory segment is left
behind, but at least that one gets cleaned up when you start a new
server (on same port). Let's not add more cases like that, if we can
avoid it.

BTW, what if the data directory is seriously borked, and the server
won't start? Sure, don't do that, but it would be nice to have a way to
recover if you do anyway. (docs?)

>> One idea is to create the shared memory object with shm_open, and wait
>> until all the worker processes that need it have attached to it. Then,
>> shm_unlink() it, before using it for anything. That way the segment will
>> be automatically released once all the processes close() it, or die. In
>> particular, kill -9 will release it. (This is a variant of my earlier
>> idea to create a small number of anonymous shared memory file
>> descriptors in postmaster startup with shm_open(), and pass them down to
>> child processes with fork()). I think you could use that approach with
>> SysV shared memory as well, by destroying the segment with
>> shmctl(IPC_RMID) immediately after all processes have attached to it.
>
> That leaves a window in which we still leak the segment,

A small window is better than a large one.

Another refinement is to wait for all the processes to attach before
setting the segment's size with ftruncate(). That way, when the window
is open for leaking the segment, it's still 0-sized so leaking it is not
a big deal.
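
To make the ordering concrete, here is a minimal sketch of the sequence described above: create with shm_open(), wait for the worker to attach, shm_unlink(), and only then ftruncate() to the real size. It is an illustration under stated assumptions, not PostgreSQL code: the function name unlink_before_sizing, the "/dsm_sketch_<pid>" object name, SKETCH_SIZE, and the pipe-based handshake are all hypothetical, and a single forked worker stands in for "all the processes".

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_SIZE 4096    /* hypothetical segment size */

/* Returns 0 on success and copies the worker's message into out. */
int unlink_before_sizing(char *out, size_t outlen)
{
    int   ready[2], go[2];  /* worker->creator "attached", creator->worker "sized" */
    char  name[64], c = 'x';
    int   status = 0;

    assert(out != NULL && outlen > 0);
    snprintf(name, sizeof name, "/dsm_sketch_%ld", (long) getpid());
    if (pipe(ready) != 0 || pipe(go) != 0)
        return -1;

    /* Step 1: create the named object. It is still 0 bytes long. */
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) { shm_unlink(name); return -1; }
    if (pid == 0)
    {
        close(ready[0]); close(go[1]);
        close(fd);          /* attach by name, as an unrelated worker would */
        int wfd = shm_open(name, O_RDWR, 0);
        if (wfd < 0 || write(ready[1], &c, 1) != 1) _exit(1);
        if (read(go[0], &c, 1) != 1) _exit(1);   /* EOF means creator gave up */
        char *w = mmap(NULL, SKETCH_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, wfd, 0);
        if (w == MAP_FAILED) _exit(1);
        strcpy(w, "hello from worker");
        _exit(0);
    }
    close(ready[1]); close(go[0]);

    /* Step 2: wait until the worker holds its own descriptor. */
    int ok = (read(ready[0], &c, 1) == 1);

    /* Step 3: remove the name. From here on, the object goes away as
     * soon as every process has closed it or died, kill -9 included. */
    shm_unlink(name);

    /* Step 4: only now give the segment its real size. */
    ok = ok && ftruncate(fd, SKETCH_SIZE) == 0;
    char *p = ok ? mmap(NULL, SKETCH_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0) : MAP_FAILED;
    ok = ok && p != MAP_FAILED;
    ok = ok && write(go[1], &c, 1) == 1;
    close(go[1]);           /* if nothing was written, the worker sees EOF */

    waitpid(pid, &status, 0);
    ok = ok && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok)
        snprintf(out, outlen, "%s", p);
    if (p != MAP_FAILED)
        munmap(p, SKETCH_SIZE);
    close(fd); close(ready[0]);
    return ok ? 0 : -1;
}
```

If the creator is killed between steps 1 and 3, what is left behind is a named but 0-sized object, which is the cheap case. On older glibc versions, shm_open() and shm_unlink() may require linking with -lrt.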

> and it is less
> general: not every use of DSM is conducive to having all processes attach in a
> short span of time.

Let's cross that bridge when we get there. AFAICS it fits all the use
cases discussed so far.
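
For the SysV variant mentioned in the quoted text, a rough sketch of "mark the segment for removal once everyone has attached" could look like the following. It relies on the standard behaviour that shmctl(IPC_RMID) only marks the segment for destruction, and the kernel frees it when the last attached process detaches or exits. The function name rmid_after_attach, DEMO_SIZE, and the pipe handshake are hypothetical, and one forked worker again stands in for all participants.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_SIZE 4096      /* hypothetical segment size */

/* Returns 0 on success and copies the worker's message into out. */
int rmid_after_attach(char *out, size_t outlen)
{
    int   ready[2], go[2];  /* worker->creator "attached", creator->worker "go" */
    char  c = 'x';
    int   status = 0;

    assert(out != NULL && outlen > 0);
    if (pipe(ready) != 0 || pipe(go) != 0)
        return -1;

    /* Create the segment. SysV segments are sized at creation time. */
    int id = shmget(IPC_PRIVATE, DEMO_SIZE, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) { shmctl(id, IPC_RMID, NULL); return -1; }
    if (pid == 0)
    {
        close(ready[0]); close(go[1]);
        char *w = shmat(id, NULL, 0);            /* worker attaches by id */
        if (w == (char *) -1) _exit(1);
        if (write(ready[1], &c, 1) != 1) _exit(1);
        if (read(go[0], &c, 1) != 1) _exit(1);   /* EOF means creator gave up */
        strcpy(w, "written after IPC_RMID");
        _exit(0);
    }
    close(ready[1]); close(go[0]);

    char *p = shmat(id, NULL, 0);
    int ok = (p != (char *) -1);

    /* Wait until the worker has attached too. */
    ok = (read(ready[0], &c, 1) == 1) && ok;

    /* Mark the segment for destruction. Existing attachments stay valid;
     * the kernel frees the memory once the last process detaches or dies. */
    ok = (shmctl(id, IPC_RMID, NULL) == 0) && ok;

    ok = ok && write(go[1], &c, 1) == 1;
    close(go[1]);           /* if nothing was written, the worker sees EOF */

    waitpid(pid, &status, 0);
    ok = ok && WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok)
        snprintf(out, outlen, "%s", p);
    if (p != (char *) -1)
        shmdt(p);
    close(ready[0]);
    return ok ? 0 : -1;
}
```

Unlike the POSIX case, the size cannot be deferred here, so the leak window between shmget() and shmctl(IPC_RMID) covers a full-sized segment.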

- Heikki
