Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-06 04:57:17
Message-ID: 4CFC6D2D.8090803@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22/11/2010 7:37 PM, Magnus Hagander wrote:

> Finally getting to looking at this one - sorry about the very long delay.

Ditto, I'm afraid.

> I agree with Heikki's earlier comment that it's better to have this
> included in the backend - but that's obviously not going to happen for
> already-released versions. I'd therefor advocate a contrib module for
> existing versions, and then in core for 9.1+.

I think that was the conclusion the following discussion came to, as
well. It's easy to distribute a binary .dll for older versions and if
the EDB folks found this facility really useful they could even bundle
it in the installer. There's no point backpatching it given how few
win32 users will build from source.

> We should then have an option to turn it off (on by default). But we
> don't need to pay the overhead on every backend startup - we could
> just map the value into the parameter shared memory block, and require
> a full postmaster restart in order to change it.

As discussed later, implemented by testing for a 'crashdumps' directory.
If the directory exists, a dump is written. If the directory is missing
a log message is emitted to explain why no dump was written; as this'll
only happen when a backend dies, log noise isn't an issue and it'll help
save sysadmin head-scratching given the "magic" behaviour of turning
on/off based on directory existence.

> Do we want to backpatch it into contrib/? Adding a new module there
> seems kind of wrong - probably better to keep the source separate and
> just publish the DLL files for people who do debugging?

+1 to just publishing the DLLs

> Looking at the code:
> * Why do we need to look at differnt versions of dbghelp.dll? Can't we
> always use the one with Windows? And in that case, can't we just use
> compile-time linking, do we need to bother with DLL loading at all?

Newer versions than those shipped with an OS release contain updated and
improved functionality as well as bug fixes. We *can* use the version
shipped with the OS, as even in XP dbghelp.dll contains the required
functionality, but it'd be better to bundle it. That's microsoft's
recommendation when you use dbghelp.dll .

> * Per your comment about using elog() - a good idea in that case is to
> use write_stderr(). That will send the output to stderr if there is
> one, and otherwise the eventlog (when running as service).

Hm, ok. The attached patch doesn't do that yet. Given the following
comments, do you think the change still necessary?

> * And yes, in the crash handler, we should *definitely* not use elog().

Currently I'm actually using it, but only after the dump has been
written or we've decided not to write a dump. At this point, if elog()
fails we don't really care, we'd just be returning control to the OS's
fatal exception cleanup anyway. It's IMO desirable for output to go to
the same log as everything else, so admins can find out what's going on
and can associate the crash dump files with events that happened around
a certain time.

> * On Unix, the core file is dropped in the database directory, we
> don't have a separate directory for crashdumps. If we want to be
> consistent, we should do that here too. I do think that storing them
> in a directory like "crashdumps" is better, but I just wanted to raise
> the comment.

Given the recent discussion about using the dir as an on/off switch,
that'll be necessary.

> * However, when storing it in crashdumps, I think the code would need
> to create that directory if it does not exist, doesn't it?

Not now that we're using it an on/off switch, but of course it would've
made sense otherwise.

> * Right now, we overwrite old crashdumps. It is well known that PIDs
> are recycled pretty quickly on Windows - should we perhaps dump as
> postgres-<pid>-<sequence>.mdmp when there is a filename conflict?

Good idea. Rather than try to test for existing files I've just taken
the simple route (always good in code run during process crash) and
appended the system ticks to the dump name. System ticks are a 32-bit
quantity that wraps about every 40 days, so they should be easily unique
enough in combination with the pid. The logs will report the crashdump
filenames unless the backend is too confused to be able to elog(), and
all the crash dump files have file system timestamps so there isn't much
point trying to format a crash date/time into their names.

I could use wall clock seconds (or ms) instead, but favour system ticks
because they can be read with a simple int-returning call that doesn't
require any more stack allocation. On Windows, getting wall clock time
in seconds seems to involve a call to GetSystemTimeAsFileTime
(http://msdn.microsoft.com/en-us/library/ms724397(v=VS.85).aspx) to
populate a struct containing a fake-64-bit int as two 32-bit ints. It's
not clear that it's free of timezone calculations and generally looks
like something better avoided in a crash handler if it's not necessary.
As GetSystemTicks seems quite sufficient, I thought I'd stick with it.

--
Craig Ringer

Attachment Content-Type Size
0001-Crash-dump-support-for-win32.patch text/plain 0 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2010-12-06 05:54:24 Re: libpq changes for synchronous replication
Previous Message David E. Wheeler 2010-12-06 04:52:30 Re: Review: Extensions Patch