Proposed Windows-specific change: Enable crash dumps (like core files)

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-04 11:50:23
Message-ID: 4CA9BF7F.5060104@postnewspapers.com.au
Lists: pgsql-hackers

Hi all

After this recent "fun" trying to get a usable crash dump or stack trace
from a crashing autovacuum worker on Windows, I'd like to make a quick
proposal - one that'll be followed by a patch if there aren't any loud
NACKs.

Windows has a couple of built-in facilities to automatically generate
crash dumps, much like UNIX core files. None of them (JIT debugger
included) work well with PostgreSQL's process-based architecture and
use of a private service account. The autovacuum daemon's
launcher/worker split is especially tricky, because a problem worker
process may not live long before crashing, as in the case I'm currently
bashing my head against.

Some other mechanism is needed. Dbghelp.dll, a redistributable DLL from
the platform SDK, provides that mechanism. You can set a handler that's
invoked if the process encounters an unhandled exception (which in
Windows-land includes anything that'd be a fatal signal in UNIX, as well
as "real" C++ exceptions). This handler uses LoadLibrary to load
dbghelp.dll and generate a crash dump (core file) that can be debugged
with Visual Studio, windbg, etc on the generating machine or any other
with the right PostgreSQL binaries and debug symbols.
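
For illustration, such a handler might look roughly like the sketch
below. It is not the proposed patch. The function names, the
"postgres-<pid>.mdmp" file name and the relative "crashdumps" directory
(assumed to be resolved against the data directory) are all
hypothetical; only the Win32 and dbghelp calls are real APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the dump file name for one process.
 * The "crashdumps" directory and ".mdmp" naming are assumptions.
 * Returns 1 on success, 0 if the buffer is too small.
 */
int
crash_dump_path(char *buf, size_t buflen, unsigned long pid)
{
    int n;

    assert(buf != NULL && buflen > 0);
    n = snprintf(buf, buflen, "crashdumps\\postgres-%lu.mdmp", pid);
    return n > 0 && (size_t) n < buflen;
}

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>

/* Signature of MiniDumpWriteDump, resolved at crash time. */
typedef BOOL (WINAPI *MiniDumpWriteDump_fn)(HANDLE, DWORD, HANDLE,
    MINIDUMP_TYPE,
    PMINIDUMP_EXCEPTION_INFORMATION,
    PMINIDUMP_USER_STREAM_INFORMATION,
    PMINIDUMP_CALLBACK_INFORMATION);

/* Called by the OS when an exception goes unhandled. */
static LONG WINAPI
crash_dump_filter(EXCEPTION_POINTERS *info)
{
    char    path[MAX_PATH];
    HMODULE dbghelp = LoadLibraryA("dbghelp.dll");
    MiniDumpWriteDump_fn write_dump = NULL;

    if (dbghelp != NULL)
        write_dump = (MiniDumpWriteDump_fn)
            GetProcAddress(dbghelp, "MiniDumpWriteDump");

    if (write_dump != NULL &&
        crash_dump_path(path, sizeof(path), GetCurrentProcessId()))
    {
        HANDLE file = CreateFileA(path, GENERIC_WRITE, 0, NULL,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                  NULL);

        if (file != INVALID_HANDLE_VALUE)
        {
            MINIDUMP_EXCEPTION_INFORMATION ex;

            memset(&ex, 0, sizeof(ex));
            ex.ThreadId = GetCurrentThreadId();
            ex.ExceptionPointers = info;
            ex.ClientPointers = FALSE;

            /* Dump type is fixed here; it could be made configurable. */
            write_dump(GetCurrentProcess(), GetCurrentProcessId(), file,
                       MiniDumpNormal, &ex, NULL, NULL);
            CloseHandle(file);
        }
    }

    /* Let the process terminate once the dump attempt is done. */
    return EXCEPTION_EXECUTE_HANDLER;
}

/* Hypothetical entry point a backend would call during startup. */
void
install_crash_dump_handler(void)
{
    SetUnhandledExceptionFilter(crash_dump_filter);
}
#endif   /* _WIN32 */
```

Resolving MiniDumpWriteDump with GetProcAddress at crash time means the
binary has no link-time dependency on dbghelp.dll, so a missing DLL only
costs you the dump.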

It's kind of like loading gdb in-process and writing out a core, with
the same obvious problem that it won't help you with problems caused by
running totally out of memory or certain types of memory corruption.
It's protected against recursive calls by the OS, though, so if your
handler fails too there's no real harm in it.

The big advantage is that, like when dealing with a core file on *nix,
people can email/ftp/whatever crash dump files for analysis anywhere else
that someone has the same binaries and debug symbols. That'd make it
awfully useful even in cases where it *is* easy to predict the crash and
attach a debugger.

Because of the potential disk space use of crash dumps I think it'd be
undesirable to have it always enabled, so here's what I propose:

- Always compile the crash dump feature in builds, so every build that
goes to a user is capable of crash dumps. Because a suitable version
of dbghelp.dll has been included in Windows XP and above, it shouldn't
be necessary to ship the DLL with Pg, though the license permits that.

- Use a GUC option to control whether dumps are generated on a crash.
It'd be a string value, with permitted values "off", "normal" or
"withdata". (The latter two correspond to the "MiniDumpNormal" and
"MiniDumpWithDataSegs" dump types that MiniDumpWriteDump in
dbghelp.dll accepts.) The default is "off".

If "off" is set then no unhandled exception handler will be
registered and no crash dump will be generated. The crash dump code
has no effect beyond checking the GUC and seeing that it's set to
"off". With either of the other values a handler is registered; the
value chosen controls whether a minimal dump or one that also
includes data segments is generated.

It may even turn out to be easy to make this togglable at runtime
(superuser only) at least with a conf edit and reload, or
possibly even per-session. I'm unsure of that as yet, though.

- Hard-code the dump file output location, so there's no need to
try to read a GUC from within the crash handler. If people
want to change the dump file location, they can symlink/reparse/
whatever $DATADIR\crashdumps to their preferred location.

- If the crash dump handler is enabled by setting the GUC,
all backends register the handler during startup or (if it
proves practical) when the GUC is changed.

- When the handler is triggered by the OS trapping an unhandled
exception, it loads dbghelp.dll, writes the appropriate dump
format to the hardcoded path, and terminates the process.
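
As a rough sketch of the GUC side of this, the string value could be
mapped to a mode before deciding whether to register a handler. The
enum and function names here are hypothetical, and the exact-match
comparison is an assumption (a real GUC would presumably use
PostgreSQL's own option-parsing machinery). On Windows the two non-off
modes would translate to MiniDumpNormal and MiniDumpWithDataSegs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical internal representation of the proposed GUC. */
typedef enum CrashDumpMode
{
    CRASH_DUMP_OFF,         /* no handler registered */
    CRASH_DUMP_NORMAL,      /* would map to MiniDumpNormal */
    CRASH_DUMP_WITHDATA     /* would map to MiniDumpWithDataSegs */
} CrashDumpMode;

/*
 * Parse the GUC string. Returns true and sets *mode for a permitted
 * value; returns false and leaves *mode untouched otherwise.
 */
bool
parse_crash_dump_mode(const char *value, CrashDumpMode *mode)
{
    assert(value != NULL && mode != NULL);

    if (strcmp(value, "off") == 0)
        *mode = CRASH_DUMP_OFF;
    else if (strcmp(value, "normal") == 0)
        *mode = CRASH_DUMP_NORMAL;
    else if (strcmp(value, "withdata") == 0)
        *mode = CRASH_DUMP_WITHDATA;
    else
        return false;

    return true;
}

/* A handler is only registered when dumps are enabled. */
bool
crash_dump_handler_needed(CrashDumpMode mode)
{
    return mode != CRASH_DUMP_OFF;
}
```

A backend would call something like parse_crash_dump_mode() at startup
and only install the exception handler if crash_dump_handler_needed()
says so, which keeps the "off" case down to a single check.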

Comments? Thoughts?

Would a patch along these lines have a chance of acceptance?

(BCC'd Dave Page because of his involvement in Windows & Windows Pg
support).

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/
