Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)

Lists: pgsql-hackers
From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-04 11:50:23
Message-ID: 4CA9BF7F.5060104@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all

After this recent "fun" trying to get a usable crash dump or stack trace
from a crashing autovacuum worker on Windows, I'd like to make a quick
proposal - one that'll be followed by a patch if there aren't any loud
NACKs.

Windows has a couple of built-in facilities to automatically generate
crash dumps, much like UNIX core files. None of them (JIT debugger
included) work well with PostgreSQL's process-based archiectecture and
use of a private service account. The autovacuum daemon's
launcher/worker split is especially tricky, because a problem worker
process may not live long before crashing, as in the case I'm currently
bashing my head against.

Some other mechanism is needed. Dbghelp.dll, a redistributible DLL from
the platform SDK, provides that mechanism. You can set a handler that's
invoked if the process encounters an unhandled exception (which in
Windows-land includes anything that'd be a fatal signal in UNIX, as well
as "real" C++ exceptions). This handler uses LoadLibrary to load
dbghelp.dll and generate a crash dump (core file) that can be debugged
with Visual Studio, windbg, etc on the generating machine or any other
with the right PostgreSQL binaries and debug symbols.

It's kind of like loading gdb in-process and writing out a core, with
the same obvious problem that it won't help you with problems caused by
running totally out of memory or certain types of memory corruption.
It's protected against recursive calls by the OS, though, so if your
handler fails too there's no real harm in it.

The big advantage is that, like when dealing with a core file on *nix,
people can email/ftp/whatever crash dump files for analsys anywhere else
that someone has the same binaries and debug symbols. That'd make it it
awfully useful even in cases where it *is* easy to predict the crash and
attach a debugger.

Because of the potential disk space use of crash dumps I think it'd be
undesirable to have it always enabled, so here's what I propose:

- Always compile the crash dump feature in builds, so every build that
goes to a user is capable of crash dumps. Because a suitable version
of dbghelp.dll has been included in Windows XP and above, it shouldn't
be necessary to ship the DLL with Pg, though the license permits that.

- Use a GUC option to control whether dumps are generated on a crash.
It'd be a string value, with permitted values "off", "normal" or
"withdata". (These match the "MiniDumpNormal" and
"MiniDumpWithDataSegs" calls in dbghelp.dll). The default is "off".

If "off" is set then no unhandled exception handler will be
registered and no crash dump will be generated. The crash dump code
has no effect beyond checking the GUC and seeing that it's set to
"off". With either of the other values a handler is registered; which
is chosen controls whether a minimal or full crash dump will be
generated.

It may even turn out to be easy to make this togglable at runtime
(superuser only) at least with a conf edit and reload, or
possibly even per-session. I'm unsure of that as yet, though.

- Hard-code the dump file output location, so there's no need to
try to read a GUC from within the crash handler. If people
want to change the dump file location, they can symlink/reparse/
whatever $DATADIR\crashdumps to their preferred location.

- If the crash dump handler is enabled by setting the GUC,
all backends register the handler during startup or (if it
proves practical) when the GUC is changed.

- When the handler is triggered by the OS trapping an unhandled
exception, it loads dbghelp.dll, writes the appropriate dump
format to the hardcoded path, and terminates the process.

Comments? Thoughts?

Would a patch along these lines have a chance of acceptance?

(BCC'd Dave Page because of his involvement in Windows & Windows Pg
support).

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-04 12:06:26
Message-ID: 4CA9C342.70501@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/04/2010 07:50 AM, Craig Ringer wrote:
>
> - If the crash dump handler is enabled by setting the GUC,
> all backends register the handler during startup or (if it
> proves practical) when the GUC is changed.
>
> - When the handler is triggered by the OS trapping an unhandled
> exception, it loads dbghelp.dll, writes the appropriate dump
> format to the hardcoded path, and terminates the process.
>
>

What is the performance impact of doing that? Specifically, how does it
affect backend startup time?

cheers

andrew


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-04 12:09:19
Message-ID: AANLkTimY_dQqOriYnDEL856k7kUbyn29d4_SCe+qbofy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 4, 2010 at 12:50 PM, Craig Ringer
<craig(at)postnewspapers(dot)com(dot)au> wrote:

> - Always compile the crash dump feature in builds, so every build that
>  goes to a user is capable of crash dumps. Because a suitable version
>  of dbghelp.dll has been included in Windows XP and above, it shouldn't
>  be necessary to ship the DLL with Pg, though the license permits that.

We probably would want to - the version that comes with Windows is
somewhat cut down, and MSDN recommends using the more recent versions
from the debugging tools.

Any idea of the performance overhead?

> (BCC'd Dave Page because of his involvement in Windows & Windows Pg
> support).

That explains why it bypassed my filters :-)

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-04 12:09:50
Message-ID: 4CA9C40E.5070205@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/10/2010 8:06 PM, Andrew Dunstan wrote:
>
>
> On 10/04/2010 07:50 AM, Craig Ringer wrote:
>>
>> - If the crash dump handler is enabled by setting the GUC,
>> all backends register the handler during startup or (if it
>> proves practical) when the GUC is changed.
>>
>> - When the handler is triggered by the OS trapping an unhandled
>> exception, it loads dbghelp.dll, writes the appropriate dump
>> format to the hardcoded path, and terminates the process.
>>
>>
>
> What is the performance impact of doing that? Specifically, how does it
> affect backend startup time?

Without testing I can't say for sure.

My expection based on how the handler works would be: near-zero, about
as expensive as registering a signal handler, plus the cost of reading
the GUC and doing one string compare to test the value. When disabled,
it's just the GUC test.

Is there a better mechanism to use for features that're going to be
unused the great majority of the time? Perhaps something that does
require a server restart, but doesn't have any cost at all when disabled?

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-04 12:21:11
Message-ID: 4CA9C6B7.3060901@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/10/2010 8:09 PM, Dave Page wrote:

> Any idea of the performance overhead?

A GUC read and string compare when off, and (untested as yet) I expect
little more when on.

Thinking about it, a possibly better alternative is to ship it as a
library as is done with the pl/pgsql debugger, and use (I think)
shared_preload_libraries to load it when desired. That way there's zero
cost if disabled, though a somewhat higher cost if enabled.

Hmm. That'll let me put a test version together that'll work with 9.0 as
well, so that seems like a no-brainer really, it's just a better way to
do it. Time for a Pg-on-windows build, yay.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-04 12:27:33
Message-ID: 4CA9C835.6030500@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04.10.2010 15:21, Craig Ringer wrote:
> Thinking about it, a possibly better alternative is to ship it as a
> library as is done with the pl/pgsql debugger, and use (I think)
> shared_preload_libraries to load it when desired. That way there's zero
> cost if disabled, though a somewhat higher cost if enabled.
>
> Hmm. That'll let me put a test version together that'll work with 9.0 as
> well, so that seems like a no-brainer really, it's just a better way to
> do it. Time for a Pg-on-windows build, yay.

If it's not a lot of code, it's better to have it built-in always.
Loading extra libraries in debug-mode could lead to heisenbugs.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-04 12:32:13
Message-ID: 4CA9C94D.1000801@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/10/2010 8:27 PM, Heikki Linnakangas wrote:
> On 04.10.2010 15:21, Craig Ringer wrote:
>> Thinking about it, a possibly better alternative is to ship it as a
>> library as is done with the pl/pgsql debugger, and use (I think)
>> shared_preload_libraries to load it when desired. That way there's zero
>> cost if disabled, though a somewhat higher cost if enabled.
>>
>> Hmm. That'll let me put a test version together that'll work with 9.0 as
>> well, so that seems like a no-brainer really, it's just a better way to
>> do it. Time for a Pg-on-windows build, yay.
>
> If it's not a lot of code, it's better to have it built-in always.
> Loading extra libraries in debug-mode could lead to heisenbugs.

Good point. The built-in approach would also permit it to be turned on
in an already-running server, which would be nice as for those fun
only-shows-up-in-production bugs where you may not want to restart Pg.

I'll chuck together a library version first, though, so it can be tested
easily on existing installs of Pg 9.0. Compiling Pg on Windows isn't as
quick and easy as on *nix so that should help for testing/consideration.
If people are happy with the results I'll put together a patch to
integrate it directly instead of using a library.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-05 13:39:38
Message-ID: 4CAB2A9A.4060509@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/10/2010 8:27 PM, Heikki Linnakangas wrote:
> On 04.10.2010 15:21, Craig Ringer wrote:
>> Thinking about it, a possibly better alternative is to ship it as a
>> library as is done with the pl/pgsql debugger, and use (I think)
>> shared_preload_libraries to load it when desired. That way there's zero
>> cost if disabled, though a somewhat higher cost if enabled.
>>
>> Hmm. That'll let me put a test version together that'll work with 9.0 as
>> well, so that seems like a no-brainer really, it's just a better way to
>> do it. Time for a Pg-on-windows build, yay.
>
> If it's not a lot of code, it's better to have it built-in always.
> Loading extra libraries in debug-mode could lead to heisenbugs.

It turns out that the basic minidumps are small (21kb here) and very
fast to generate. It may well be worth leaving it enabled all the time
after all. I just need to time how long the handler registration takes -
though as LOAD of the current handler implemented as a contrib module
takes 2.1ms, and LOAD of an module with an empty _PG_init() also takes
2.1ms, I'm guessing "not long".

I've attached an early version for people to play with if anyone's
interested. It currently contains a bunch of elog() messages to report
on its progress - though I'm not at all sure elog() should be used in
the final version given that Pg is crashing at the time the handler is
called and there's no guarantee elog() is safe to call. It also doesn't
offer any way to set the dump type yet, it's always the minimal dump
with exception info, registers and stack only. Finally, a "crashme"
function is included to trigger a reliable unhandled exception on
demand, for initial testing.

Usage with Pg built from source on Windows:

- Drop crashdump.c and the Makefile in contrib/crashdump
- Run build.bat and install.bat
- Create a "crashdumps" directory inside your datadir, and make
sure the server has read/write/create permission on it.
- add 'crashdump' to shared_preload_libraries in postgresql.conf
- Start the server as normal
- Start psql and issue:
-- CREATE OR REPLACE FUNCTION crashdump_crashme() RETURNS void
AS '$libdir/crashdump','crashdump_crashme' LANGUAGE C;
-- SELECT crashdump_crashme();

Your backend should crash. In the error log (and, depending on how the
buffering works out, maybe in psql) you should see:

WARNING: loading dll
WARNING: loading dll try 1, 00000000
WARNING: loading dll try 2, 73880000
WARNING: pdump: 738C70D8
WARNING: Generating dumppath
WARNING: Generated dumppath
WARNING: dumping to path: crashdumps\backend.dmp
WARNING: outfile: 000000B0
WARNING: about to dump
NOTICE: crashdump: wrote crash dump to crashdumps\postgres-4341.mdmp

Once you have the dump file, you can fire up windbg from the Debugging
Tools for Windows and use File->Open Crash Dump to open it. The .excr
command will report what the exception that caused the crash was,
producing output like this:

> 0:000> .ecxr
> eax=00000000 ebx=00000000 ecx=02a1e290 edx=73831014 esi=02a1e188 edi=00c3f798
> eip=738313b2 esp=00c3f724 ebp=02a1e280 iopl=0 nv up ei pl zr na pe nc
> cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
> crashdump!crashdump_crashme+0x2:
> 738313b2 c70001000000 mov dword ptr [eax],1 ds:0023:00000000=????????

and if possible opening the postgresql source file the crash happened in
to the appropriate line:

Datum
crashdump_crashme(PG_FUNCTION_ARGS)
{
int * ptr = NULL;
*ptr = 1; <--- highlighted
return NULL;
}

If you're using Visual Studio Professional (not the free Express
edition, unfortunately) you should also be able to debug the crash dump
that way. I don't have it to test with.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Attachment Content-Type Size
crashdump.c text/plain 5.2 KB
Makefile text/plain 284 bytes

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-05 15:21:56
Message-ID: 4CAB4294.2070104@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

OK, it's pretty much ready for proper testing now. If a few people are
happy with the results and think it's a good idea I'll chuck it in the
commitfest app.

As built, the crash dump handler works with a stock PostgreSQL 9.0
(release build) as shipped in EDB's installer. Just drop crashdump.dll
in lib\, optionally pop the dbghelp.dll redist in bin\, add 'crashdump'
to shared_preload_libraries, and crash some backends however you feel
like doing so.

The current build of crashdump.dll for release versions of PostgreSQL
9.0 on 32-bit Windows is here:

http://www.postnewspapers.com.au/~craig/webfiles/crashdump.dll

If folks are happy with how this works, all it needs is:

- Consideration of whether elog should be used or not. I'm inclined to
suggest using elog to tell the user about the dump, but only after
the crash dump has been written successfully.

- Comments on whether this should be left as a loadable module, or
integerated into core so it loads early in backend startup. The latter
will permit crash dumping of early backend startup problems, and will
have tiny overhead because there's no DLL to find and load. OTOH, it's
harder to pull out if somehow something breaks.

I'd want to start with loadable module in shared_preload_libraries
and if that proves useful, only then consider integrating in core.
I'm way too bad a programmer to want my code anywhere near Pg's core
without plenty of real world testing.

- (Maybe) a method to configure the crash dump type. I'm not too sure
it's necessary given the set of dump flags I've landed up with,
so I'd leave this be unless it proves to be necessary in real-world
testing.

Remember that with these crash dumps the user only has to email the
(~4MB in my tests) crash dump. They don't need debugging tools or the
ability to use them, only the recipient does.

I'm using a tweaked set of minidump flags now, to generate considerably
bigger dumps (around 4MB for the configuration I'm testing) that contain
pretty much everything except shared memory contents, the contents of
memory mapped files, and the loaded read-only code segments of the
executables and DLLs. You can see the results of using it to debug that
autovac crash at the end of this mail.

When testing the script provided by Andrea Peri, when the autovacuum
worker crashes, the message:

> 2010-10-05 22:23:44 WST 2040 WARNING: crashdump: wrote crash dump to crashdumps\postgres-2040.mdmp

is emitted before the process resumes crashing out as it would've normally.

Opening and displaying that dump in windbg shows a useful stack trace
from the crashing process, and it's possible to examine the state of
local/global variables etc.

> 0:000> .ecxr
> eax=90fffffe ebx=040af140 ecx=68f08610 edx=040af248 esi=011f64d4 edi=040b001c
> eip=015d5267 esp=0055f1cc ebp=011f5bc8 iopl=0 nv up ei pl zr na pe nc
> cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
> postgres!pfree+0x7:
> 015d5267 8b5004 mov edx,dword ptr [eax+4] ds:0023:91000002=????????
> 0:000> ~*k
>
> . 0 Id: 7f8.7b8 Suspend: 0 Teb: 7ffdf000 Unfrozen
> ChildEBP RetAddr
> 0055e210 75b4c1ee ntdll!KiFastSystemCallRet
> 0055e250 76e7100e KERNELBASE!FindClose+0x93
> 0055e514 679160c3 kernel32!_SEH_epilog4_GS+0xa
> 0055e6cc 779734e0 dbghelp!Win32LiveSystemProvider::OpenMapping+0x2c3
> 0055e7a8 7797353a ntdll!RtlpAllocateHeap+0xab2
> 0055e828 77965edc ntdll!RtlAllocateHeap+0x23a
> 0055e840 76e7123a ntdll!ZwWriteFile+0xc
> 0055e874 67916861 kernel32!WriteFileImplementation+0x76
> 0055e8ac 67916910 dbghelp!Win32LiveSystemProvider::ReadVirtual+0x71
> 0055e8d8 67908f09 dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x30
> 0055e910 679094b4 dbghelp!WriteMemoryFromProcess+0xa9
> 0055e9a8 6790d522 dbghelp!WriteThreadList+0x184
> 0055e9c0 6790e65a dbghelp!WriteDumpData+0x102
> 0055eb58 6790e9cb dbghelp!MiniDumpProvideDump+0x3fa
> 0055ebd0 73231353 dbghelp!MiniDumpWriteDump+0x1cb
> 0055ed14 76e82c4a crashdump!crashDumpHandler+0x183 [c:\users\craig\developer\postgresql-9.0.0\contrib\crashdump\crashdump.c @ 159]
> 0055ed9c 77995af4 kernel32!UnhandledExceptionFilter+0x127
> 0055eda4 7793d964 ntdll!__RtlUserThreadStart+0x62
> 0055edb8 7793d7fc ntdll!_EH4_CallFilterFunc+0x12
> 0055ede0 779665f9 ntdll!_except_handler4+0x8e
> 0055ee04 779665cb ntdll!ExecuteHandler2+0x26
> 0055eeb4 77966457 ntdll!ExecuteHandler+0x24
> 0055eeb4 015d5267 ntdll!KiUserExceptionDispatcher+0xf
> 0055f1c8 0139eee7 postgres!pfree+0x7 [c:\pginstaller-repo\postgres.windows\src\backend\utils\mmgr\mcxt.c @ 591]
> 0055f1e0 0139f14a postgres!examine_attribute+0x207 [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @ 877]
> 0055f284 0139f94c postgres!do_analyze_rel+0x24a [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @ 357]
> 0055f2ac 013eb74a postgres!analyze_rel+0x1bc [c:\pginstaller-repo\postgres.windows\src\backend\commands\analyze.c @ 232]
> 0055f330 014b30c6 postgres!vacuum+0x1ea [c:\pginstaller-repo\postgres.windows\src\backend\commands\vacuum.c @ 248]
> 0055f368 014b3991 postgres!autovacuum_do_vac_analyze+0x86 [c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c @ 2651]
> 0055f4f4 014b41c5 postgres!do_autovacuum+0x811 [c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c @ 2231]
> 0055f588 014bed02 postgres!AutoVacWorkerMain+0x265 [c:\pginstaller-repo\postgres.windows\src\backend\postmaster\autovacuum.c @ 1611]
> 0055f7d8 01423ce8 postgres!SubPostmasterMain+0x442 [c:\pginstaller-repo\postgres.windows\src\backend\postmaster\postmaster.c @ 4093]
> 0055f7f0 015ee1ad postgres!main+0x1f8 [c:\pginstaller-repo\postgres.windows\src\backend\main\main.c @ 165]
> 0055f834 76e71194 postgres!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
> 0055f840 7797b495 kernel32!BaseThreadInitThunk+0xe
> 0055f880 7797b468 ntdll!__RtlUserThreadStart+0x70
> 0055f898 00000000 ntdll!_RtlUserThreadStart+0x1b

Attachment Content-Type Size
crashdump.c text/plain 6.4 KB
Makefile text/plain 284 bytes

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-10-20 22:15:08
Message-ID: 201010202215.o9KMF8Y07233@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer wrote:
> On 4/10/2010 8:06 PM, Andrew Dunstan wrote:
> >
> >
> > On 10/04/2010 07:50 AM, Craig Ringer wrote:
> >>
> >> - If the crash dump handler is enabled by setting the GUC,
> >> all backends register the handler during startup or (if it
> >> proves practical) when the GUC is changed.
> >>
> >> - When the handler is triggered by the OS trapping an unhandled
> >> exception, it loads dbghelp.dll, writes the appropriate dump
> >> format to the hardcoded path, and terminates the process.
> >>
> >>
> >
> > What is the performance impact of doing that? Specifically, how does it
> > affect backend startup time?
>
> Without testing I can't say for sure.
>
> My expection based on how the handler works would be: near-zero, about
> as expensive as registering a signal handler, plus the cost of reading
> the GUC and doing one string compare to test the value. When disabled,
> it's just the GUC test.
>
> Is there a better mechanism to use for features that're going to be
> unused the great majority of the time? Perhaps something that does
> require a server restart, but doesn't have any cost at all when disabled?

We definately had trouble producing crash dumps caused by the 128 return
code problem, so I can see great value in this, if it can be done
simply. I wonder if the 128-exit would have produced a crash file.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 11:37:39
Message-ID: AANLkTikUvW2dKWZhd28CVKf1NJUF7C4X968_Ari7kDMm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 5, 2010 at 17:21, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> OK, it's pretty much ready for proper testing now. If a few people are happy
> with the results and think it's a good idea I'll chuck it in the commitfest
> app.
>
> As built, the crash dump handler works with a stock PostgreSQL 9.0 (release
> build) as shipped in EDB's installer. Just drop crashdump.dll in lib\,
> optionally pop the dbghelp.dll redist in bin\, add 'crashdump' to
> shared_preload_libraries, and crash some backends however you feel like
> doing so.
>
> The current build of crashdump.dll for release versions of PostgreSQL 9.0 on
> 32-bit Windows is here:
>
>  http://www.postnewspapers.com.au/~craig/webfiles/crashdump.dll
>
> If folks are happy with how this works, all it needs is:
>
> - Consideration of whether elog should be used or not. I'm inclined to
>  suggest using elog to tell the user about the dump, but only after
>  the crash dump has been written successfully.
>
> - Comments on whether this should be left as a loadable module, or
>  integerated into core so it loads early in backend startup. The latter
>  will permit crash dumping of early backend startup problems, and will
>  have tiny overhead because there's no DLL to find and load. OTOH, it's
>  harder to pull out if somehow something breaks.
>
>  I'd want to start with loadable module in shared_preload_libraries
>  and if that proves useful, only then consider integrating in core.
>  I'm way too bad a programmer to want my code anywhere near Pg's core
>  without plenty of real world testing.
>
> - (Maybe) a method to configure the crash dump type. I'm not too sure
>  it's necessary given the set of dump flags I've landed up with,
>  so I'd leave this be unless it proves to be necessary in real-world
>  testing.
>
Finally getting to looking at this one - sorry about the very long delay.

I agree with Heikki's earlier comment that it's better to have this
included in the backend - but that's obviously not going to happen for
already-released versions. I'd therefor advocate a contrib module for
existing versions, and then in core for 9.1+.

We should then have an option to turn it off (on by default). But we
don't need to pay the overhead on every backend startup - we could
just map the value into the parameter shared memory block, and require
a full postmaster restart in order to change it.

Do we want to backpatch it into contrib/? Adding a new module there
seems kind of wrong - probably better to keep the source separate and
just publish the DLL files for people who do debugging?

Looking at the code:
* Why do we need to look at differnt versions of dbghelp.dll? Can't we
always use the one with Windows? And in that case, can't we just use
compile-time linking, do we need to bother with DLL loading at all?

* Per your comment about using elog() - a good idea in that case is to
use write_stderr(). That will send the output to stderr if there is
one, and otherwise the eventlog (when running as service).

* And yes, in the crash handler, we should *definitely* not use elog().

* On Unix, the core file is dropped in the database directory, we
don't have a separate directory for crashdumps. If we want to be
consistent, we should do that here too. I do think that storing them
in a directory like "crashdumps" is better, but I just wanted to raise
the comment.

* However, when storing it in crashdumps, I think the code would need
to create that directory if it does not exist, doesn't it?

* Right now, we overwrite old crashdumps. It is well known that PIDs
are recycled pretty quickly on Windows - should we perhaps dump as
postgres-<pid>-<sequence>.mdmp when there is a filename conflict?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 14:15:17
Message-ID: AANLkTi=hbz=HXfdSR_301km=k=+b6X06EowwyZ=ytVdd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 6:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> Do we want to backpatch it into contrib/? Adding a new module there
> seems kind of wrong - probably better to keep the source separate and
> just publish the DLL files for people who do debugging?

If this works without changes to core, I see little reason not to
back-patch it into contrib. Our primary concern with back-patching is
to avoid doing things that might break existing installations, but if
there aren't any core changes, I don't really see how that can be an
issue here. It seems to me that it's probably simpler for us and our
users to keep the debugging tools together with our main tree.

However, I am not clear what benefit we get from moving this into core
in 9.1. If it's still going to require a full postmaster restart, the
GUC you have to change may as well be shared_preload_libraries as a
new one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 14:49:17
Message-ID: 25727.1290437357@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> However, I am not clear what benefit we get from moving this into core
> in 9.1. If it's still going to require a full postmaster restart, the
> GUC you have to change may as well be shared_preload_libraries as a
> new one.

+1. I am not in favor of randomly repackaging functionality, unless
there's some clear benefit gained by doing so. In this case it seems
like something that could and should remain at arm's length forever,
so a contrib module is the ideal packaging.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 14:53:39
Message-ID: AANLkTi=Y2b47wugJOWwpObiYaM0gvhZFu=uKjwtYN4nM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 9:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> However, I am not clear what benefit we get from moving this into core
>> in 9.1.  If it's still going to require a full postmaster restart, the
>> GUC you have to change may as well be shared_preload_libraries as a
>> new one.
>
> +1.  I am not in favor of randomly repackaging functionality, unless
> there's some clear benefit gained by doing so.  In this case it seems
> like something that could and should remain at arm's length forever,
> so a contrib module is the ideal packaging.

Now, if we could make this so low-overhead that it doesn't need a
switch, that would be a good argument for moving it into core, at
least to my way of thinking. But if it's something that needs to be
enabled with a postmaster restart anyway, meh.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 15:17:16
Message-ID: 1290439036.9606.1.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote:
> Do we want to backpatch it into contrib/?

It's not a bug fix or an upgrading aid, so no.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 15:33:00
Message-ID: 26722.1290439980@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On mn, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote:
>> Do we want to backpatch it into contrib/?

> It's not a bug fix or an upgrading aid, so no.

I'm less than thrilled about back-patching this, too. It seems to fly
in the face of all our historical practice.

If you drop the bit about back-patching, then I don't particularly care
whether it goes into core or contrib for 9.1 --- whichever packaging
makes the most sense is fine.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 15:43:50
Message-ID: AANLkTi=DGBEP8O2=v0RUOE2Rvmk6+rNZTmRDrD6CRbPB@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 16:33, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote:
>>> Do we want to backpatch it into contrib/?
>
>> It's not a bug fix or an upgrading aid, so no.
>
> I'm less than thrilled about back-patching this, too.  It seems to fly
> in the face of all our historical practice.
>
> If you drop the bit about back-patching, then I don't particularly care
> whether it goes into core or contrib for 9.1 --- whichever packaging
> makes the most sense is fine.

My suggestion in the first place was not to backpatch it - I just
wanted to get peoples opinions. I'm perfectly happy if we keep it
somewhere else for the time being - as long as we make the binaries
easily available. But that can go on the wiki for example.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 15:47:55
Message-ID: AANLkTikWN2v7haaL-ZsiMsNhDQp2hv0N14ODjVp5_Txe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 15:15, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Nov 22, 2010 at 6:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> Do we want to backpatch it into contrib/? Adding a new module there
>> seems kind of wrong - probably better to keep the source separate and
>> just publish the DLL files for people who do debugging?
>
> If this works without changes to core, I see little reason not to
> back-patch it into contrib.  Our primary concern with back-patching is
> to avoid doing things that might break existing installations, but if
> there aren't any core changes, I don't really see how that can be an
> issue here.  It seems to me that it's probably simpler for us and our
> users to keep the debugging tools together with our main tree.
>
> However, I am not clear what benefit we get from moving this into core
> in 9.1.  If it's still going to require a full postmaster restart, the
> GUC you have to change may as well be shared_preload_libraries as a
> new one.

on-by-default is what we gain. I think that's fairly big...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 16:38:03
Message-ID: 28224.1290443883@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> on-by-default is what we gain. I think that's fairly big...

Only if that's actually what we want, which is far from clear in this
corner. There are good reasons why most Linux distros configure daemons
not to dump core by default. It's annoying when we are trying to debug
a Postgres crash, but that doesn't mean the reasons aren't good.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 17:05:58
Message-ID: AANLkTimqhSjeY=FyoLbLFjb3LwwxRnsN-poc_AX1-B+J@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 10:17 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On mån, 2010-11-22 at 12:37 +0100, Magnus Hagander wrote:
>> Do we want to backpatch it into contrib/?
>
> It's not a bug fix or an upgrading aid, so no.

I don't see why an upgrading aid would be worthy of back-patching, but
not a debugging aid. I'd certainly prioritize those in the other
order.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 17:30:20
Message-ID: 29474.1290447020@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I don't see why an upgrading aid would be worthy of back-patching, but
> not a debugging aid. I'd certainly prioritize those in the other
> order.

I think the sort of upgrading aid Peter has in mind is the kind where
it's entirely useless if it's not back-patched, because it has to run in
the pre-upgraded server. We've discussed such things before in the
context of in-place upgrade, though I believe there have been no actual
instances as yet.

I'm not really sure why we're even considering the notion of
back-patching this item. Doing so would not fit with any past practice
or agreed-on project management practices, not even under our lax
standards for contrib (and I keep hearing people claim that contrib
is or should be as trustworthy as core, anyway). Since when do we
back-patch significant features that have not been through a beta test
cycle?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 17:46:42
Message-ID: 29808.1290448002@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> * On Unix, the core file is dropped in the database directory, we
> don't have a separate directory for crashdumps. If we want to be
> consistent, we should do that here too. I do think that storing them
> in a directory like "crashdumps" is better, but I just wanted to raise
> the comment.

Just a note on that - it's by no means universal that Unix systems will
put the core files in $PGDATA. OS X likes to put them in /cores, which
I think is a convention shared with some other BSDish systems. On Linux
I believe it's possible to configure where the core goes via environment
settings.

> * However, when storing it in crashdumps, I think the code would need
> to create that directory if it does not exist, doesn't it?

If it didn't do so, then manual creation/removal of the directory could
be used as an on/off switch for the feature. Which would have a number
of advantages, not least that you don't need to have the crash dumper
dependent on GUC working. I haven't looked at the patch but this
discussion makes it sound like the dumper is dependent on an
uncomfortably large amount of backend code being functional. You need
to minimize the number of assumptions of that sort.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 17:47:56
Message-ID: AANLkTi=cfwomryq0H_c7hoefXLn8Rp6nv6Er_X4i7s3u@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 12:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I don't see why an upgrading aid would be worthy of back-patching, but
>> not a debugging aid.  I'd certainly prioritize those in the other
>> order.
>
> I think the sort of upgrading aid Peter has in mind is the kind where
> it's entirely useless if it's not back-patched, because it has to run in
> the pre-upgraded server.  We've discussed such things before in the
> context of in-place upgrade, though I believe there have been no actual
> instances as yet.
>
> I'm not really sure why we're even considering the notion of
> back-patching this item.  Doing so would not fit with any past practice
> or agreed-on project management practices, not even under our lax
> standards for contrib (and I keep hearing people claim that contrib
> is or should be as trustworthy as core, anyway).  Since when do we
> back-patch significant features that have not been through a beta test
> cycle?

I am as conservative about back-patching as anybody here, but
debugging on Windows is not an easy thing to do, and I strongly
suspect we are going to point people experiencing crashes on Windows
to this code whether it's part of our official distribution or not. I
don't see what we get out of insisting that people install it
separately. This is a tool that is only intended to be used when
PostgreSQL is CRASHING, so arguing that we shouldn't include the code
because it might not be stable doesn't carry much water AFAICS. As
far as I understand it, we don't back-patch new features because of
the risk of breaking things, but in this case refusing to back-patch
the code seems more likely to prevent adequate diagnosis of what is
already broken.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 17:54:41
Message-ID: 29987.1290448481@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I am as conservative about back-patching as anybody here, but
> debugging on Windows is not an easy thing to do, and I strongly
> suspect we are going to point people experiencing crashes on Windows
> to this code whether it's part of our official distribution or not. I
> don't see what we get out of insisting that people install it
> separately. This is a tool that is only intended to be used when
> PostgreSQL is CRASHING, so arguing that we shouldn't include the code
> because it might not be stable doesn't carry much water AFAICS. As
> far as I understand it, we don't back-patch new features because of
> the risk of breaking things, but in this case refusing to back-patch
> the code seems more likely to prevent adequate diagnosis of what is
> already broken.

Well, if we're going to hand out prebuilt DLLs to people, we can do that
without having back-patched the code officially. But more to the point
is that it's not clear that we're going to end up with a contrib module
at all going forward (a core feature would clearly be a lot more
reliable), and I really do not wish to get involved with maintaining two
independent versions of this code.

This argument seems to boil down to "we have to have this yesterday",
which I don't buy for a minute. If it's as critical as that, why did
it take this long for someone to write it? I do NOT agree that this
feature is important enough to justify a free pass around our normal
management and quality assurance processes.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 17:56:03
Message-ID: 4CEAAEB3.4070509@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22.11.2010 19:47, Robert Haas wrote:
> I am as conservative about back-patching as anybody here, but
> debugging on Windows is not an easy thing to do, and I strongly
> suspect we are going to point people experiencing crashes on Windows
> to this code whether it's part of our official distribution or not.

This whole thing makes me wonder: is there truly no reliable, simple
method to tell Windows to create a core dump on crash, without writing
custom code for it. I haven't seen one, but I find it amazing if there
isn't. We can't be alone with this.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 18:26:34
Message-ID: AANLkTik1rTxoMGEkZCuKgkThqRGiBsSmi9sekdL=H3XG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> * However, when storing it in crashdumps, I think the code would need
>> to create that directory if it does not exist, doesn't it?
>
> If it didn't do so, then manual creation/removal of the directory could
> be used as an on/off switch for the feature.  Which would have a number
> of advantages, not least that you don't need to have the crash dumper
> dependent on GUC working.  I haven't looked at the patch but this
> discussion makes it sound like the dumper is dependent on an
> uncomfortably large amount of backend code being functional.  You need
> to minimize the number of assumptions of that sort.

No, it's dependent on close to zero backend functionality.
Particularly if we take out the dependency on elog() (write_stderr is
much simpler). In fact, the calls to elog() are the *only* places
where it calls into the backend as it stands today.

And yes, ISTM it could work reasonably well to use the
creation/deletion of the directory as an on/off switch for it. Which
is the default is of course up to the packager then as well ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 18:28:40
Message-ID: AANLkTimHrHMwkdMjEYyacRdV6YY=ktEaxEO=S-0gCsGk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 18:54, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I am as conservative about back-patching as anybody here, but
>> debugging on Windows is not an easy thing to do, and I strongly
>> suspect we are going to point people experiencing crashes on Windows
>> to this code whether it's part of our official distribution or not.  I
>> don't see what we get out of insisting that people install it
>> separately.  This is a tool that is only intended to be used when
>> PostgreSQL is CRASHING, so arguing that we shouldn't include the code
>> because it might not be stable doesn't carry much water AFAICS.  As
>> far as I understand it, we don't back-patch new features because of
>> the risk of breaking things, but in this case refusing to back-patch
>> the code seems more likely to prevent adequate diagnosis of what is
>> already broken.
>
> Well, if we're going to hand out prebuilt DLLs to people, we can do that
> without having back-patched the code officially.  But more to the point
> is that it's not clear that we're going to end up with a contrib module
> at all going forward (a core feature would clearly be a lot more
> reliable), and I really do not wish to get involved with maintaining two
> independent versions of this code.

I think the reasonable options are either "don't backpatch at all" or
"backpatch the same way as we put it in HEAD, which is probably
included in backend". I agree that sticking it in contrib is a
half-measure that we shouldn't do.

*IF* we go with a contrib module for 9.1 as well, we could backpatch
as contrib module, but I think that's the only case.

> This argument seems to boil down to "we have to have this yesterday",
> which I don't buy for a minute.  If it's as critical as that, why did
> it take this long for someone to write it?  I do NOT agree that this
> feature is important enough to justify a free pass around our normal
> management and quality assurance processes.

Agreed.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 18:33:33
Message-ID: AANLkTin8ArAOevOuQDqxhqf5Re0XNiBiC=S907SiHrYL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 18:56, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 22.11.2010 19:47, Robert Haas wrote:
>>
>> I am as conservative about back-patching as anybody here, but
>> debugging on Windows is not an easy thing to do, and I strongly
>> suspect we are going to point people experiencing crashes on Windows
>> to this code whether it's part of our official distribution or not.
>
> This whole thing makes me wonder: is there truly no reliable, simple method
> to tell Windows to create a core dump on crash, without writing custom code
> for it. I haven't seen one, but I find it amazing if there isn't. We can't
> be alone with this.

You can do it without custom code but it's quite hard. And AFAIK you
need to install the debugger tools package from microsoft (separate
download, and not something you'll normally find on a system).

There is DrWatson and the Error Reporting service you can use.
DrWatson is made for interactive programs (or was the last time I
looked at it). The error reporting service requires setting up a local
error reporting service and configure it for reports, if you want them
sent anywhere other than to Microsoft. Neither one of them is a
particularly good choie. The minidumps Craig's patch does are a lot
more useful.

We intentionally *disable* drwatson, because it's part of what pops up
an error message you have to click Ok on before your server will
continue running...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 18:39:46
Message-ID: 10775.1290451186@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... I haven't looked at the patch but this
>> discussion makes it sound like the dumper is dependent on an
>> uncomfortably large amount of backend code being functional.

> No, it's dependent on close to zero backend functionality.
> Particularly if we take out the dependency on elog() (write_stderr is
> much simpler). In fact, the calls to elog() are the *only* places
> where it calls into the backend as it stands today.

Well, in the contrib-module guise, it's dependent on
shared_preload_libraries or local_preload_libraries, which at least
involves guc and dfmgr working pretty well (and not only in the
postmaster, but during backend startup).

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 18:50:02
Message-ID: AANLkTim_XnmVY8kGU96p3nzWcF_4OLDTv5U0-SBOOyJy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 19:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Mon, Nov 22, 2010 at 18:46, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> ... I haven't looked at the patch but this
>>> discussion makes it sound like the dumper is dependent on an
>>> uncomfortably large amount of backend code being functional.
>
>> No, it's dependent on close to zero backend functionality.
>> Particularly if we take out the dependency on elog() (write_stderr is
>> much simpler). In fact, the calls to elog() are the *only* places
>> where it calls into the backend as it stands today.
>
> Well, in the contrib-module guise, it's dependent on
> shared_preload_libraries or local_preload_libraries, which at least
> involves guc and dfmgr working pretty well (and not only in the
> postmaster, but during backend startup).

Yes, sorry. My mindset was in "the version that'll go into 9.1".

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 19:24:03
Message-ID: 1290453843.471.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2010-11-22 at 19:56 +0200, Heikki Linnakangas wrote:
> This whole thing makes me wonder: is there truly no reliable, simple
> method to tell Windows to create a core dump on crash, without writing
> custom code for it. I haven't seen one, but I find it amazing if there
> isn't. We can't be alone with this.

Well, there is no reliable and simple method to rename a file on
Windows, so what can you expect ... ;-)


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-22 19:27:43
Message-ID: AANLkTincyEtxP3=qmyoEusrH-SFsMg3j62XEdZskyWpe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 1:28 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> I think the reasonable options are either "don't backpatch at all" or
> "backpatch the same way as we put it in HEAD, which is probably
> included in backend". I agree that sticking it in contrib is a
> half-measure that we shouldn't do.
>
> *IF* we go with a contrib module for 9.1 as well, we could backpatch
> as contrib module, but I think that's the only case.

I agree with this, certainly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-23 14:02:01
Message-ID: 4CEBC959.9080904@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> However, I am not clear what benefit we get from moving this into core
>> in 9.1. If it's still going to require a full postmaster restart, the
>> GUC you have to change may as well be shared_preload_libraries as a
>> new one.

There's no reason it should require a postmaster restart. New backends
spawned after the handler is turned on would enable it, and existing
backends could be signalled to enable it as well.

> on-by-default is what we gain. I think that's fairly big...

More than that. If a crash handler is in core, then:

- It can be inited much earlier, catching crashes that happen during
loading of libraries and during later backend init; and

- It's basically free when the cost of shared library loading is
removed, so it can be left on in production or even be on by default. I
need to do some testing on this, but given the apparently near-zero cost
of initing the handler I won't be surprised if testing a GUC to see if
the handler should be on or not costs more than loading it does.

I still wouldn't support on-by-default because you'd need an automatic
process to weed out old crash dumps or limit the number stored. That's a
bigger job. So I think the admin should have to turn it on, but it'd be
good to make it easy to turn on in production without a restart; I don't
see why that'd be hard.

I'll try to put together a patch that does just that, time permitting.
Things are kind of hectic ATM.

--
Craig Ringer


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-23 14:05:01
Message-ID: 4CEBCA0D.6030203@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/23/2010 01:30 AM, Tom Lane wrote:

> I'm not really sure why we're even considering the notion of
> back-patching this item. Doing so would not fit with any past practice
> or agreed-on project management practices, not even under our lax
> standards for contrib (and I keep hearing people claim that contrib
> is or should be as trustworthy as core, anyway). Since when do we
> back-patch significant features that have not been through a beta test
> cycle?

I see no advantage to back-patching. It's easy to provide a drop-in
binary DLL for older versions of Pg on Windows, who're the only people
this will work for.

If the EDB folks saw value in it, they could bundle the DLL with updated
point releases of the installer for Windows. No back-patching is necessary.

--
Craig Ringer


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-23 14:09:35
Message-ID: 4CEBCB1F.1080405@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/23/2010 01:46 AM, Tom Lane wrote:

>> * However, when storing it in crashdumps, I think the code would need
>> to create that directory if it does not exist, doesn't it?
>
> If it didn't do so, then manual creation/removal of the directory could
> be used as an on/off switch for the feature.

Yep. That's how I'd want to do it in 9.1 - test for the directory and
use that to decide whether to set the handler during early backend
startup. That way you don't need a GUC, and should be able to load it
*very* early in backend startup.

> I haven't looked at the patch but this
> discussion makes it sound like the dumper is dependent on an
> uncomfortably large amount of backend code being functional. You need
> to minimize the number of assumptions of that sort.

It doesn't need to have any backend code working, really; all it needs
is a usable stack and the ability to allocate off the heap. It won't
save you in total OOM situations, stack exhaustion, or severe stack
corruption, but should work pretty much any other time.

The crash dump facility in dbghelp.dll offers *optional* features where
apps can stream in additional app-specific data like recent log
excerpts, etc. I didn't try to implement anything like that in the
initial module partly because I want to minimize the amount of the
backend that must be working for a crash dump to succeed.

--
Craig Ringer


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-23 14:12:04
Message-ID: 4CEBCBB4.7000107@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/23/2010 01:56 AM, Heikki Linnakangas wrote:
> On 22.11.2010 19:47, Robert Haas wrote:
>> I am as conservative about back-patching as anybody here, but
>> debugging on Windows is not an easy thing to do, and I strongly
>> suspect we are going to point people experiencing crashes on Windows
>> to this code whether it's part of our official distribution or not.
>
> This whole thing makes me wonder: is there truly no reliable, simple
> method to tell Windows to create a core dump on crash, without writing
> custom code for it. I haven't seen one, but I find it amazing if there
> isn't. We can't be alone with this.

Search for 'dbghelp.dll' on your average Windows system and you'll be
surprised how many apps use it. Steam (the software distribution system)
does, as does the Adobe Creative Suite on my machine.

If you're running in interactive mode with access to the user's display
you can use Windows error reporting. Services running in isolated user
accounts don't seem to be able to use that. In any case, windows error
reporting only collects *tiny* dumps with barely anything beyond the
stack contents; they're a nightmare to use, and really require psychic
powers and deep knowledge of scary Windows APIs for effective debugging.

--
Craig Ringer


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-23 21:16:37
Message-ID: AANLkTimM50fAukjW6-CEJN2ebAkf6zktfxFWB=X5mrz6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 23, 2010 at 15:02, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
>>> However, I am not clear what benefit we get from moving this into core
>>> in 9.1.  If it's still going to require a full postmaster restart, the
>>> GUC you have to change may as well be shared_preload_libraries as a
>>> new one.
>
> There's no reason it should require a postmaster restart. New backends
> spawned after the handler is turned on would enable it, and existing
> backends could be signalled to enable it as well.

I think that came off my comment that we could store the on/off in the
startup shared memory block. It'd then be the only way to get it into
any existing backends.

>> on-by-default is what we gain. I think that's fairly big...
>
> More than that. If a crash handler is in core, then:
>
> - It can be inited much earlier, catching crashes that happen during loading
> of libraries and during later backend init; and

Yeah.

> - It's basically free when the cost of shared library loading is removed, so
> it can be left on in production or even be on by default. I need to do some
> testing on this, but given the apparently near-zero cost of initing the
> handler I won't be surprised if testing a GUC to see if the handler should
> be on or not costs more than loading it does.

I'm fairly certain it does. The GUC would be there to be able to turn
the whole thing off because you don't want the dumps, not for
performance reasons.

> I still wouldn't support on-by-default because you'd need an automatic
> process to weed out old crash dumps or limit the number stored. That's a
> bigger job. So I think the admin should have to turn it on, but it'd be good
> to make it easy to turn on in production without a restart; I don't see why
> that'd be hard.

I think the admin should deal with that - just like the admin has to
clear out the old logs.

> I'll try to put together a patch that does just that, time permitting.
> Things are kind of hectic ATM.

Let me know if you want me to look at adapting the patch for that - i
can do that if you prefer.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-23 21:18:23
Message-ID: AANLkTinwQRDR-=R=xMP_vfNghO-Q4fMNzWr8BLq9ye4e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 23, 2010 at 15:09, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> On 11/23/2010 01:46 AM, Tom Lane wrote:
>
>>> * However, when storing it in crashdumps, I think the code would need
>>> to create that directory if it does not exist, doesn't it?
>>
>> If it didn't do so, then manual creation/removal of the directory could
>> be used as an on/off switch for the feature.
>
> Yep. That's how I'd want to do it in 9.1 - test for the directory and use
> that to decide whether to set the handler during early backend startup. That
> way you don't need a GUC, and should be able to load it *very* early in
> backend startup.

Or you set the handler always, and have the handler only actually
create the dump if the directory exists. That way you can add the
directory and still get a dump from both existing backends and the
postmaster itself without a restart.

> The crash dump facility in dbghelp.dll offers *optional* features where apps
> can stream in additional app-specific data like recent log excerpts, etc. I
> didn't try to implement anything like that in the initial module partly
> because I want to minimize the amount of the backend that must be working
> for a crash dump to succeed.

Yeah, we already have the logs in the logfile etc. Let's keep this
uncomplicated so that it stays working :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-23 21:42:01
Message-ID: 18077.1290548521@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Nov 23, 2010 at 15:09, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
>> Yep. That's how I'd want to do it in 9.1 - test for the directory and use
>> that to decide whether to set the handler during early backend startup. That
>> way you don't need a GUC, and should be able to load it *very* early in
>> backend startup.

> Or you set the handler always, and have the handler only actually
> create the dump if the directory exists.

+1 for that way.

regards, tom lane


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-11-23 23:28:52
Message-ID: 4CEC4E34.2040007@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/24/2010 05:18 AM, Magnus Hagander wrote:

> Or you set the handler always, and have the handler only actually
> create the dump if the directory exists. That way you can add the
> directory and still get a dump from both existing backends and the
> postmaster itself without a restart.

That's way smarter. No extra filesystem access during startup, even if
it is cheap.

--
Craig Ringer


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-05 05:12:19
Message-ID: 4CFB1F33.7060100@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer wrote:
> On 11/24/2010 05:18 AM, Magnus Hagander wrote:
>
>> Or you set the handler always, and have the handler only actually
>> create the dump if the directory exists. That way you can add the
>> directory and still get a dump from both existing backends and the
>> postmaster itself without a restart.
>
> That's way smarter. No extra filesystem access during startup, even if
> it is cheap.

I added a commenting referencing this bit to the CF entry so it doesn't
get forgotten. Magnus raised a few other issues in his earlier review
too. Discussion of this patch seems to have jumped the gun a bit into
talking about backpatching before the code for HEAD was completely done,
then stalled there. Are we going to see an updated patch that addresses
submitted feedback in this cycle?

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-06 04:57:17
Message-ID: 4CFC6D2D.8090803@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22/11/2010 7:37 PM, Magnus Hagander wrote:

> Finally getting to looking at this one - sorry about the very long delay.

Ditto, I'm afraid.

> I agree with Heikki's earlier comment that it's better to have this
> included in the backend - but that's obviously not going to happen for
> already-released versions. I'd therefor advocate a contrib module for
> existing versions, and then in core for 9.1+.

I think that was the conclusion the following discussion came to, as
well. It's easy to distribute a binary .dll for older versions and if
the EDB folks found this facility really useful they could even bundle
it in the installer. There's no point backpatching it given how few
win32 users will build from source.

> We should then have an option to turn it off (on by default). But we
> don't need to pay the overhead on every backend startup - we could
> just map the value into the parameter shared memory block, and require
> a full postmaster restart in order to change it.

As discussed later, implemented by testing for a 'crashdumps' directory.
If the directory exists, a dump is written. If the directory is missing
a log message is emitted to explain why no dump was written; as this'll
only happen when a backend dies, log noise isn't an issue and it'll help
save sysadmin head-scratching given the "magic" behaviour of turning
on/off based on directory existence.

> Do we want to backpatch it into contrib/? Adding a new module there
> seems kind of wrong - probably better to keep the source separate and
> just publish the DLL files for people who do debugging?

+1 to just publishing the DLLs

> Looking at the code:
> * Why do we need to look at differnt versions of dbghelp.dll? Can't we
> always use the one with Windows? And in that case, can't we just use
> compile-time linking, do we need to bother with DLL loading at all?

Newer versions than those shipped with an OS release contain updated and
improved functionality as well as bug fixes. We *can* use the version
shipped with the OS, as even in XP dbghelp.dll contains the required
functionality, but it'd be better to bundle it. That's microsoft's
recommendation when you use dbghelp.dll .

> * Per your comment about using elog() - a good idea in that case is to
> use write_stderr(). That will send the output to stderr if there is
> one, and otherwise the eventlog (when running as service).

Hm, ok. The attached patch doesn't do that yet. Given the following
comments, do you think the change still necessary?

> * And yes, in the crash handler, we should *definitely* not use elog().

Currently I'm actually using it, but only after the dump has been
written or we've decided not to write a dump. At this point, if elog()
fails we don't really care, we'd just be returning control to the OS's
fatal exception cleanup anyway. It's IMO desirable for output to go to
the same log as everything else, so admins can find out what's going on
and can associate the crash dump files with events that happened around
a certain time.

> * On Unix, the core file is dropped in the database directory, we
> don't have a separate directory for crashdumps. If we want to be
> consistent, we should do that here too. I do think that storing them
> in a directory like "crashdumps" is better, but I just wanted to raise
> the comment.

Given the recent discussion about using the dir as an on/off switch,
that'll be necessary.

> * However, when storing it in crashdumps, I think the code would need
> to create that directory if it does not exist, doesn't it?

Not now that we're using it an on/off switch, but of course it would've
made sense otherwise.

> * Right now, we overwrite old crashdumps. It is well known that PIDs
> are recycled pretty quickly on Windows - should we perhaps dump as
> postgres-<pid>-<sequence>.mdmp when there is a filename conflict?

Good idea. Rather than try to test for existing files I've just taken
the simple route (always good in code run during process crash) and
appended the system ticks to the dump name. System ticks are a 32-bit
quantity that wraps about every 40 days, so they should be easily unique
enough in combination with the pid. The logs will report the crashdump
filenames unless the backend is too confused to be able to elog(), and
all the crash dump files have file system timestamps so there isn't much
point trying to format a crash date/time into their names.

I could use wall clock seconds (or ms) instead, but favour system ticks
because they can be read with a simple int-returning call that doesn't
require any more stack allocation. On Windows, getting wall clock time
in seconds seems to involve a call to GetSystemTimeAsFileTime
(http://msdn.microsoft.com/en-us/library/ms724397(v=VS.85).aspx) to
populate a struct containing a fake-64-bit int as two 32-bit ints. It's
not clear that it's free of timezone calculations and generally looks
like something better avoided in a crash handler if it's not necessary.
As GetSystemTicks seems quite sufficient, I thought I'd stick with it.

--
Craig Ringer

Attachment Content-Type Size
0001-Crash-dump-support-for-win32.patch text/plain 0 bytes

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-06 05:56:46
Message-ID: 4CFC7B1E.2020206@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6/12/2010 12:57 PM, Craig Ringer wrote:
> On 22/11/2010 7:37 PM, Magnus Hagander wrote:
>
>> Finally getting to looking at this one - sorry about the very long delay.
>
> Ditto, I'm afraid.

Oh, I forgot to mention in the patch email: I'm not sure I've taken the
right approach in terms of how I've hooked the crash dump code into the
build system. I'm fairly happy with when it's actually inited, and with
the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure
about is how I've handled the conditional compilation by platform.

Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It
exports a generic win32-garbage-free function:

void
installCrashDumpHandler();

I was just going to create a no-op version of the same function in a
(new) file src/backend/ports/crashdump.c , but I was having problems
figuring out how to sensibly express "compile this file for everything
except these ports". In addition, doing it this way means there's still
a pointless no-op function call in each backend start and some useless
symbols - maybe not a big worry, but not ideal.

What I ended up doing was putting a conditionally compiled switch in
port.h that produces an inline no-op version of installCrashDumpHandler
for unsupported platforms, and an extern for supported platforms
(currently win32). That'll optimize out completely on unsupported
platforms, which is nice if unimportant.

I'm just not sure if port.h is really the right place and if it's only
used by the backend. I considered a new header included by postgres.h
(since it's really backend-only stuff) but as it seems Pg generally
prefers bigger headers over more little headers, I popped it in port.h .

Comments?

--
Craig Ringer


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-14 06:02:27
Message-ID: 4D070873.7070000@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've attached an updated patch that fixes a failure when compiling on
gcc/linux. The no-op inline installCrashDumpHandler() for unsupported
platforms was not declared static, so it was not being optimized out of
objects it wasn't used in and was causing symbol collisions during linkage.

"make check" now passes correctly on Linux and no warnings are generated
during compilation.
=======================
All 124 tests passed.
=======================

Sorry about this mess-up; I don't recall having to use "inline static"
when using inlines in headers with c++, but as that was a while ago (and
C++) my recollection can't be trusted anyway.

Your thoughts on the following questions from before would be much
appreciated:

> I'm not sure I've taken the
> right approach in terms of how I've hooked the crash dump code into the
> build system. I'm fairly happy with when it's actually inited, and with
> the win32 portfile (ports/win32/crashdump.c) - what I'm not so sure
> about is how I've handled the conditional compilation by platform.
>
> Currently, src/backend/ports/win32/crashdump.c is compiled on win32. It
> exports a generic win32-garbage-free function:
>
> void
> installCrashDumpHandler();
>
> I was just going to create a no-op version of the same function in a
> (new) file src/backend/ports/crashdump.c , but I was having problems
> figuring out how to sensibly express "compile this file for everything
> except these ports". In addition, doing it this way means there's still
> a pointless no-op function call in each backend start and some useless
> symbols - maybe not a big worry, but not ideal.
>
> What I ended up doing was putting a conditionally compiled switch in
> port.h that produces an inline no-op version of installCrashDumpHandler
> for unsupported platforms, and an extern for supported platforms
> (currently win32). That'll optimize out completely on unsupported
> platforms, which is nice if unimportant.
>
> I'm just not sure if port.h is really the right place and if it's only
> used by the backend. I considered a new header included by postgres.h
> (since it's really backend-only stuff) but as it seems Pg generally
> prefers bigger headers over more little headers, I popped it in port.h .

Attachment Content-Type Size
0001-Add-support-for-generating-a-crash-dump-on-backend-p.patch text/plain 0 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-14 17:01:30
Message-ID: 14734.1292346090@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> I've attached an updated patch that fixes a failure when compiling on
> gcc/linux. The no-op inline installCrashDumpHandler() for unsupported
> platforms was not declared static, so it was not being optimized out of
> objects it wasn't used in and was causing symbol collisions during linkage.

Why in the world would you get involved in that portability mess for a
function that is called only once? There's no possible performance
justification for making it inline.

I'm also wondering why you have got conflicting declarations in
postgres.h and port.h, and why none of these declarations follow
ANSI C (write "(void)" not "()").

regards, tom lane


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-15 13:18:08
Message-ID: 4D08C010.6020409@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've updated this entry in the CommitFest app to note that Craig had
some implementation questions attached to his patch submission that I
haven't seen anyone address yet, and to include a reference to Tom's
latest question--which may make those questions moot, not sure. This
pretty clearly need to sit on the stove a little bit longer before it's
done regardless. I'm marking this one Returned With Feedback, and
hopefully Craig will continue hammering on this to clean up the
remaining details and resubmit in the next month.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-15 14:20:36
Message-ID: 4D08CEB4.9030607@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/15/2010 01:01 AM, Tom Lane wrote:
> Craig Ringer<craig(at)postnewspapers(dot)com(dot)au> writes:
>> I've attached an updated patch that fixes a failure when compiling on
>> gcc/linux. The no-op inline installCrashDumpHandler() for unsupported
>> platforms was not declared static, so it was not being optimized out of
>> objects it wasn't used in and was causing symbol collisions during linkage.
>
> Why in the world would you get involved in that portability mess for a
> function that is called only once? There's no possible performance
> justification for making it inline.

The main concern I heard voiced when first suggesting this was about
performance. Given that concern, if I could make it a no-op on
unix/linux I thought that worth doing.

I'm _much_ happier with a simple, non-ifdef'd extern function
declaration and compilation of an empty function body on unsupported
platforms. Given how concerned everyone was about *any* effect on
backend startup, though, I was concerned that'd be turned down as
unnecessary bloat.

I've done it a nicer way now, and will post the updated patch once I've
had a chance to re-test it on my Windows dev box.

> I'm also wondering why you have got conflicting declarations in
> postgres.h and port.h, and why none of these declarations follow
> ANSI C (write "(void)" not "()").

For postgres.h : that's a good question, as I thought I removed that. I
suspect it was reintroduced when reapplying the patch to my working tree
to revise it. Whoops.

As for the ansi C style - too much time with C++, though long ago now. I
think I got the PostgreSQL rules for code formatting right, but missed
the void param rule.

--
Craig Ringer


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 08:21:03
Message-ID: 4D09CBEF.3090007@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all

Updated patch attached. This one avoids the inline function stuff and
instead just decides whether to compile unix_crashdump.c or
win32_crashdump.c based on build system tests. It passes "make check" on
nix and tests on win32, both with and without the "crashdumps" directory
existing. The inadvertent duplication of the functon prototype is fixed.

I haven't added any SGML documentation yet, as I'm not sure (a) to what
level it should be documented, and (b) whether it should be in the main
docs or just the developer docs plus the crash debugging guide on the
wiki. Thoughts?

You can also find this patch at:

https://github.com/ringerc/postgres/tree/crashdump

ie git://github.com/ringerc/postgres.git, branch "crashdump" .

--
System & Network Administrator
POST Newspapers

Attachment Content-Type Size
0001-Add-support-for-generating-a-crash-dump-on-backend-p.patch text/x-patch 0 bytes

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 08:28:23
Message-ID: 4D09CDA7.2000405@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

BTW, related to the crash dump work:

I wanted to see if I could get any integration into Windows Error
Reporting for Pg, because it'd be nice to be able to use Microsoft's
servers to aggregate crash reports and collect data on problems. WER (on
Windows 7 and Vista) also provides automatic management of retained
local copies of crash dumps, automatic bundling of related files,
automatic out-of-process debugging, etc.

WER has been significantly improved in Windows Visa / 7 and may be worth
considering enabling its use when Pg is installed on those platforms. Do
you think there'd be any interest in re-enabling WER when installation
on win7 / vista is detected?

I've seen mention that the win32 installers "go to some lengths" to
disable WER because it tends to hang the server while waiting for a user
prompt. I was wondering if/how the installer disables WER for Pg? There
aren't any calls to AddERExcludedApplication() or
WerAddExcludedApplication() in the Pg sources. Pg doesn't appear under:

HKEY_CURRENT_USER\Software\Microsoft\Windows\Windows Error Reporting
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\Windows Error Reporting

within an ExcludedApplications key either.

WHY WINDOWS ERROR REPORTING?
============================

WER on Windows 7 permits things like an out-of-process debugger to be
invoked. Once the postmaster knows how to manage generic children (as
has been discussed to permit better integration of PgAgent among other
things) it could also become responsible for doing out-of-process crash
dumping and error reporting on crashed children rather than having to do
potentially unreliable self-debugging as is presently the case. It might
not even be necessary to do that much, as WER on win7/vista offers so
much more configurability.

On Win7/vista you can also set WER up not to block execution while it
waits for user input, and to prompt the user even when the user is
running under a different security level. It can automatically bundle
postgresql.conf, the most recent log file, etc when an error report is
sent. By contrast to app-generated minidumps, most importantly it
doesn't require the user to do anything more than click "yes" for an
error report to be collected and become available for analysys of most
frequent failures.

The downside is that it requires a *business* to register for access to
WER data, and that business must obtain a VeriSign code signing cert
(US$99 if obtained for WER purposes). This might be something the EDB
folks would be interested in, especially as it should also permit
shipping of signed Pg binaries, which is great for protecting against AV
software, making it easier to deploy in business environments, etc.

Note that Windows Error Reporting doens't eliminate the utility of the
crashdumps feature, as self-crashdumps are easily accessible to and
configurable by the local user/admin and work fine on Windows XP. The
current self-generated crash dumps also contain more information than
you'll usually get from a WER dump, though it may be possible to
customise WER so it includes the same information.

I'd really like to know more about past experience with Windows Error
Reporting, because I'd like to investigate whether it's worth using in
future with a few tweaks for win7 use.

--
System & Network Administrator
POST Newspapers


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 13:01:22
Message-ID: AANLkTik2yrEHBOFWKuK9Zw4e+2da9dmGa0E6BRGf6vwy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 09:21, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> Hi all
>
> Updated patch attached. This one avoids the inline function stuff and
> instead just decides whether to compile unix_crashdump.c or
> win32_crashdump.c based on build system tests. It passes "make check" on
> nix and tests on win32, both with and without the "crashdumps" directory
> existing. The inadvertent duplication of the functon prototype is fixed.

I've updated this patch in quite a few ways, and you can find the
latest on my github (will attach when I'm done with cleanups). Things
I've changed so far:

* I moved it all into backend/port/win32, and thus removed all the
autoconf things, since they are not necessary. This is only for win32
at this point, and AFAIK we don't expect it to be for other platforms.
If that's needed, we can move it back later, but for now let's keep it
simple.
* using elog() really isn't safe. We set the crash handler *long*
before we have loaded the elog subsystem. It's better to get a log to
eventlog than to not get it at all.
* Plus a number of smaller changes

One thing that I did notice, and I'm not entirely sure what to do with
- which is why I wanted to post this while I'm still working on the
cleanup:

We use the existance of the "crashdumps" directory as an indication we
want crashdumps. That's fine when the system is up. But what if we
crash *in the postmaster before we have done chdir()*?

Should we perhaps instead define a subdirectory of *where the .EXE
file is*, and dump the file there?

> I haven't added any SGML documentation yet, as I'm not sure (a) to what
> level it should be documented, and (b) whether it should be in the main
> docs or just the developer docs plus the crash debugging guide on the
> wiki. Thoughts?

It needs to be documented somewhere.Perhaps in "15.8 Platform Specific
Notes"? That's really about building it, but it might be reasonable
there anyway? It does hold a number of things today that aren't
related to building, for other platforms.

(And yes, it should go in the wiki as well, of course)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 14:07:33
Message-ID: AANLkTikoXtZ5krsP_1c3TA-pxR-anfqPpN2mwydgaLut@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 8:01 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> We use the existance of the "crashdumps" directory as an indication we
> want crashdumps. That's fine when the system is up. But what if we
> crash *in the postmaster before we have done chdir()*?
>
> Should we perhaps instead define a subdirectory of *where the .EXE
> file is*, and dump the file there?

That seems pretty ugly, for a pretty marginal case. The number of
crashes that are going to happen in the postmaster before we've done
chdir() should be extremely small, especially if we're talking about
crashes in the field rather than during development. How about adding
a command-line option to force a dump to be written in $CWD whether a
crashdumps directory exists or not?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 14:16:26
Message-ID: AANLkTimZrdjo_OPfkhqxqLHbf8NSh0j6aaDnPAGUQV6D@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 14:01, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Thu, Dec 16, 2010 at 09:21, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
>> Hi all
>>
>> Updated patch attached. This one avoids the inline function stuff and
>> instead just decides whether to compile unix_crashdump.c or
>> win32_crashdump.c based on build system tests. It passes "make check" on
>> nix and tests on win32, both with and without the "crashdumps" directory
>> existing. The inadvertent duplication of the functon prototype is fixed.
>
> I've updated this patch in quite a few ways, and you can find the
> latest on my github (will attach when I'm done with cleanups). Things
> I've changed so far:

Found another problem in it: when running with an older version of
dbghelp.dll (which I was), it simply didn't work. We need to grab the
version of dbghelp.dll at runtime and pick which things we're going to
dump based on that.

The attached version of the patch does that. Can you please test that
it still generates the full dump on your system, which supposedly has
a much more modern version of dbghelp.dll than mine?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
win32_crash.patch text/x-patch 9.2 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 14:24:22
Message-ID: AANLkTi=DHf3+7Va6sVp78R2RU+8A_avk4=cSTbNZnA3C@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 15:07, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Dec 16, 2010 at 8:01 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> We use the existance of the "crashdumps" directory as an indication we
>> want crashdumps. That's fine when the system is up. But what if we
>> crash *in the postmaster before we have done chdir()*?
>>
>> Should we perhaps instead define a subdirectory of *where the .EXE
>> file is*, and dump the file there?
>
> That seems pretty ugly, for a pretty marginal case.  The number of
> crashes that are going to happen in the postmaster before we've done
> chdir() should be extremely small, especially if we're talking about
> crashes in the field rather than during development.  How about adding
> a command-line option to force a dump to be written in $CWD whether a
> crashdumps directory exists or not?

Is that less ugly? ;)

But yes, we are talking about in the field, so it's fairly small. But
any crash during guc loading for example would go there, I think?

We can do such a commandline. We don't have any platform-specific
commandline options today. Is that something we've intentionally
avoided, or just not needed before?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 14:45:04
Message-ID: AANLkTikfv67Cn+-5KcHB3c-KazU96Yb6y+FZiWadoe0B@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> Is that less ugly? ;)

Well, I thought it was or I would have suggested it, but it's
obviously open to interpretation.

> But yes, we are talking about in the field, so it's fairly small. But
> any crash during guc loading for example would go there, I think?

Probably. But how likely is that to happen only on Windows and only
in the field?

> We can do such a commandline. We don't have any platform-specific
> commandline options today. Is that something we've intentionally
> avoided, or just not needed before?

Beats me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 14:56:13
Message-ID: 20896.1292511373@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> We can do such a commandline. We don't have any platform-specific
>> commandline options today. Is that something we've intentionally
>> avoided, or just not needed before?

> Beats me.

Yes, it's something we intentionally avoid: there is no convenient way
to have conditional entries in getopt()'s option-letters string.

I think the proposal for such a switch is unnecessary lily-gilding,
and is far more likely to create problems than solve any. Please
remember that a debugging facility has got to be minimal impact,
or it creates its own Heisenbugs.

(In the unlikely event that someone needed to debug such an early crash,
surely they could just create a crashdump subdirectory in whatever
directory they want to launch the postmaster from.)

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 15:16:17
Message-ID: AANLkTinBKtSzsOzwAZUvkJdu=onds5c9hupvkCt-wTg0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 15:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Dec 16, 2010 at 9:24 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> We can do such a commandline. We don't have any platform-specific
>>> commandline options today. Is that something we've intentionally
>>> avoided, or just not needed before?
>
>> Beats me.
>
> Yes, it's something we intentionally avoid: there is no convenient way
> to have conditional entries in getopt()'s option-letters string.
>
> I think the proposal for such a switch is unnecessary lily-gilding,
> and is far more likely to create problems than solve any.  Please
> remember that a debugging facility has got to be minimal impact,
> or it creates its own Heisenbugs.
>
> (In the unlikely event that someone needed to debug such an early crash,
> surely they could just create a crashdump subdirectory in whatever
> directory they want to launch the postmaster from.)

Or actually - run the postmaster from a debugger...

I think the thing we lose is the ability for the DBA to say
pre-emptively "if this crashes tomorrow on restart i want a
crashdump".

Hmm. What we could do is have pg_ctl chdir() into the data directory
on start. We don't set it to anything there now - so we rely on the
"If this parameter is NULL, the new process will have the same current
drive and directory as the calling process." Is that likely to cause
problems in other areas?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 15:54:44
Message-ID: 21781.1292514884@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Thu, Dec 16, 2010 at 15:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think the proposal for such a switch is unnecessary lily-gilding,

> Hmm. What we could do is have pg_ctl chdir() into the data directory
> on start.

See above. You're solving a problem that probably doesn't exist,
and introducing a pile of new ones in the process --- for example,
this will break commands that use relative paths for either the
postmaster executable or the data directory itself.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-16 22:40:36
Message-ID: 15764.1292539236@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Found another problem in it: when running with an older version of
> dbghelp.dll (which I was), it simply didn't work. We need to grab the
> version of dbghelp.dll at runtime and pick which things we're going to
> dump based on that.

> The attached version of the patch does that. Can you please test that
> it still generates the full dump on your system, which supposedly has
> a much more modern version of dbghelp.dll than mine?

This version of the patch looks fairly sane in terms of how it connects
to the rest of the code; but I'm unqualified to comment on the
Windows-specific code.

One thought is maybe the call point should be in startup_hacks() rather
than the main line of main()? I'm not terribly set on it if you don't
like that.

Also, I notice that the comment for startup_hacks() claims that it isn't
executed in subprocesses, which is evidently an obsolete statement ---
that should get cleaned up independently of this.

regards, tom lane


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-17 07:01:56
Message-ID: 4D0B0AE4.2020209@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16/12/10 21:01, Magnus Hagander wrote:

> Found another problem in it: when running with an older version of
> dbghelp.dll (which I was), it simply didn't work. We need to grab the
> version of dbghelp.dll at runtime and pick which things we're going to
> dump based on that.

I was about to suggest dropping the fallback loading of the system
dbghelp.dll, and just bundle a suitable version with Pg, then I saw how
big the DLL is. It's 800kb (!!) of code that'll hopefully never get used
on any given system. With a footprint like that, bundling it seems
rather less attractive.

I think Magnus is right: test the dbghelp.dll version and fall back to
supported features - as far back as XP, anyway; who cares about anything
earlier than that. An updated version can always be put in place by the
user if the built-in one can't produce enough detail.

> * I moved it all into backend/port/win32, and thus removed all the
> autoconf things, since they are not necessary. This is only for win32
> at this point, and AFAIK we don't expect it to be for other platforms.
> If that's needed, we can move it back later, but for now let's keep it
> simple.

Yeah, fair call - especially as most other OSes have native crash dump
facilities (kernel-generated core dumps, etc) that render explicit crash
handling much less useful.

> * using elog() really isn't safe. We set the crash handler *long*
> before we have loaded the elog subsystem. It's better to get a log to
> eventlog than to not get it at all.

Good point. It made some sense when running as a loadable module, as the
crash dumper was only loaded quite late, and so long as it elog()'d only
after writing the dump that was fine. Now, not so much.

> We use the existance of the "crashdumps" directory as an indication we
> want crashdumps. That's fine when the system is up. But what if we
> crash *in the postmaster before we have done chdir()*?
>
> Should we perhaps instead define a subdirectory of *where the .EXE
> file is*, and dump the file there?

Well, on Windows it's pretty easy to find out where your executable is,
and the crash dumper already does that to determine where to load
dbghelp.dll from. It'd be no stretch to use a subdir of the postgresql
install dir for crash dumps. OTOH, there are all sorts of exciting
permissions issues to deal with then, as write permission for "postgres"
won't be inherited from the datadir. It's also generally a big ugly and
not very good practice to write to the application directory.

If not writing to the datadir, for non-services you'd use something like:

%LOCALAPPDATA%\postgres\crashdumps

... but for a service running under its own account that's not a good
option either. Consequently, IMO the datadir remains the best place for
crashdumps.

Is this going to be a real-world issue? One simple answer might be not
to load the crash dump handler until after the postmaster chdir()s, but
honestly I'm not sure it matters. If it can't find somewhere to write
dumps, it doesn't write a dump.

The only possible issue I can imagine would be if the postmaster started
with a cwd that a malicious user could write to, so they could create a
'crashdumps' dir, somehow cause an early start postmaster crash,
therefore somehow leak info from the protected "postgres" account files
that way. A quick look at the code suggests that the postmaster doesn't
do anything interesting in the datadir before chdir()ing in there, so I
don't see any danger in that. In any case, if a malicious user has write
access to the postmaster's startup cwd, with the design of Windows PATH
and library loading that means they can replace critical DLLs with their
own malicious versions, so there's a whole lot more to worry about than
leaking information via crashdumps.

So: I say just use the cwd, whatever it is, and don't worry about it. An
admin can manually create a "crashdumps" dir in the service start cwd
and set appropriate permissions in the unlikely event they're trying to
collect crash dumps for an early-crashing postmaster and they can't just
use a debugger directly.

> It needs to be documented somewhere.Perhaps in "15.8 Platform Specific
> Notes"? That's really about building it, but it might be reasonable
> there anyway? It does hold a number of things today that aren't
> related to building, for other platforms.

Seems reasonable.

--
System & Network Administrator
POST Newspapers


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-17 07:46:56
Message-ID: AANLkTi=7=FfvHg97QkNMeH0H1TEet1mM5Cw6zgo3CGzt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 17, 2010 8:02 AM, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au> wrote:
>
> On 16/12/10 21:01, Magnus Hagander wrote:
>
> > Found another problem in it: when running with an older version of
> > dbghelp.dll (which I was), it simply didn't work. We need to grab the
> > version of dbghelp.dll at runtime and pick which things we're going to
> > dump based on that.
>
> I was about to suggest dropping the fallback loading of the system
> dbghelp.dll, and just bundle a suitable version with Pg, then I saw how
> big the DLL is. It's 800kb (!!) of code that'll hopefully never get used
> on any given system. With a footprint like that, bundling it seems
> rather less attractive.
>
> I think Magnus is right: test the dbghelp.dll version and fall back to
> supported features - as far back as XP, anyway; who cares about anything
> earlier than that. An updated version can always be put in place by the
> user if the built-in one can't produce enough detail.

Did you get a chance to test that it still produced a full dump on your
system?

> > We use the existance of the "crashdumps" directory as an indication we
> > want crashdumps. That's fine when the system is up. But what if we
> > crash *in the postmaster before we have done chdir()*?
> >
> > Should we perhaps instead define a subdirectory of *where the .EXE
> > file is*, and dump the file there?

...

> Is this going to be a real-world issue?

Nah, I'm with tom on the fact that this is probably not.

> > It needs to be documented somewhere.Perhaps in "15.8 Platform Specific
> > Notes"? That's really about building it, but it might be reasonable
> > there anyway? It does hold a number of things today that aren't
> > related to building, for other platforms.
>
> Seems reasonable.

I'll put something together there then.

/Magnus


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-17 11:08:23
Message-ID: 4D0B44A7.3050509@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17/12/2010 3:46 PM, Magnus Hagander wrote:
>
> On Dec 17, 2010 8:02 AM, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au
> <mailto:craig(at)postnewspapers(dot)com(dot)au>> wrote:
> >
> > On 16/12/10 21:01, Magnus Hagander wrote:
> >
> > > Found another problem in it: when running with an older version of
> > > dbghelp.dll (which I was), it simply didn't work. We need to grab the
> > > version of dbghelp.dll at runtime and pick which things we're going to
> > > dump based on that.
> >
> > I was about to suggest dropping the fallback loading of the system
> > dbghelp.dll, and just bundle a suitable version with Pg, then I saw how
> > big the DLL is. It's 800kb (!!) of code that'll hopefully never get used
> > on any given system. With a footprint like that, bundling it seems
> > rather less attractive.
> >
> > I think Magnus is right: test the dbghelp.dll version and fall back to
> > supported features - as far back as XP, anyway; who cares about anything
> > earlier than that. An updated version can always be put in place by the
> > user if the built-in one can't produce enough detail.
>
> Did you get a chance to test that it still produced a full dump on your
> system?

I've just tested that and it looks ok so far. The dumps produced are
smaller - 1.3MB instead of ~4MB for a dump produced by calling a simple
"crashme()" function from SQL, the source for which follows at the end
of this post. However, I'm getting reasonable stack traces, as shown by
the windb.exe output below.

That said, the dump appears to exclude the backend's private memory. I
can't resolve *fcinfo from PG_FUNCTION_ARGS; it looks like the direct
values of arguments are available as they're on the stack, as are
function local variables, but the process heap isn't dumped so you can't
see the contents of any pointers-to-struct. That'll make it harder to
track down many kinds of error - but, on the other hand, means that
dumps of processes with huge work_mem etc won't be massively bloated.

That might be a safer starting point than including the private process
memory, really. I'd like to offer a flag to turn on dumping of private
process memory but I'm not sure how best to go about it, given that we'd
want to avoid accessing GUCs etc if possible. "later", I think; this is
better for now.

I'm happy with the patch as it stands, with the one exception that
consideration of mingw is required before it can be committed.

> Symbol search path is: C:\Users\Craig\Developer\postgres\Release\postgres;SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols;C:\Program Files\PostgreSQL\9.0\symbols
> Executable search path is:
> Windows 7 Version 7600 MP (2 procs) Free x86 compatible
> Product: WinNt, suite: SingleUserTS
> Machine Name:
> Debug session time: Fri Dec 17 17:37:19.000 2010 (UTC + 8:00)
> System Uptime: not available
> Process Uptime: 0 days 0:01:09.000
> ................................
> This dump file has an exception of interest stored in it.
> The stored exception information can be accessed via .ecxr.
> (16b4.cd8): Access violation - code c0000005 (first/second chance not available)
> eax=000002e8 ebx=03210aa8 ecx=0055e700 edx=03210a68 esi=03210a68 edi=0055ea14
> eip=771464f4 esp=0055e6d4 ebp=0055e6e4 iopl=0 nv up ei pl zr na pe nc
> cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
> ntdll!KiFastSystemCallRet:
> 771464f4 c3 ret
> 0:000> .ecxr
> eax=00000000 ebx=00000000 ecx=0103fa88 edx=7001100f esi=0103fa78 edi=0103fab4
> eip=70011052 esp=0055f62c ebp=0103fa78 iopl=0 nv up ei pl zr na pe nc
> cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
> crashme!crashdump_crashme+0x2:
> 70011052 c70001000000 mov dword ptr [eax],1 ds:0023:00000000=????????
> 0:000> ~
> . 0 Id: 16b4.cd8 Suspend: 0 Teb: 7ffdf000 Unfrozen
> 1 Id: 16b4.1798 Suspend: 0 Teb: 7ffde000 Unfrozen
> 2 Id: 16b4.6f4 Suspend: 0 Teb: 7ffdd000 Unfrozen
> 0:000> k
> *** Stack trace for last set context - .thread/.cxr resets it
> ChildEBP RetAddr
> 0055f628 0142c797 crashme!crashdump_crashme+0x2 [c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14]
> 0055f68c 0142c804 postgres!ExecMakeFunctionResult+0x427 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 1824]
> 0055f6b4 0142b760 postgres!ExecEvalFunc+0x34 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 2260]
> 0055f6e0 0142ba83 postgres!ExecTargetList+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5095]
> 0055f720 0143f074 postgres!ExecProject+0x173 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5312]
> 0055f734 01427e07 postgres!ExecResult+0x94 [c:\users\craig\developer\postgres\src\backend\executor\noderesult.c @ 157]
> 0055f744 01425ccd postgres!ExecProcNode+0x67 [c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c @ 361]
> 0055f758 01426ace postgres!ExecutePlan+0x2d [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 1236]
> 0055f788 0152ec7d postgres!standard_ExecutorRun+0x8e [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 288]
> 0055f7ac 0152f290 postgres!PortalRunSelect+0x6d [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 953]
> 0055f834 0152c2b2 postgres!PortalRun+0x190 [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 803]
> 0055f8e8 0152cbe5 postgres!exec_simple_query+0x3a2 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 1067]
> 0055f96c 014f2bfc postgres!PostgresMain+0x575 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 3935]
> 0055f98c 014f58c9 postgres!BackendRun+0x19c [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 3562]
> 0055fb30 014575bc postgres!SubPostmasterMain+0x2f9 [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 4058]
> 0055fb48 0162847d postgres!main+0x1ec [c:\users\craig\developer\postgres\src\backend\main\main.c @ 173]
> 0055fb8c 76ab1194 postgres!__tmainCRTStartup+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
> 0055fb98 7715b495 kernel32!BaseThreadInitThunk+0xe
> 0055fbd8 7715b468 ntdll!__RtlUserThreadStart+0x70
> 0055fbf0 00000000 ntdll!_RtlUserThreadStart+0x1b

/* Condensed source for crash-test contrib module. */
#include "postgres.h"
#include "fmgr.h"
PG_MODULE_MAGIC;
extern Datum crashdump_crashme(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(crashdump_crashme);
Datum
crashdump_crashme(PG_FUNCTION_ARGS)
{
int * ptr = NULL;
*ptr = 1;
return NULL;
}

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-17 11:17:37
Message-ID: AANLkTi=7FkH3w02nTidEsnLQz2qfhbXsXHuDRqfjThK0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 12:08, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> On 17/12/2010 3:46 PM, Magnus Hagander wrote:
>>
>> On Dec 17, 2010 8:02 AM, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au
>> <mailto:craig(at)postnewspapers(dot)com(dot)au>> wrote:
>>  >
>>  > On 16/12/10 21:01, Magnus Hagander wrote:
>>  >
>>  > > Found another problem in it: when running with an older version of
>>  > > dbghelp.dll (which I was), it simply didn't work. We need to grab the
>>  > > version of dbghelp.dll at runtime and pick which things we're going
>> to
>>  > > dump based on that.
>>  >
>>  > I was about to suggest dropping the fallback loading of the system
>>  > dbghelp.dll, and just bundle a suitable version with Pg, then I saw how
>>  > big the DLL is. It's 800kb (!!) of code that'll hopefully never get
>> used
>>  > on any given system. With a footprint like that, bundling it seems
>>  > rather less attractive.
>>  >
>>  > I think Magnus is right: test the dbghelp.dll version and fall back to
>>  > supported features - as far back as XP, anyway; who cares about
>> anything
>>  > earlier than that. An updated version can always be put in place by the
>>  > user if the built-in one can't produce enough detail.
>>
>> Did you get a chance to test that it still produced a full dump on your
>> system?
>
> I've just tested that and it looks ok so far. The dumps produced are smaller
> - 1.3MB instead of ~4MB for a dump produced by calling a simple "crashme()"
> function from SQL, the source for which follows at the end of this post.
> However, I'm getting reasonable stack traces, as shown by the windb.exe
> output below.
>
> That said, the dump appears to exclude the backend's private memory. I can't
> resolve *fcinfo from PG_FUNCTION_ARGS; it looks like the direct values of
> arguments are available as they're on the stack, as are function local
> variables, but the process heap isn't dumped so you can't see the contents
> of any pointers-to-struct. That'll make it harder to track down many kinds
> of error - but, on the other hand, means that dumps of processes with huge
> work_mem etc won't be massively bloated.

What version of dbghelp do you have? And what does the API report for
it? The code is supposed to add the private memory if it finds version
6 or higher, perhaps that check is incorrect?

> That might be a safer starting point than including the private process
> memory, really. I'd like to offer a flag to turn on dumping of private
> process memory but I'm not sure how best to go about it, given that we'd
> want to avoid accessing GUCs etc if possible. "later", I think; this is
> better for now.

I'm not too happy with having a bunch of switches for it. People
likely to tweak those can just install the debugger tools and use
those, I think.

So I'd be happy to have it enabled - given that we want a default.
However, what I'm most interested in right now is finding out why it's
not being included anymore in the new patch, because it's supposed to
be..

> I'm happy with the patch as it stands, with the one exception that
> consideration of mingw is required before it can be committed.

What do you mean? That it has to be tested on mingw specifically, or
something else?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Dave Page <dpage(at)pgadmin(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-17 15:17:02
Message-ID: 29511.1292599022@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Fri, Dec 17, 2010 at 12:08, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
>> That might be a safer starting point than including the private process
>> memory, really.

> I'm not too happy with having a bunch of switches for it. People
> likely to tweak those can just install the debugger tools and use
> those, I think.

Yeah.

> So I'd be happy to have it enabled - given that we want a default.

There is absolutely nothing more frustrating than having a crash dump
that hasn't got the information you need. Please turn it on by default.

regards, tom lane


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-17 16:24:20
Message-ID: 4D0B8EB4.9090302@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17/12/2010 7:17 PM, Magnus Hagander wrote:

> What version of dbghelp do you have?

6.1.7600.16385 by default, as shipped in Windows 7 32-bit, and what I
was testing with.

6.12.0002.633 is what came with my copy of Debugging Tools for windows,
from the windows SDK. The same version comes with Visual Studio 10 Express.

6.9.0003.113 is what came with Visual Studio 9 Express.

I've now re-tested with the latest, 6.12.0002.633, and have the same result.

> And what does the API report for it?

This:

version = (*pApiVersion)();
write_stderr("version: %hu.%hu.%hu\n",
version->MajorVersion,
version->MinorVersion,
version->Revision);

outputs "version: 4.0.5" when loading dbghelp.dll 6.12.0002.633 from
c:\postgres91\bin\dbghelp.dll as verified by a write_stderr() just
before LoadLibrary and a test verifying the LoadLibrary worked.

Shouldn't dbghelp.dll just ignore any flags it doesn't understand? I'm
surprised we can't just pass flags a particular version doesn't know
about and have it deal with them. Still, I guess it's not wise to assume
such things.

> The code is supposed to add the private memory if it finds version
> 6 or higher, perhaps that check is incorrect?

It begins to look like the version check might be lying. I'll dig
further in a bit; it's 12:20am now and I need to sleep.

I'm going on holiday in about 48 hours, and will be away from Windows
(phew!) unless I get a VM set up during that time. I'll see what I can do.

>> I'm happy with the patch as it stands, with the one exception that
>> consideration of mingw is required before it can be committed.
>
> What do you mean? That it has to be tested on mingw specifically, or
> something else?

I'm not even sure it'll compile or link with MinGW, and if it does, I
doubt it'll be useful. It should only be compiled and called for native
VC++ builds.

I'm not sure if backend\port\win32 is built for all win32, or only for
VC++. Testing for defined(_MSC_VER) would do the trick. I'm not sure
testing defined(WIN32_ONLY_COMPILER) is quite right, since I doubt
dbghelp.dll would be much use for a Pg compiled with Borland or the like
either if that were ever supported.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-17 16:42:30
Message-ID: AANLkTimxOFVoqNUFdH+hNzWsySHzQfqX-gu24FQ7dkH6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 17:24, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>
>> What version of dbghelp do you have?
>
> 6.1.7600.16385 by default, as shipped in Windows 7 32-bit, and what I was
> testing with.
>
> 6.12.0002.633 is what came with my copy of Debugging Tools for windows, from
> the windows SDK. The same version comes with Visual Studio 10 Express.
>
> 6.9.0003.113 is what came with Visual Studio 9 Express.
>
> I've now re-tested with the latest, 6.12.0002.633, and have the same result.
>
>> And what does the API report for it?
>
> This:
>
> version = (*pApiVersion)();
> write_stderr("version: %hu.%hu.%hu\n",
>        version->MajorVersion,
>        version->MinorVersion,
>        version->Revision);
>
> outputs "version: 4.0.5" when loading dbghelp.dll 6.12.0002.633 from
> c:\postgres91\bin\dbghelp.dll as verified by a write_stderr() just before
> LoadLibrary and a test verifying the LoadLibrary worked.

Now, that's annoying. So clearly we can't use that function to
determine which version we're on. Seems it only works for "image help
api", and not the general thing.

According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
we could look for:

SysEnumLines - if present, we have at least 6.1.

However, I don't see any function that appeared in 6.0 only..

We're probably better off looking at the VERSIONINFO structure. Which
has a not-so-nice API, but at least it's documented :)

> Shouldn't dbghelp.dll just ignore any flags it doesn't understand? I'm
> surprised we can't just pass flags a particular version doesn't know about
> and have it deal with them. Still, I guess it's not wise to assume such
> things.

You'd think so, but if I include all the flags and run it with the
default one on Windows 2003, it comes back with the infamous
"Parameter Is Incorrect" error.

>> The code is supposed to add the private memory if it finds version
>> 6 or higher, perhaps that check is incorrect?
>
> It begins to look like the version check might be lying. I'll dig further in
> a bit; it's 12:20am now and I need to sleep.
>
> I'm going on holiday in about 48 hours, and will be away from Windows
> (phew!) unless I get a VM set up during that time. I'll see what I can do.

I'll try to put together a version that uses the VERSIONINFO to
determine it, for you to test.

>>> I'm happy with the patch as it stands, with the one exception that
>>> consideration of mingw is required before it can be committed.
>>
>> What do you mean? That it has to be tested on mingw specifically, or
>> something else?
>
> I'm not even sure it'll compile or link with MinGW, and if it does, I doubt
> it'll be useful. It should only be compiled and called for native VC++
> builds.

I'm pretty sure it could be useful with mingw as well. not *as*
useful, but still useful. You'd get proper stack traces and such for
anything in msvcrt, and the actual reason for the exception etc.

> I'm not sure if backend\port\win32 is built for all win32, or only for VC++.

It's for all win32.

>  Testing for defined(_MSC_VER) would do the trick. I'm not sure testing
> defined(WIN32_ONLY_COMPILER) is quite right, since I doubt dbghelp.dll would
> be much use for a Pg compiled with Borland or the like either if that were
> ever supported.

We don't support building the backend with borland - only libpq and psql.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-17 17:13:15
Message-ID: AANLkTimj7U0r_beWhK+YkHf9mRZ7ToK_LmFT2cbad+Sz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 17, 2010 at 17:42, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Fri, Dec 17, 2010 at 17:24, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
> Now, that's annoying. So clearly we can't use that function to
> determine which version we're on. Seems it only works for "image help
> api", and not the general thing.
>
> According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
> we could look for:
>
> SysEnumLines - if present, we have at least 6.1.
>
> However, I don't see any function that appeared in 6.0 only..

Actually, I'm wrong - there are functions enough to determine the
version. So here's a patch that tries that.

(BTW, the document is actually incorrect - partially because the
version that's shipped with Windows 2003 which is 5.2, is more or less
undocumented)

Let me know how that one works for you, and if it doesn't, whether it
does actually detect any of those higher versions by searching for the
functions.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
win32_crash.patch text/x-patch 10.1 KB

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 06:06:36
Message-ID: 4D0DA0EC.3070509@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17/12/2010 11:17 PM, Tom Lane wrote:

>> So I'd be happy to have it enabled - given that we want a default.
>
> There is absolutely nothing more frustrating than having a crash dump
> that hasn't got the information you need. Please turn it on by default.

There have to be limits to that, though. dbghelp.dll can dump shared
memory, too - but it's not going to be practical to write or work with
many-gigabyte dumps most of the time, and neither my initial patch nor
Magnus's recent reworking of it ask windbg.dll to include shared memory
in the dump.

It's possible to tell dbghelp.dll to include specific additional memory
ranges, so it may well be worth including specific critical areas within
the shared memory block while omitting the bulk of the buffers. Doing
that safely in an unsafe we're-crashing environment might not be simple,
though, especially if you're not even sure whether you got far into
startup before crashing.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 06:26:08
Message-ID: 4D0DA580.1000009@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18/12/2010 1:13 AM, Magnus Hagander wrote:
> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus(at)hagander(dot)net> wrote:
>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig(at)postnewspapers(dot)com(dot)au> wrote:
>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>> Now, that's annoying. So clearly we can't use that function to
>> determine which version we're on. Seems it only works for "image help
>> api", and not the general thing.
>>
>> According to http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
>> we could look for:
>>
>> SysEnumLines - if present, we have at least 6.1.
>>
>> However, I don't see any function that appeared in 6.0 only..
>
> Actually, I'm wrong - there are functions enough to determine the
> version. So here's a patch that tries that.

Great. I pulled the latest from your git tree, tested that, and got much
better results. Crashdump size is back to what I expected. In my test
code, fcinfo->args and fcinfo->argnull can be examined without problems.
Backtraces look good; see below. It seems to be including backend
private memory again now. Thanks _very_ much for your work on this.

fcinfo->flinfo is still inaccessible, but I suspect it's in shared
memory, as it's at 0x00000135 . Ditto fcinfo->resultinfo and
fcinfo->context.

This has me wondering - is it going to be necessary to dump shared
memory to make many backtraces useful? I just responded to Tom
mentioning that the patch doesn't currently dump shared memory, but I
hadn't realized the extent to which it's used for _lots_ more than just
disk buffers. I'm not sure how to handle dumping shared_buffers when
someone might be using multi-gigabyte shared_buffers, though. Dumping
the whole lot would risk sudden out-of-disk-space issues, slowdowns as
dumps are written, and the backend being "frozen" as it's being dumped
could delay the system coming back up again. Trying to selectively dump
critical parts could cause dumps to fail if the system is in early
startup or a bad state.

The same concern applies to writing backend private memory; it's fine
most of the time, but if you're doing data warehousing queries with 2GB
of work_mem, it's going to be nasty having all that extra disk I/O and
disk space use, not to mention the hold-up while the dump is written. If
this is something we want to have people running in production "just in
case" or to track down rare / hard to reproduce faults, that'll be a
problem.

OTOH, we can't really go poking around in palloc contexts to decide what
to dump.

I guess we could always do a small, minimalist minidump, then write
_another_ dump that attempts to include select parts of shm and backend
private memory.

I just thought of two other things, too:

- Is it possible for this handler to be called recursively if it fails
during the handler call? If so, do we need to uninstall the handler
before attempting a dump to avoid such recursion? I need to do some
testing and dig around MSDN to find out more about this.

- Can asynchronous events like signals (or their win32 emulation)
interrupt an executing crash handler, or are they blocked before the
crash handler is called? If they're not blocked, do we need to try to
block them before attempting a dump? Again, I need to do some reading on
this.

Anyway, here's an example of the backtraces I'm currently getting.
They're clearly missing some parameters (in shm? Unsure) but provide
source file+line, argument values where resolvable, and the call stack
its self. Locals are accessible at all levels of the stack when you go
poking around in windbg.

> This dump file has an exception of interest stored in it.
> The stored exception information can be accessed via .ecxr.
> (930.12e8): Access violation - code c0000005 (first/second chance not available)
> eax=00bce2c0 ebx=72d0e800 ecx=000002e4 edx=72cb81c8 esi=000000f0 edi=00000930
> eip=771464f4 esp=00bce294 ebp=00bce2a4 iopl=0 nv up ei pl zr na pe nc
> cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
> ntdll!KiFastSystemCallRet:
> 771464f4 c3 ret
> 0:000> .ecxr
> eax=00000000 ebx=00000000 ecx=015fd7d8 edx=7362100f esi=015fd7c8 edi=015fd804
> eip=73621052 esp=00bcf284 ebp=015fd7c8 iopl=0 nv up ei pl zr na pe nc
> cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
> crashme!crashdump_crashme+0x2:
> 73621052 c70001000000 mov dword ptr [eax],1 ds:0023:00000000=????????
> 0:000> kp
> *** Stack trace for last set context - .thread/.cxr resets it
> ChildEBP RetAddr
> 00bcf280 0031c797 crashme!crashdump_crashme(struct FunctionCallInfoData * fcinfo = 0x015e3318)+0x2 [c:\users\craig\developer\postgres\contrib\crashme\crashme.c @ 14]
> 00bcf2e4 0031c804 postgres!ExecMakeFunctionResult(struct FuncExprState * fcache = 0x015e3318, struct ExprContext * econtext = 0x00319410, char * isNull = 0x00000000 "", ExprDoneCond * isDone = 0x7362100f)+0x427 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 1824]
> 00bcf30c 0031b760 postgres!ExecEvalFunc(struct FuncExprState * fcache = 0x00000000, struct ExprContext * econtext = 0x00000000, char * isNull = 0x00000000 "", ExprDoneCond * isDone = 0x00000000)+0x34 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 2260]
> 00bcf338 0031ba83 postgres!ExecTargetList(struct List * targetlist = 0x00000000, struct ExprContext * econtext = 0x00000000, unsigned int * values = 0x00000000, char * isnull = 0x00000000 "", ExprDoneCond * itemIsDone = 0x00000000, ExprDoneCond * isDone = 0x00000000)+0x70 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5095]
> 00bcf378 0032f074 postgres!ExecProject(struct ProjectionInfo * projInfo = 0x00000000, ExprDoneCond * isDone = 0x00000000)+0x173 [c:\users\craig\developer\postgres\src\backend\executor\execqual.c @ 5312]
> 00bcf38c 00317e07 postgres!ExecResult(struct ResultState * node = <Memory access error>)+0x94 [c:\users\craig\developer\postgres\src\backend\executor\noderesult.c @ 157]
> 00bcf39c 00315ccd postgres!ExecProcNode(struct PlanState * node = <Memory access error>)+0x67 [c:\users\craig\developer\postgres\src\backend\executor\execprocnode.c @ 361]
> 00bcf3b0 00316ace postgres!ExecutePlan(struct EState * estate = 0x015fd7c8, struct PlanState * planstate = <Memory access error>, CmdType operation = <Memory access error>, char sendTuples = <Memory access error>, long numberTuples = <Memory access error>, ScanDirection direction = NoMovementScanDirection (0n0), struct _DestReceiver * dest = <Memory access error>)+0x2d [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 1236]
> 00bcf3e0 0041ec5d postgres!standard_ExecutorRun(struct QueryDesc * queryDesc = <Memory access error>, ScanDirection direction = <Memory access error>, long count = <Memory access error>)+0x8e [c:\users\craig\developer\postgres\src\backend\executor\execmain.c @ 288]
> 00bcf404 0041f270 postgres!PortalRunSelect(struct PortalData * portal = 0x00000000, char forward = <Memory access error>, long count = <Memory access error>, struct _DestReceiver * dest = <Memory access error>)+0x6d [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 953]
> 00bcf48c 0041c292 postgres!PortalRun(struct PortalData * portal = 0x015fb5b8, long count = 0n2147483647, char isTopLevel = 0n1 '', struct _DestReceiver * dest = 0x015e3418, struct _DestReceiver * altdest = 0x015e3418, char * completionTag = 0x00bcf500 "")+0x190 [c:\users\craig\developer\postgres\src\backend\tcop\pquery.c @ 803]
> 00bcf540 0041cbc5 postgres!exec_simple_query(char * query_string = 0x015fd7d8 "???")+0x3a2 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 1067]
> 00bcf5c4 003e2bdc postgres!PostgresMain(int argc = 0n2, char ** argv = 0x01555138, char * username = 0x00d484a0 "Craig")+0x575 [c:\users\craig\developer\postgres\src\backend\tcop\postgres.c @ 3935]
> 00bcf5e4 003e58a9 postgres!BackendRun(struct Port * port = 0x00000000)+0x19c [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 3562]
> 00bcf788 003475bc postgres!SubPostmasterMain(int argc = 0n13900471, char ** argv = 0x00d41ac5)+0x2f9 [c:\users\craig\developer\postgres\src\backend\postmaster\postmaster.c @ 4058]
> 00bcf7a0 0051845d postgres!main(int argc = 0n1990922644, char ** argv = 0x7ffdf000)+0x1ec [c:\users\craig\developer\postgres\src\backend\main\main.c @ 173]
> 00bcf7e4 76ab1194 postgres!__tmainCRTStartup(void)+0x10f [f:\dd\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 586]
> 00bcf7f0 7715b495 kernel32!BaseThreadInitThunk+0xe
> 00bcf830 7715b468 ntdll!__RtlUserThreadStart+0x70
> 00bcf848 00000000 ntdll!_RtlUserThreadStart+0x1b

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 11:51:19
Message-ID: AANLkTimFyXg9KYx4S-COj2Ha5EfP_v+y7kXyC2e98ptd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> On 18/12/2010 1:13 AM, Magnus Hagander wrote:
>>
>> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus(at)hagander(dot)net>
>>  wrote:
>>>
>>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig(at)postnewspapers(dot)com(dot)au>
>>>  wrote:
>>>>
>>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>>>
>>> Now, that's annoying. So clearly we can't use that function to
>>> determine which version we're on. Seems it only works for "image help
>>> api", and not the general thing.
>>>
>>> According to
>>> http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
>>> we could look for:
>>>
>>> SysEnumLines - if present, we have at least 6.1.
>>>
>>> However, I don't see any function that appeared in 6.0 only..
>>
>> Actually, I'm wrong - there are functions enough to determine the
>> version. So here's a patch that tries that.
>
> Great. I pulled the latest from your git tree, tested that, and got much
> better results. Crashdump size is back to what I expected. In my test code,
> fcinfo->args and fcinfo->argnull can be examined without problems.
> Backtraces look good; see below. It seems to be including backend private
> memory again now. Thanks _very_ much for your work on this.

Ok, great. I think that leaves us at least complete enough to commit -
we can always improve things further as we get some more real world
testing.

> fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory,
> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context.

Hmm. Not sure why those would be in shared memory, that seems strange.

> This has me wondering - is it going to be necessary to dump shared memory to
> make many backtraces useful? I just responded to Tom mentioning that the
> patch doesn't currently dump shared memory, but I hadn't realized the extent
> to which it's used for _lots_ more than just disk buffers. I'm not sure how
> to handle dumping shared_buffers when someone might be using multi-gigabyte
> shared_buffers, though. Dumping the whole lot would risk sudden
> out-of-disk-space issues, slowdowns as dumps are written, and the backend
> being "frozen" as it's being dumped could delay the system coming back up
> again. Trying to selectively dump critical parts could cause dumps to fail
> if the system is in early startup or a bad state.

I don'tt hink a slowdown issue during dump being written is really a
problem - it's not like this is something that happens normally. In
fact, we're going to restart all the other backends after the crash
anyway, and that's going to be a lot more disruptive.

Filling up the disk, however, is a much bigger issue.. As is having a
huge dump to mail around (for error reports), if it's not actually
necessary...

> The same concern applies to writing backend private memory; it's fine most
> of the time, but if you're doing data warehousing queries with 2GB of
> work_mem, it's going to be nasty having all that extra disk I/O and disk
> space use, not to mention the hold-up while the dump is written. If this is
> something we want to have people running in production "just in case" or to
> track down rare / hard to reproduce faults, that'll be a problem.

Potential problem, yes, but not a very big one I believe. Yes, if that
very big backend crashes there will be a big dump - but how else are
you going to find out what went wrong?

In the shared memory case, *all* dumps will be huge, not just those
that happen to use a lot of local memory.

> OTOH, we can't really go poking around in palloc contexts to decide what to
> dump.

No, that's way more complex stuff than we dare execute in a crash handler.

> I guess we could always do a small, minimalist minidump, then write
> _another_ dump that attempts to include select parts of shm and backend
> private memory.
>
> I just thought of two other things, too:
>
> - Is it possible for this handler to be called recursively if it fails
> during the handler call? If so, do we need to uninstall the handler before
> attempting a dump to avoid such recursion? I need to do some testing and dig
> around MSDN to find out more about this.

I'm pretty sure I read somewhere that it can't happen and we don't
need to do that, but I can't find any reference to it atm :-(

> - Can asynchronous events like signals (or their win32 emulation) interrupt
> an executing crash handler, or are they blocked before the crash handler is
> called? If they're not blocked, do we need to try to block them before
> attempting a dump? Again, I need to do some reading on this.

The exception handler itself is global and will run on the crashing
thread. I'm fairly certain it stops the other threads while running,
but again I can't find a reference to that right now. But ISTM it
would have to.

> Anyway, here's an example of the backtraces I'm currently getting. They're
> clearly missing some parameters (in shm? Unsure) but provide source
> file+line, argument values where resolvable, and the call stack its self.
> Locals are accessible at all levels of the stack when you go poking around
> in windbg.

Yeah, they're still very useful. Is that a release or debug build?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 12:05:47
Message-ID: AANLkTin-o0oPgLsJFOhh=9iwZAksBOXWbHGhUJHYZdUY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 19, 2010 at 12:51, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
>> On 18/12/2010 1:13 AM, Magnus Hagander wrote:
>>>
>>> On Fri, Dec 17, 2010 at 17:42, Magnus Hagander<magnus(at)hagander(dot)net>
>>>  wrote:
>>>>
>>>> On Fri, Dec 17, 2010 at 17:24, Craig Ringer<craig(at)postnewspapers(dot)com(dot)au>
>>>>  wrote:
>>>>>
>>>>> On 17/12/2010 7:17 PM, Magnus Hagander wrote:
>>>>
>>>> Now, that's annoying. So clearly we can't use that function to
>>>> determine which version we're on. Seems it only works for "image help
>>>> api", and not the general thing.
>>>>
>>>> According to
>>>> http://msdn.microsoft.com/en-us/library/ms679294(v=vs.85).aspx,
>>>> we could look for:
>>>>
>>>> SysEnumLines - if present, we have at least 6.1.
>>>>
>>>> However, I don't see any function that appeared in 6.0 only..
>>>
>>> Actually, I'm wrong - there are functions enough to determine the
>>> version. So here's a patch that tries that.
>>
>> Great. I pulled the latest from your git tree, tested that, and got much
>> better results. Crashdump size is back to what I expected. In my test code,
>> fcinfo->args and fcinfo->argnull can be examined without problems.
>> Backtraces look good; see below. It seems to be including backend private
>> memory again now. Thanks _very_ much for your work on this.
>
> Ok, great. I think that leaves us at least complete enough to commit -
> we can always improve things further as we get some more real world
> testing.

Actually, looking through it again just before commit, I can't help
but think the whole code about loading the DLL from different places
is unnecessary. The windows DLL search order *always* has the
directory of the EXE first, followed by either system dirs or CWD
depending on version.

Any reason not to just call LoadLibrary once?

That makes the patch look like attached.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
win32_crash.patch text/x-patch 8.7 KB

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 12:57:00
Message-ID: 4D0E011C.3030600@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 19/12/2010 7:51 PM, Magnus Hagander wrote:

>> Great. I pulled the latest from your git tree, tested that, and got much
>> better results. Crashdump size is back to what I expected. In my test code,
>> fcinfo->args and fcinfo->argnull can be examined without problems.
>> Backtraces look good; see below. It seems to be including backend private
>> memory again now. Thanks _very_ much for your work on this.
>
> Ok, great. I think that leaves us at least complete enough to commit -
> we can always improve things further as we get some more real world
> testing.
>
>
>> fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory,
>> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context.
>
> Hmm. Not sure why those would be in shared memory, that seems strange.

OK, I'll have to do some more digging on that, then. I'm getting on a
plane in about 2 hours, but will be bringing Visual Studio snapshots, a
postgres git tree, an XP vm, etc with me, so time permitting I should be
able to keep on with this.

>> Anyway, here's an example of the backtraces I'm currently getting. They're
>> clearly missing some parameters (in shm? Unsure) but provide source
>> file+line, argument values where resolvable, and the call stack its self.
>> Locals are accessible at all levels of the stack when you go poking around
>> in windbg.
>
> Yeah, they're still very useful. Is that a release or debug build?

That's a release build. I'm intentionally testing with release builds
because I want something that's useful on real-world end-user systems,
and I'm aware that few Windows users will build Pg for themselves.

(That said, the Perl-based build scripts are some of the least painful
build tools I've ever worked with on Windows. Only CMake can rival them
for low-pain Windows compilation.)

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 13:06:09
Message-ID: 4D0E0341.2020907@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 19/12/2010 8:05 PM, Magnus Hagander wrote:

> Actually, looking through it again just before commit, I can't help
> but think the whole code about loading the DLL from different places
> is unnecessary. The windows DLL search order *always* has the
> directory of the EXE first, followed by either system dirs or CWD
> depending on version.
>
> Any reason not to just call LoadLibrary once?

Good point. The program directory was added to the DLL search path with
Windows XP. On any Windows version that Pg targets you can trust Windows
to load a DLL from the app dir before system32. We don't really care
about win2k/nt4 or 9x, where this is necessary.

It's not a security concern for the reasons already outlined in the
comments in the patch - in brief, because Pg keeps the cwd inside the
datadir, and if someone can write malicious files in there you have big
problems already. If it was a security concern our fallback would be an
attempt to load %WINDIR%\system32\dbghelp.dll by full path, rather than
using an unqualified name to search the path.

So: I agree. We should just let Windows find dbghelp.dll on the normal
search path; building an explicit path isn't necessary.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 13:10:58
Message-ID: AANLkTimDLgdS5xibz+T=s3pvjDsfDujKFs3M-fuN70LJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 19, 2010 at 13:57, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> On 19/12/2010 7:51 PM, Magnus Hagander wrote:
>
>>> Great. I pulled the latest from your git tree, tested that, and got much
>>> better results. Crashdump size is back to what I expected. In my test
>>> code,
>>> fcinfo->args and fcinfo->argnull can be examined without problems.
>>> Backtraces look good; see below. It seems to be including backend private
>>> memory again now. Thanks _very_ much for your work on this.
>>
>> Ok, great. I think that leaves us at least complete enough to commit -
>> we can always improve things further as we get some more real world
>> testing.
>>
>>
>>> fcinfo->flinfo is still inaccessible, but I suspect it's in shared
>>> memory,
>>> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context.
>>
>> Hmm. Not sure why those would be in shared memory, that seems strange.
>
> OK, I'll have to do some more digging on that, then. I'm getting on a plane
> in about 2 hours, but will be bringing Visual Studio snapshots, a postgres
> git tree, an XP vm, etc with me, so time permitting I should be able to keep
> on with this.

Ok. I still think what we have now is a great improvement over
nothing. But improvements on top of it is obviously always good :-)

>>> Anyway, here's an example of the backtraces I'm currently getting.
>>> They're
>>> clearly missing some parameters (in shm? Unsure) but provide source
>>> file+line, argument values where resolvable, and the call stack its self.
>>> Locals are accessible at all levels of the stack when you go poking
>>> around
>>> in windbg.
>>
>> Yeah, they're still very useful. Is that a release or debug build?
>
> That's a release build. I'm intentionally testing with release builds
> because I want something that's useful on real-world end-user systems, and
> I'm aware that few Windows users will build Pg for themselves.

Good - I just wanted to be sure that's what you were testing as well.

> (That said, the Perl-based build scripts are some of the least painful build
> tools I've ever worked with on Windows. Only CMake can rival them for
> low-pain Windows compilation.)

:-) Thanks! They're not quite that painless to maintain, but it's not *too* bad.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 15:49:15
Message-ID: AANLkTi=Ogm9Mpc4B-E5bYHs2B9bWZTu5-9XOGTspSEPO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 19, 2010 at 14:06, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> On 19/12/2010 8:05 PM, Magnus Hagander wrote:
>
>> Actually, looking through it again just before commit, I can't help
>> but think the whole code about loading the DLL from different places
>> is unnecessary. The windows DLL search order *always* has the
>> directory of the EXE first, followed by either system dirs or CWD
>> depending on version.
>>
>> Any reason not to just call LoadLibrary once?
>
> Good point. The program directory was added to the DLL search path with
> Windows XP. On any Windows version that Pg targets you can trust Windows to
> load a DLL from the app dir before system32. We don't really care about
> win2k/nt4 or 9x, where this is necessary.
>
> It's not a security concern for the reasons already outlined in the comments
> in the patch - in brief, because Pg keeps the cwd inside the datadir, and if
> someone can write malicious files in there you have big problems already. If
> it was a security concern our fallback would be an attempt to load
> %WINDIR%\system32\dbghelp.dll by full path, rather than using an unqualified
> name to search the path.
>
> So: I agree. We should just let Windows find dbghelp.dll on the normal
> search path; building an explicit path isn't necessary.

I've committed the version that does this.

Thanks!

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 16:02:48
Message-ID: 24439.1292774568@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Sun, Dec 19, 2010 at 07:26, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
>> fcinfo->flinfo is still inaccessible, but I suspect it's in shared memory,
>> as it's at 0x00000135 . Ditto fcinfo->resultinfo and fcinfo->context.

> Hmm. Not sure why those would be in shared memory, that seems strange.

FWIW, I don't believe that theory either. Offhand I cannot think of any
case where an FmgrInfo would be in shared memory. It seems more likely
from the above that you've misidentified where the fcinfo record is.

There are some code paths where we don't bother to set up those fields,
but I think we're always careful to set them to NULL if so.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 16:07:25
Message-ID: 24511.1292774845@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> On 17/12/2010 11:17 PM, Tom Lane wrote:
>> There is absolutely nothing more frustrating than having a crash dump
>> that hasn't got the information you need. Please turn it on by default.

> There have to be limits to that, though. dbghelp.dll can dump shared
> memory, too - but it's not going to be practical to write or work with
> many-gigabyte dumps most of the time, and neither my initial patch nor
> Magnus's recent reworking of it ask windbg.dll to include shared memory
> in the dump.

Well, actually my comment above is based directly on pain in working
with core dumps on Unix-oid systems that don't include shared memory.
Linux *does* include shared memory, and I have not noticed that that
was a problem. Of course, in development installations I don't try to
raise shared_buffers to the moon ...

> It's possible to tell dbghelp.dll to include specific additional memory
> ranges, so it may well be worth including specific critical areas within
> the shared memory block while omitting the bulk of the buffers. Doing
> that safely in an unsafe we're-crashing environment might not be simple,
> though, especially if you're not even sure whether you got far into
> startup before crashing.

I agree that having the crash dump code know anything specific about the
contents of shared memory is a nonstarter --- far too fragile. But
perhaps we could have some simple rule based only on what the kernel
knows about the shmem block, like "dump shmem if it's no more than 1GB".

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 16:28:03
Message-ID: AANLkTim5fRWcLGr6Rmtf1JWyig9TpwvBDSpD+cWqDsoZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I agree that having the crash dump code know anything specific about the
> contents of shared memory is a nonstarter --- far too fragile.  But
> perhaps we could have some simple rule based only on what the kernel
> knows about the shmem block, like "dump shmem if it's no more than 1GB".

Not sure what knobs we have available, but would there be any value in
trying to dump the FIRST 1GB?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Dave Page <dpage(at)pgadmin(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 16:39:32
Message-ID: AANLkTi=k=zh93ddEuu95tRKQyEyMGrC-PC-+1MeLpxwx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 19, 2010 at 17:28, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I agree that having the crash dump code know anything specific about the
>> contents of shared memory is a nonstarter --- far too fragile.  But
>> perhaps we could have some simple rule based only on what the kernel
>> knows about the shmem block, like "dump shmem if it's no more than 1GB".
>
> Not sure what knobs we have available, but would there be any value in
> trying to dump the FIRST 1GB?

So such knob that I can find. Basically, we pick a combination of the flags at

http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx

We could send it as a "user stream", i guess, but it won't be
available in the debugger at the same address space then - just as a
blob of data to process manually. I doubt it's worth it in that
case...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Magnus Hagander <magnus(at)hagander(dot)net>, Dave Page <dpage(at)pgadmin(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 16:40:47
Message-ID: 25046.1292776847@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I agree that having the crash dump code know anything specific about the
>> contents of shared memory is a nonstarter --- far too fragile. But
>> perhaps we could have some simple rule based only on what the kernel
>> knows about the shmem block, like "dump shmem if it's no more than 1GB".

> Not sure what knobs we have available, but would there be any value in
> trying to dump the FIRST 1GB?

Probably not.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Dave Page <dpage(at)pgadmin(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2010-12-19 16:46:03
Message-ID: 25138.1292777163@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Sun, Dec 19, 2010 at 17:28, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sun, Dec 19, 2010 at 11:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> perhaps we could have some simple rule based only on what the kernel
>>> knows about the shmem block, like "dump shmem if it's no more than 1GB".

>> Not sure what knobs we have available, but would there be any value in
>> trying to dump the FIRST 1GB?

> So such knob that I can find. Basically, we pick a combination of the flags at
> http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx

Yeah, I thought I was probably asking for something that couldn't
practically be supported. Let's try it as-is for a few releases
and see how it goes.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2011-07-10 20:23:58
Message-ID: 1310329438.11713.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tor, 2010-12-16 at 15:16 +0100, Magnus Hagander wrote:
> Found another problem in it: when running with an older version of
> dbghelp.dll (which I was), it simply didn't work. We need to grab the
> version of dbghelp.dll at runtime and pick which things we're going to
> dump based on that.
>
> The attached version of the patch does that. Can you please test that
> it still generates the full dump on your system, which supposedly has
> a much more modern version of dbghelp.dll than mine?

I was going through the GetLastError() calls to unify the printf
formats, as discussed, and I stumbled across this:

+ write_stderr("could not write crash dump to %s: error code %08x\n",
+ dumpPath, GetLastError());

Is there a reason why this uses that particular format, unlike all other
call sites?


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Proposed Windows-specific change: Enable crash dumps (like core files)
Date: 2011-07-10 22:36:14
Message-ID: 4E1A295E.9030103@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/07/2011 4:23 AM, Peter Eisentraut wrote:
> I was going through the GetLastError() calls to unify the printf
> formats, as discussed, and I stumbled across this:
>
> + write_stderr("could not write crash dump to %s: error code %08x\n",
> + dumpPath, GetLastError());
>
> Is there a reason why this uses that particular format, unlike all other
> call sites?
>

Other call site usages include:

errmsg_internal("could not create signal event: %d", (int) GetLastError())
write_stderr("could not create signal listener pipe: error code %d;
retrying\n", (int) GetLastError())

etc, so I presume you're referring to the use of %08x as a format
specifier rather than %d ? The only reason is that ntstatus errors are
typically reported this way - but, really, not all possible errors
arising there would be ntstatus errors. As it's not consistent with the
rest of Pg I don't see any reason not to change it to %d.

--
Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/