Re: Crash dumps

Lists: pgsql-hackers
From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Crash dumps
Date: 2011-06-14 18:37:04
Message-ID: ed1c29215834e81b477e68eee6af80cc@mail.softperience.eu

Hello,

I have been working a little on the streaming protocol, and from time to
time I get crashes. I want to ask whether you would want crash reporting
(this is one of the minor by-products of my mmap experiments). What I have
so far reports the mmapped areas, call stacks, and some other details. The
reporting currently works on Linux with gdb, but it has a pluggable
architecture that hooks into the segfault handler. One thing that should be
considered is killing the other processes immediately when reporting starts
(as producing the report takes some time), so some IPC will be required.

I may polish this a little and send a patch for it (currently
without the IPC killing of the other processes).

Regards,
Radosław Smogura


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-04 04:58:46
Message-ID: 4E114886.4010104@postnewspapers.com.au

On 15/06/2011 2:37 AM, Radosław Smogura wrote:
> Hello,
>
> Because, I work a little bit on streaming protocol and from time to time
> I have crashes. I want ask if you wont crash reporting (this is one of
> minors products from mmap playing) those what I have there is mmaped
> areas, and call stacks, and some other stuff.

Core files already contain all that, don't they? They omit shared memory
segments by default on most platforms, but should otherwise be quite
complete.

The usual approach on UNIXes and Linux is to use the built-in OS
features to generate a core dump of a crashing process then analyze it
after the fact. That way the crash is over as fast as possible and you
can get services back up and running before spending the time, CPU and
I/O required to analyze the core dump.
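
As a minimal sketch of the OS-level approach described above, assuming a
POSIX system: a process can raise its own soft core-size limit up to the
hard limit, so that the kernel is allowed to write a core file if it later
crashes. The function name enable_core_dumps() is hypothetical and used only
for illustration; whether and where a core file actually appears also depends
on system settings outside the process.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft RLIMIT_CORE limit to the hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise the soft limit up to the hard one. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

The limit is inherited across fork(), so setting it in a parent process
before it starts children is normally enough.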

> This based reports works
> for Linux with gdb, but there is some pluggable architecture, which
> connects with segfault

Which process does the debugging? Does the crashing process fork() a
copy of gdb to debug itself?

One thing I've been interested in is giving the postmaster (or more
likely a helper for the postmaster) the ability to handle "backend is
crashing" messages, attach a debugger to the crashing backend and
generate a dump and/or backtrace. This might be workable in cases where
in-process debugging can't be done due to a smashed stack, full heap
causing malloc() failure, etc.

--
Craig Ringer


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-04 11:03:38
Message-ID: a6237ee4eb53019dac510df00954ae55@mail.softperience.eu

On Mon, 04 Jul 2011 12:58:46 +0800, Craig Ringer wrote:
> On 15/06/2011 2:37 AM, Radosław Smogura wrote:
>> Hello,
>>
>> Because, I work a little bit on streaming protocol and from time to
>> time
>> I have crashes. I want ask if you wont crash reporting (this is one
>> of
>> minors products from mmap playing) those what I have there is mmaped
>> areas, and call stacks, and some other stuff.
>
> Core files already contain all that, don't they? They omit shared
> memory segments by default on most platforms, but should otherwise be
> quite complete.
>
> The usual approach on UNIXes and linux is to use the built-in OS
> features to generate a core dump of a crashing process then analyze
> it
> after the fact. That way the crash is over as fast as possible and
> you
> can get services back up and running before spending the time, CPU
> and
> I/O required to analyze the core dump.

Actually, what I was thinking about was adding a dump of the GUCs,
etc. The list of mappings dates from when I tried to mmap PostgreSQL:
because of the many errors, which sometimes occurred in unexpected places,
I needed something that would be useful to me and easy to analyse
(I could simply find the pointer and then check which region failed). The
idea of developing this further came later.

I think the report should look like this:
This is crash report of PostgreSQL database, generated on
Here is the list of GUC variables:
Here is the list of files:
Here is the backtrace:
Here is the detailed backtrace:
Here is the list of memory mappings (which also shows which libraries are linked in)
Here is your free memory
Here is your disk usage
Here is your custom addition

>> This based reports works
>> for Linux with gdb, but there is some pluggable architecture, which
>> connects with segfault
>
> Which process does the debugging? Does the crashing process fork() a
> copy of gdb to debug its self?
>
> One thing I've been interested in is giving the postmaster (or more
> likely a helper for the postmaster) the ability to handle "backend is
> crashing" messages, attach a debugger to the crashing backend and
> generate a dump and/or backtrace. This might be workable in cases
> where in-process debugging can't be done due to a smashed stack, full
> heap causing malloc() failure, etc.

Currently I do everything in the segfault handler (no fork), but I like the
idea of forking (in the segfault handler); this may resolve some problems.

Regards,
Radosław Smogura


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-04 11:57:34
Message-ID: 4E11AAAE.8020207@postnewspapers.com.au

On 4/07/2011 7:03 PM, Radosław Smogura wrote:

> Actually this, what I was thinking about was, to add dumping of GUC,
> etc. List of mappings came from when I tired to mmap PostgreSQL, and due
> to many of errors, which sometimes occurred in unexpected places, I was
> in need to add something that will be useful for me and easy to analyse
> (I could simple find pointer, and then check which region failed). The
> idea to try to evolve this come later.

Why not produce a tool that watches the datadir for core files and
processes them? Most but not all of the info you listed should be able
to be extracted from a core file. Things like GUCs should be extractable
with a bit of gdb scripting - and with much less chance of crashing than
trying to read them from a possibly corrupt heap within a crashing backend.

To capture any information not available from the core, you can enlist
the postmaster's help. It gets notified when a child crashes and should
be able to capture things like the memory and disk state. See void
reaper(SIGNAL_ARGS) in postmaster.c and HandleChildCrash(...). If
nothing else, the postmaster could probably fork a "child crashed"
helper to collect data, analyse the core file, email the report to the
admin, etc.

About the only issue there is that the postmaster relies on the exit
status to trigger the reaper code. Once an exit status is available, the
crashed process is gone, so the free memory will reflect the memory
state after the backend dies, and shared memory's state will have moved
on from how it was when the backend was alive.
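
To illustrate the point, here is a minimal sketch of the general POSIX
mechanism a parent uses to learn that a child crashed. It is not the
postmaster's actual code; reap_child() and spawn_and_reap() are hypothetical
names. Note that by the time waitpid() hands back the status, the child has
already terminated, which is exactly the limitation described above.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait for one child and report how it ended.
 * Returns the terminating signal number if the child was killed by a
 * signal, 0 if it exited normally, and -1 if waitpid() failed.
 */
int reap_child(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Demo: fork a child that kills itself with signo (0 means exit cleanly). */
int spawn_and_reap(int signo)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        if (signo != 0)
            raise(signo);
        _exit(0);
    }
    return reap_child(pid);
}
```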

For that reason, it'd be handy if a backend could trap SIGSEGV and
reliably tell the postmaster "I'm crashing!" so the postmaster could
fork a helper to capture any additional info the backend needs to be
alive for. Then the helper can gcore() the backend, or the backend can
just clear the SIGSEGV handler and kill() itself with SIGSEGV (signal 11)
to keep on crashing and generate a core.

Unfortunately, "reliably" and "segfault" don't go together. You don't
want a crashing backend writing to shared memory, so it can't use shm
to tell the postmaster it's dying. Signals are ... interesting ... at
the best of times, but would probably still be the best bet. The
postmaster could install a SIGUSR[whatever] or RT signal handler that
takes a siginfo so it knows the pid of the signal sender. The crashing
backend could signal the postmaster with an agreed signal to say "I'm
crashing" and let the postmaster clean it up. The problem with this is
that a lost signal (for any reason) would cause a zombie backend to hang
around waiting to be killed by a postmaster that never heard it was
crashing.

BTW, the win32 crash dump handler would benefit from being able to use
some of the same facilities. In particular, being able to tell the
postmaster "Argh, ogod I'm crashing, fork something to dump my core!"
rather than trying to self-dump would be great. It'd also allow the
addition of extra info like GUC data, last few lines of logs etc to the
minidump, something that the win32 crash dump handler cannot currently
do safely.

--
Craig Ringer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-04 14:32:32
Message-ID: 24287.1309789952@sss.pgh.pa.us

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> Why not produce a tool that watches the datadir for core files and
> processes them? ...

By and large, our attitude has been that Postgres shouldn't be crashing
often enough to make this sort of infrastructure worthwhile. Developer
time spent on it would be far better spent on fixing the bugs instead.

> For that reason, it'd be handy if a backend could trap SIGSEGV and
> reliably tell the postmaster "I'm crashing!" so the postmaster could
> fork a helper to capture any additional info the backend needs to be
> alive for. ...
> Unfortunately, "reliably" and "segfault" don't go together.

Yeah. I think there's no chance at all that we'd accept patches pointed
in this direction. They'd be more likely to decrease the system's
reliability than anything else. Aside from the difficulty of doing
anything at all reliably in an already-failing process, once we realize
that something is badly wrong it's important to kill off all other
backends ASAP. That reduces the window for any possible corruption of
shared memory to make it into on-disk state. So interposing a "helper"
to fool around with the failed process doesn't sound good at all.

In practice I think you can generally get everything of interest
out of the core file, so it's not clear to me that there's any win
available from this line of thought anyhow.

regards, tom lane


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-04 16:47:34
Message-ID: 201107041847.34583.rsmogura@softperience.eu

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Monday 04 of July 2011 16:32:32
> Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> > Why not produce a tool that watches the datadir for core files and
> > processes them? ...
>
> By and large, our attitude has been that Postgres shouldn't be crashing
> often enough to make this sort of infrastructure worthwhile. Developer
> time spent on it would be far better spent on fixing the bugs instead.
>
> > For that reason, it'd be handy if a backend could trap SIGSEGV and
> > reliably tell the postmaster "I'm crashing!" so the postmaster could
> > fork a helper to capture any additional info the backend needs to be
> > alive for. ...
> > Unfortunately, "reliably" and "segfault" don't go together.
>
> Yeah. I think there's no chance at all that we'd accept patches pointed
> in this direction. They'd be more likely to decrease the system's
> reliability than anything else. Aside from the difficulty of doing
> anything at all reliably in an already-failing process, once we realize
> that something is badly wrong it's important to kill off all other
> backends ASAP. That reduces the window for any possible corruption of
> shared memory to make it into on-disk state. So interposing a "helper"
> to fool around with the failed process doesn't sound good at all.
>
> In practice I think you can generally get everything of interest
> out of the core file, so it's not clear to me that there's any win
> available from this line of thought anyhow.
>
> regards, tom lane

I asked about crash reports because at the time there was a thread about
crashes on a live system.

There is one win, I think: users will send a crash report sooner than a core
dump (for many reasons: size, the amount of confidential information, etc.).

Such a report may be quite useful for "reasonable" bug finding.

It could be shipped as a contrib module too, but I noticed, and we agree, that
report generation should not slow down shutdown.

PostgreSQL will look better: a server application that generates crash reports
looks better than one that does not. Just a bit of marketing ;)

Regards,
Radek.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-05 13:02:15
Message-ID: CA+TgmobO01FSyyEan0-s=bSP-RwVXrbUdC9phs--zVzZ+i+R5Q@mail.gmail.com

On Mon, Jul 4, 2011 at 12:47 PM, Radosław Smogura
<rsmogura(at)softperience(dot)eu> wrote:
> I asked about crash reports becaus of at this time there was thread about
> crashing in live system.

Yeah, I thought this was the result of that effort:

commit dcb09b595f88a3bca6097a6acc17bf2ec935d55f
Author: Magnus Hagander <magnus(at)hagander(dot)net>
Date: Sun Dec 19 16:45:28 2010 +0100

Support for collecting crash dumps on Windows

Add support for collecting "minidump" style crash dumps on
Windows, by setting up an exception handling filter. Crash
dumps will be generated in PGDATA/crashdumps if the directory
is created (the existance of the directory is used as on/off
switch for the generation of the dumps).

Craig Ringer and Magnus Hagander

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-05 13:05:14
Message-ID: CABUevExCM7DYx+6k_CdvO=sJWvPP19Q=sH+UJKvS9qcHDoJ1QQ@mail.gmail.com

On Tue, Jul 5, 2011 at 15:02, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 4, 2011 at 12:47 PM, Radosław Smogura
> <rsmogura(at)softperience(dot)eu> wrote:
>> I asked about crash reports becaus of at this time there was thread about
>> crashing in live system.
>
> Yeah, I thought this was the result of that effort:
>
> commit dcb09b595f88a3bca6097a6acc17bf2ec935d55f
> Author: Magnus Hagander <magnus(at)hagander(dot)net>
> Date:   Sun Dec 19 16:45:28 2010 +0100
>
>    Support for collecting crash dumps on Windows
>
>    Add support for collecting "minidump" style crash dumps on
>    Windows, by setting up an exception handling filter. Crash
>    dumps will be generated in PGDATA/crashdumps if the directory
>    is created (the existance of the directory is used as on/off
>    switch for the generation of the dumps).
>
>    Craig Ringer and Magnus Hagander

That crash dump is basically the windows equivalent of a coredump,
though. Just a different name...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-05 13:24:31
Message-ID: CA+TgmoYW0sB0zfLESHHqDQZy=96p0quKEjK0dk4Ox_S-jBjCwg@mail.gmail.com

On Tue, Jul 5, 2011 at 9:05 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Jul 5, 2011 at 15:02, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Jul 4, 2011 at 12:47 PM, Radosław Smogura
>> <rsmogura(at)softperience(dot)eu> wrote:
>>> I asked about crash reports becaus of at this time there was thread about
>>> crashing in live system.
>>
>> Yeah, I thought this was the result of that effort:
>>
>> commit dcb09b595f88a3bca6097a6acc17bf2ec935d55f
>> Author: Magnus Hagander <magnus(at)hagander(dot)net>
>> Date:   Sun Dec 19 16:45:28 2010 +0100
>>
>>    Support for collecting crash dumps on Windows
>>
>>    Add support for collecting "minidump" style crash dumps on
>>    Windows, by setting up an exception handling filter. Crash
>>    dumps will be generated in PGDATA/crashdumps if the directory
>>    is created (the existance of the directory is used as on/off
>>    switch for the generation of the dumps).
>>
>>    Craig Ringer and Magnus Hagander
>
> That crash dump is basically the windows equivalent of a coredump,
> though. Just a different name...

Do we need something else in addition to that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-05 23:59:12
Message-ID: 4E13A550.9010104@postnewspapers.com.au

On 5/07/2011 9:05 PM, Magnus Hagander wrote:
> On Tue, Jul 5, 2011 at 15:02, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Jul 4, 2011 at 12:47 PM, Radosław Smogura
>> <rsmogura(at)softperience(dot)eu> wrote:
>>> I asked about crash reports becaus of at this time there was thread about
>>> crashing in live system.
>>
>> Yeah, I thought this was the result of that effort:

[snip]

> That crash dump is basically the windows equivalent of a coredump,
> though. Just a different name...

Yup, it's a cut-down core dump. In this case generated in-process by
the crashing backend.

It'd be nice to be able to generate the crash dump from out-of-process.
Unfortunately, the automatic crash dump generation system on Windows
doesn't appear to be available to system services running
non-interactively. Not that I could see, anyway. As a result we had to
trap the crashes within the crashing process and generate the dump from
there. As previously stated, doing anything within a segfaulting process
is unreliable, so it's not the best approach in the world.

All I was saying in this thread is that it'd be nice to have a way for a
crashing backend to request that another process capture diagnostic
information from it before it exits with a fault, so it doesn't have to
try to dump itself.

As Tom said, though, anything like that is more likely to decrease the
reliability of the overall system. You don't want a dead backend hanging
around forever waiting for the postmaster to act on it, and you *really*
don't want other backends still alive and potentially writing from shm
that's in who-knows-what state while the postmaster is busy fiddling
with a crashed backend.

So, overall, I think "dump a simple core and die as quickly as possible"
is the best option. That's how it already works on UNIX, and all the
win32 crash dump patches do is make it work on Windows too.

--
Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-06 15:00:32
Message-ID: 0419f40a0c90ace74e08266dfc525f7a@mail.softperience.eu

On Wed, 06 Jul 2011 07:59:12 +0800, Craig Ringer wrote:
> On 5/07/2011 9:05 PM, Magnus Hagander wrote:
>> On Tue, Jul 5, 2011 at 15:02, Robert Haas<robertmhaas(at)gmail(dot)com>
>> wrote:
>>> On Mon, Jul 4, 2011 at 12:47 PM, Radosław Smogura
>>> <rsmogura(at)softperience(dot)eu> wrote:
>>>> I asked about crash reports becaus of at this time there was
>>>> thread about
>>>> crashing in live system.
>>>
>>> Yeah, I thought this was the result of that effort:
>
> [snip]
>
>> That crash dump is basically the windows equivalent of a coredump,
>> though. Just a different name...
>
> Yup, it's a cut-down core dump. In this case generated in-process by
> the crashing backend.
>
> It'd be nice to be able to generate the crash dump from
> out-of-process. Unfortunately, the automatic crash dump generation
> system on Windows doesn't appear to be available to system services
> running non-interactively. Not that I could see, anyway. As a result
> we had to trap the crashes within the crashing process and generate
> the dump from there. As previously stated, doing anything within a
> segfaulting process is unreliable, so it's not the best approach in
> the world.
>
> All I was saying in this thread is that it'd be nice to have a way
> for a crashing backend to request that another process capture
> diagnostic information from it before it exits with a fault, so it
> doesn't have to try to dump its self.
>
> As Tom said, though, anything like that is more likely to decrease
> the reliability of the overall system. You don't want a dead backend
> hanging around forever waiting for the postmaster to act on it, and
> you *really* don't want other backends still alive and potentially
> writing from shm that's in in who-knows-what state while the
> postmaster is busy fiddling with a crashed backend.
>
> So, overall, I think "dump a simple core and die as quickly as
> possible" is the best option. That's how it already works on UNIX,
> and
> all the win32 crash dump patches do is make it work on Windows too.
>
> --
> Craig Ringer
>
> POST Newspapers
> 276 Onslow Rd, Shenton Park
> Ph: 08 9381 3088 Fax: 08 9388 2258
> ABN: 50 008 917 717
> http://www.postnewspapers.com.au/

Personally, I would not send a core dump to anyone; a core dump may
contain sensitive information not only from the postmaster but from other
applications too.
By the way, I just took a core dump from the postmaster and found in it some
DNS addresses I had connected to earlier from bash. The postmaster should not
see those.

I think IPC to quickly shut down all backends and wait for report
processing is quite enough.

Regards,
Radosław Smogura


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-06 23:05:48
Message-ID: 4E14EA4C.90405@postnewspapers.com.au

On 6/07/2011 11:00 PM, Radosław Smogura wrote:
> I think IPC for fast shout down all backends and wait for report
> processing is quite enaugh.

How do you propose to make that reliable, though?

--
Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-07 06:00:15
Message-ID: 201107070800.16015.rsmogura@softperience.eu

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> Thursday 07 of July 2011 01:05:48
> On 6/07/2011 11:00 PM, Radosław Smogura wrote:
> > I think IPC for fast shout down all backends and wait for report
> > processing is quite enaugh.
>
> How do you propose to make that reliable, though?
>
> --
> Craig Ringer
>
> POST Newspapers
> 276 Onslow Rd, Shenton Park
> Ph: 08 9381 3088 Fax: 08 9388 2258
> ABN: 50 008 917 717
> http://www.postnewspapers.com.au/

I want to add an IPC layer to PostgreSQL. A few approaches could be considered:
1. System IPC
2. Urgent data on a socket
3. Sockets (at least descriptors) + threads
4. Not portable: fork in the segfault handler (I think the forked process
should start in the segfault handler too).

I actually think option 3, sockets (pipes on Linux) + threads, will be the best
and most portable. For each backend PostgreSQL will open two channels, urgent
and normal; for each channel a thread will be spawned, and this thread will
just wait for data. A backend will not start if it has not connected to the
postmaster. Some security measures must be taken into account when opening a
plain socket.

In the context of a crash, the segfault handler sends information on the urgent
channel, and the postmaster kills all backends except the sender, giving it
time to work in the segfault handler.

Normal channels may be used for scheduling some async operations, such as
reading the next n blocks when a sequential scan has started.

By the way, getting reports on a segfault isn't something "unusual"; your
favorite software, the Java(tm) Virtual Machine, does it.

Regards,
Radosław Smogura


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash dumps
Date: 2011-07-07 14:22:28
Message-ID: 29881.1310048548@sss.pgh.pa.us

Radosław Smogura <rsmogura(at)softperience(dot)eu> writes:
> Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> Thursday 07 of July 2011 01:05:48
>> How do you propose to make that reliable, though?

> I want to add IPC layer to postgresql, few approches may be considerable,
> 1. System IPC
> 2. Urgent data on socket
> 3. Sockets (at least descriptors) + threads
> 4. Not portable, fork in segfault (I think forked process should start in
> segfault too).

An IPC layer to be invoked during segfaults? Somehow I don't think
that's going to pass the reliability threshold. It doesn't sound
promising from a portability standpoint either, since not one of your
suggestions will work everywhere, even without the already-segfaulted
context to worry about.

regards, tom lane