Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session

Lists: pgsql-bugs pgsql-hackers
From: "Cristian" <cbittel(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5305: Postgres service stops when closing Windows session
Date: 2010-02-01 16:28:19
Message-ID: 201002011628.o11GSJ2i071756@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 5305
Logged by: Cristian
Email address: cbittel(at)gmail(dot)com
PostgreSQL version: 8.3.9
Operating system: Windows 2003 Server Standard x64
Description: Postgres service stops when closing Windows session
Details:

We connect to the Windows server using the Terminal Services client (mstsc)
and perform maintenance tasks with pgAdmin 3.

The PostgreSQL service crashes when the user closes the session on Windows,
and the following error is recorded in the pg_log files:

LOG: server process (PID 5200) exited with exit code 128

LOG: terminating any other active server processes

WARNING: terminating connection because of crash of another server process

DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.

HINT: In a moment you should be able to reconnect to the database and
repeat your command. ..

The server has the following specs:

Windows 2003 SP2 Standard 64-bit, 4GB, NOT joined to a domain.

PostgreSQL 8.3.9

pgAdmin 3

We connect without the /console parameter.

Any ideas?


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cristian <cbittel(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5305: Postgres service stops when closing Windows session
Date: 2010-02-03 20:26:35
Message-ID: 603c8f071002031226w6fce0a71je2b0e2bab9e632fc@mail.gmail.com

On Mon, Feb 1, 2010 at 11:28 AM, Cristian <cbittel(at)gmail(dot)com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5305
> Logged by:          Cristian
> Email address:      cbittel(at)gmail(dot)com
> PostgreSQL version: 8.3.9
> Operating system:   Windows 2003 Server Standard x64
> Description:        Postgres service stops when closing Windows session
> Details:
>
> We connect to Windows server using the Terminal Services Clients (mstsc),
> and performs maintenance task with pgAdmin 3.
>
> PostgreSQL service crashes when the user close session on Windows, and the
> following error is recorded in the pg_log files:
>
>
>
> LOG:  server process (PID 5200) exited with exit code 128
>
> LOG:  terminating any other active server processes
>
> WARNING:  terminating connection because of crash of another server process
>
> DETAIL:  The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
>
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command. ..
>
>
>
> The server has the following specs:
>
> Windows 2003 SP2 Standard 64-bit, 4GB, NOT joined to a domain.
>
> PostgreSQL 8.3.9
>
> pgAdmin 3
>
> We connect without the /console parameter.
>
>
> Any ideas?

So you're saying that if pgadmin is open when you close the terminal
services session, the SERVER crashes?

Did you somehow start the server in that same session, or is the
server running as a service?

...Robert


From: Cristian Bittel <cbittel(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5305: Postgres service stops when closing Windows session
Date: 2010-02-04 13:38:51
Message-ID: 652d02c21002040538x5283f091p3fbe8cd76d94ff45@mail.gmail.com

2010/2/3 Robert Haas <robertmhaas(at)gmail(dot)com>

> On Mon, Feb 1, 2010 at 11:28 AM, Cristian <cbittel(at)gmail(dot)com> wrote:
> >
> > The following bug has been logged online:
> >
> > Bug reference: 5305
> > Logged by: Cristian
> > Email address: cbittel(at)gmail(dot)com
> > PostgreSQL version: 8.3.9
> > Operating system: Windows 2003 Server Standard x64
> > Description: Postgres service stops when closing Windows session
> > Details:
> >
> > We connect to Windows server using the Terminal Services Clients (mstsc),
> > and performs maintenance task with pgAdmin 3.
> >
> > PostgreSQL service crashes when the user close session on Windows, and
> the
> > following error is recorded in the pg_log files:
> >
> >
> >
> > LOG: server process (PID 5200) exited with exit code 128
> >
> > LOG: terminating any other active server processes
> >
> > WARNING: terminating connection because of crash of another server
> process
> >
> > DETAIL: The postmaster has commanded this server process to roll back
> the
> > current transaction and exit, because another server process exited
> > abnormally and possibly corrupted shared memory.
> >
> > HINT: In a moment you should be able to reconnect to the database and
> > repeat your command. ..
> >
> >
> >
> > The server has the following specs:
> >
> > Windows 2003 SP2 Standard 64-bit, 4GB, NOT joined to a domain.
> >
> > PostgreSQL 8.3.9
> >
> > pgAdmin 3
> >
> > We connect without the /console parameter.
> >
> >
> > Any ideas?
>
> So you're saying that if pgadmin is open when you close the terminal
> services session, the SERVER crashes?
>
> Did you somehow start the server in that same session, or is the
> server running as a service?
>
> ...Robert
>

If pgAdmin is open inside any mstsc session, mine or another user's terminal
session, the main PostgreSQL service crashes.

The logs report server process exit code 128; the two final lines are repeated
for each active connection to postgres from the Apache server. Below that
(originally in Spanish) is the output of the Security Event Viewer, where the
Administrator user logs off and then the "postgres" user tries to log in to
Windows again:

2009-10-13 22:10:47 PYT LOG: loaded library "$libdir/plugins/plugin_debugger.dll"
2009-10-13 22:30:08 PYT LOG: loaded library "$libdir/plugins/plugin_debugger.dll"
2009-10-13 22:40:30 PYT LOG: loaded library "$libdir/plugins/plugin_debugger.dll"
2009-10-13 22:50:09 PYT LOG: loaded library "$libdir/plugins/plugin_debugger.dll"
2009-10-13 22:57:41 PYT LOG: server process (PID 50516) exited with exit code 128
2009-10-13 22:57:41 PYT LOG: terminating any other active server processes
2009-10-13 22:57:41 PYT WARNING: terminating connection because of crash of another server process
2009-10-13 22:57:41 PYT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2009-10-13 22:57:41 PYT HINT: In a moment you should be able to reconnect to the database and repeat your command.

A summary of the events:
1) Application Popup: postgres.exe Application Error. Application could not
initialize.
2) Service Control Manager: PostgreSQL Database Server 8.3 stopped.
3) Security: Session login for the "postgres" user account by
MICROSOFT_AUTHENTICATION_PACKAGE_V1_0.
4, 5) Security: Details of the session login for the postgres user account.

Event Type: Information
Event Source: Application Popup
Event Category: None
Event ID: 26
Date: 13/10/2009
Time: 22:57:40
User: N/A
Computer: SVCTAG-DL6W3J1
Description:
Application popup: postgres.exe - Application Error : The application failed
to initialize properly (0xc0000142). Click OK to terminate the application.

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

Event Type: Information
Event Source: Service Control Manager
Event Category: None
Event ID: 7036
Date: 13/10/2009
Time: 22:57:42
User: N/A
Computer: SVCTAG-DL6W3J1
Description:
The PostgreSQL Database Server 8.3 service entered the stopped state.

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

Event Type: Success Audit
Event Source: Security
Event Category: Account Logon
Event ID: 680
Date: 13/10/2009
Time: 23:00:11
User: SVCTAG-DL6W3J1\postgres
Computer: SVCTAG-DL6W3J1
Description:
Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
Logon account: postgres
Source Workstation: SVCTAG-DL6W3J1
Error Code: 0x0

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

Event Type: Success Audit
Event Source: Security
Event Category: Logon/Logoff
Event ID: 552
Date: 13/10/2009
Time: 23:00:11
User: NT AUTHORITY\SYSTEM
Computer: SVCTAG-DL6W3J1
Description:
Logon attempt using explicit credentials:
Logged on user:
User Name: SVCTAG-DL6W3J1$
Domain: WORKGROUP
Logon ID: (0x0,0x3E7)
Logon GUID: -
User whose credentials were used:
Target User Name: postgres
Target Domain: SVCTAG-DL6W3J1
Target Logon GUID: -

Target Server Name: localhost
Target Server Info: localhost
Caller Process ID: 428
Source Network Address: -
Source Port: -

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

Event Type: Success Audit
Event Source: Security
Event Category: Logon/Logoff
Event ID: 528
Date: 13/10/2009
Time: 23:00:11
User: SVCTAG-DL6W3J1\postgres
Computer: SVCTAG-DL6W3J1
Description:
Successful Logon:
User Name: postgres
Domain: SVCTAG-DL6W3J1
Logon ID: (0x0,0x277734D8)
Logon Type: 5
Logon Process: Advapi
Authentication Package: Negotiate
Workstation Name: SVCTAG-DL6W3J1
Logon GUID: -
Caller User Name: SVCTAG-DL6W3J1$
Caller Domain: WORKGROUP
Caller Logon ID: (0x0,0x3E7)
Caller Process ID: 428
Transited Services: -
Source Network Address: -
Source Port: -

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

Event Type: Success Audit
Event Source: Security
Event Category: Logon/Logoff
Event ID: 576
Date: 13/10/2009
Time: 23:00:11
User: SVCTAG-DL6W3J1\postgres
Computer: SVCTAG-DL6W3J1
Description:
Special privileges assigned to new logon:
User Name:
Domain:
Logon ID: (0x0,0x277734D8)
Privileges: SeImpersonatePrivilege

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cristian Bittel <cbittel(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5305: Postgres service stops when closing Windows session
Date: 2010-02-06 22:36:27
Message-ID: 603c8f071002061436x5d74b9dj21d8ab34ae84f731@mail.gmail.com

On Thu, Feb 4, 2010 at 8:38 AM, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
> 2010/2/3 Robert Haas <robertmhaas(at)gmail(dot)com>
>> So you're saying that if pgadmin is open when you close the terminal
>> services session, the SERVER crashes?
>>
>> Did you somehow start the server in that same session, or is the
>> server running as a service?
>>
>> ...Robert
>
> If pgAdmin is open inside any mstsc session, mine or another terminal
> session of another user, the main PostgreSQL service crash.
>
> Logs report server process exit code 128, two final lines are repeated for
> each active connection to postgres from Apache server, and below (in
> spanish) the Security Event Viwer where Administrator user logoff and then
> "postgres" user tryed to login again to Windows:

That's really odd. Nothing pgAdmin does should be able to crash the
PostgreSQL server, I would think. Have you got any custom code loaded
into PostgreSQL? Or non-custom, but buggy?

I'm guessing the problem only occurs if PGadmin is actually connected
to the PostgreSQL server, but perhaps you could verify that. If so, I
would see if you can get a stack backtrace of the backend to which
PGadmin is connected.

...Robert


From: Chris Travers <chris(at)metatrontech(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5305: Postgres service stops when closing Windows session
Date: 2010-02-07 01:09:24
Message-ID: 5ed37b141002061709s1a2451e8lf56588328918d388@mail.gmail.com

On Sat, Feb 6, 2010 at 2:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> That's really odd.  Nothing pgAdmin does should be able to crash the
> PostgreSQL server, I would think.  Have you got any custom code loaded
> into PostgreSQL?  Or non-custom, but buggy?
>
> I'm guessing the problem only occurs if PGadmin is actually connected
> to the PostgreSQL server, but perhaps you could verify that.  If so, I
> would see if you can get a stack backtrace of the backend to which
> PGadmin is connected.

It wouldn't surprise me if this were a Windows bug (Terminal Services
may have improved since I was supporting it but it used to be quite
common that it would cause weird behavior in applications).... I
personally think the stack trace is likely to be the best way to test
where the problem is.

Best Wishes,
Chris Travers


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Chris Travers <chris(at)metatrontech(dot)com>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-20 01:43:02
Message-ID: AANLkTim-rdu_a=PQf3+4OAJzumjm=B_YWeMOPetZ3d2z@mail.gmail.com

On Sat, Feb 6, 2010 at 9:09 PM, Chris Travers <chris(at)metatrontech(dot)com> wrote:
> On Sat, Feb 6, 2010 at 2:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> That's really odd.  Nothing pgAdmin does should be able to crash the
>> PostgreSQL server, I would think.  Have you got any custom code loaded
>> into PostgreSQL?  Or non-custom, but buggy?
>>
>> I'm guessing the problem only occurs if PGadmin is actually connected
>> to the PostgreSQL server, but perhaps you could verify that.  If so, I
>> would see if you can get a stack backtrace of the backend to which
>> PGadmin is connected.
>
> It wouldn't surprise me if this were a Windows bug (Terminal Services
> may have improved since I was supporting it but it used to be quite
> common that it would cause weird behavior in applications)....  I
> personally think the stack trace is likely to be the best way to test
> where the problem is.

I suspect this is the same problem as bug #4897, and probably also the
same problem as this:
http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php

and maybe also this and this:
http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php

Unfortunately, it seems that no one has been able to get a stack trace yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: "Soporte (at) Teksol SA" <soporte(at)teksol(dot)com(dot)ar>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Chris Travers <chris(at)metatrontech(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-20 11:55:22
Message-ID: AANLkTi=WMgKxW59oRi7EaBmFgsbwxXzXh-Jw92qX-D_R@mail.gmail.com

From my side, I have no way to get a stack trace from the production
servers where I found the issue. I have several other servers with
almost the same configuration for development purposes, and there are
no crashes there. I don't have any code loaded into the database; there
are no compiled functions, just SQL queries from PHP code using
persistent connections via pg_pconnect().

All of these bugs seem to be the same issue. It took some time to relate
the crashes with exit code 128 to terminal sessions ending; sometimes
more than one session is open.

Is this a worldwide issue, or is it something that affects only
non-US-English versions of Windows 2003 Standard x64 servers? Mine are
in Spanish, another report is in French, and another came from a British
system. It also seems to be independent of the Postgres version: I use
8.3.9, and there is another report with 8.4.1. There is a new version of
pgAdmin; maybe I should replace the original one shipped with Postgres.

We all appreciate your great effort,

Cristian.

2010/8/19, Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Sat, Feb 6, 2010 at 9:09 PM, Chris Travers <chris(at)metatrontech(dot)com>
> wrote:
>> On Sat, Feb 6, 2010 at 2:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> That's really odd. Nothing pgAdmin does should be able to crash the
>>> PostgreSQL server, I would think. Have you got any custom code loaded
>>> into PostgreSQL? Or non-custom, but buggy?
>>>
>>> I'm guessing the problem only occurs if PGadmin is actually connected
>>> to the PostgreSQL server, but perhaps you could verify that. If so, I
>>> would see if you can get a stack backtrace of the backend to which
>>> PGadmin is connected.
>>
>> It wouldn't surprise me if this were a Windows bug (Terminal Services
>> may have improved since I was supporting it but it used to be quite
>> common that it would cause weird behavior in applications).... I
>> personally think the stack trace is likely to be the best way to test
>> where the problem is.
>
> I suspect this is the same problem as bug #4897, and probably also the
> same problem as this:
> http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php
>
> and maybe also this and this:
> http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
> http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php
>
> Unfortunately, it seems that no one has been able to get a stack trace yet.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Chris Travers <chris(at)metatrontech(dot)com>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-23 14:03:54
Message-ID: AANLkTikWfGVc=m_OEfiz5O9eyx2d78i6FRE5U0SAwNO4@mail.gmail.com

[moving to -hackers]

On Thu, Aug 19, 2010 at 9:43 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I suspect this is the same problem as bug #4897, and probably also the
> same problem as this:
> http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php
>
> and maybe also this and this:
> http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
> http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php
>
> Unfortunately, it seems that no one has been able to get a stack trace yet.

Bruce pointed out yet another report of this problem to me:

http://archives.postgresql.org/pgsql-general/2010-08/msg00550.php

After some discussion with Magnus, I think what is going on here is
that the postmaster kicks off a new child process, which terminates
before it actually starts running our code, either in OS-supplied code
or some sort of "filter" like anti-spam or anti-virus software. It's
presumably NOT dying in our code because - at least AFAICS - we don't
exit(128) anywhere. One way we could possibly improve the situation
is to not treat this as a child crash - that is, don't do a
crash-and-restart cycle; just treat that backend as having done
elog(FATAL). The trick is that you need a reliable way to distinguish
between a regular child crash and an "early" child crash. Magnus
suggested perhaps we could create a mutex that the child grabs before
mapping shared memory; the postmaster could check whether the mutex
had been taken. If so, we handle the crash normally; if not, we just
chalk it up to experience and continue on.
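
As an illustration, the mutex idea above can be sketched roughly as
follows. This is a toy Python model, not PostgreSQL code: ChildRecord,
startup_mutex_taken, child_reaches_shared_memory and handle_child_exit
are hypothetical names, and a plain boolean stands in for the real mutex.

```python
from dataclasses import dataclass


@dataclass
class ChildRecord:
    """Postmaster-side bookkeeping for one child (hypothetical structure)."""
    pid: int
    startup_mutex_taken: bool = False  # stands in for "child grabbed the mutex"


def child_reaches_shared_memory(child: ChildRecord) -> None:
    # The child takes the mutex first and only then maps shared memory,
    # so an untaken mutex implies shared memory was never touched.
    child.startup_mutex_taken = True


def handle_child_exit(child: ChildRecord, exit_code: int) -> str:
    """Decide how the postmaster reacts to a child that has exited."""
    if exit_code == 0:
        return "normal exit"
    if not child.startup_mutex_taken:
        # The child died before it could have touched shared memory.
        return "early exit: log it and carry on"
    # The child may have left shared memory in an inconsistent state.
    return "crash: terminate other backends and reinitialize"
```

In this model the exit code only separates a clean exit from an abnormal
one; whether an abnormal exit forces a restart depends solely on the
mutex state.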

This isn't really a "fix" for the bug in the sense that the nicest
thing of all would be to prevent the child from exiting abnormally in
the first place. But it's far from clear that we can control that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-23 15:09:09
Message-ID: 21229.1282576149@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> After some discussion with Magnus, I think what is going on here is
> that the postmaster kicks off a new child process, which terminates
> before it actually starts running our code, either in OS-supplied code
> or some sort of "filter" like anti-spam or anti-virus software. It's
> presumably NOT dying in our code because - at least AFAICS - we don't
> exit(128) anywhere.

IIRC, in POSIX-compliant shells there's a specific convention about what
exit(128) means, and it's something that could result from exec()
failure. It might be too much of a stretch to suppose that Windows is
following that, but if it is, that would square with your idea that this
is happening during child process startup.
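
For reference, the POSIX shell convention can be written down as a small
lookup. POSIX reserves 126 for "found but not executable" and 127 for
"not found", and says that death by signal is reported as a status
greater than 128; decoding that as 128 plus the signal number is common
shell practice and is an assumption here. The function name is
hypothetical, and none of this says anything about what Windows does.

```python
def describe_shell_exit_status(status: int) -> str:
    """Interpret an exit status the way a POSIX shell reports it."""
    if status == 0:
        return "success"
    if status == 126:
        return "command found but not executable"
    if status == 127:
        return "command not found"
    if status > 128:
        # Assumption: the shell encodes signals as 128 + signal number.
        return f"terminated by signal {status - 128}"
    # Anything else, including exactly 128, is left to the utility.
    return "utility-defined error"
```

Under this reading, a status of exactly 128 carries no reserved meaning
in the shell convention itself.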

> One way we could possibly improve the situation
> is to not treat this as a child crash - that is, don't do a
> crash-and-restart cycle; just treat that backend as having done
> elog(FATAL).

That seems to me like a great idea for decreasing reliability, not
increasing it. If you mistakenly classify a child death as "not
a crash" then you're really seriously hosed; the best outcome you
can hope for is that the database freezes up without doing any
major damage to itself.

Furthermore, even if it is an early exit and you can afford to ignore
it, the client side is still going to see a dropped connection and tell
the user that the server crashed, and we're still going to get bug
reports about that.

I would be inclined to write this off as Windows randomness that's
unfixable on our end. We could recommend that people take a closer
look at what AV software they have installed and maybe try some other
one.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-23 15:14:00
Message-ID: AANLkTikD-VyRmAGV_2-nG-fpsU9qeLp7+u+Ls6uvhpEC@mail.gmail.com

On Mon, Aug 23, 2010 at 17:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> After some discussion with Magnus, I think what is going on here is
>> that the postmaster kicks off a new child process, which terminates
>> before it actually starts running our code, either in OS-supplied code
>> or some sort of "filter" like anti-spam or anti-virus software.  It's
>> presumably NOT dying in our code because - at least AFAICS - we don't
>> exit(128) anywhere.
>
> IIRC, in POSIX-compliant shells there's a specific convention about what
> exit(128) means, and it's something that could result from exec()
> failure.  It might be too much of a stretch to suppose that Windows is
> following that, but if it is, that would square with your idea that this
> is happening during child process startup.

It is (assuming the idea is correct).

The problem is that the error code is not delivered at CreateProcess()
time - it's delivered later.
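
A small, portable way to see the same pattern, sketched here with
Python's subprocess module standing in for CreateProcess(): creating the
process reports success, and the failing exit code only becomes visible
when the parent later waits for the child. The helper name
spawn_and_wait is hypothetical, and the child is simply a Python
interpreter told to exit with the given code.

```python
import subprocess
import sys


def spawn_and_wait(exit_code: int) -> tuple:
    """Start a child that exits with exit_code; return (started_ok, observed_code)."""
    child = subprocess.Popen(
        [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    )
    started_ok = child.pid > 0    # process creation itself reported no error
    observed_code = child.wait()  # the failure only shows up here, later
    return started_ok, observed_code
```

Calling spawn_and_wait(128) yields a successful start paired with exit
code 128, so nothing at creation time tells the parent the child is
doomed.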

>> One way we could possibly improve the situation
>> is to not treat this as a child crash - that is, don't do a
>> crash-and-restart cycle; just treat that backend as having done
>> elog(FATAL).
>
> That seems to me like a great idea for decreasing reliability, not
> increasing it.  If you mistakenly classify a child death as "not
> a crash" then you're really seriously hosed; the best outcome you
> can hope for is that the database freezes up without doing any
> major damage to itself.
>
> Furthermore, even if it is an early exit and you can afford to ignore
> it, the client side is still going to see a dropped connection and tell
> the user that the server crashed, and we're still going to get bug
> reports about that.

Yes, but it's Less Evil.

> I would be inclined to write this off as Windows randomness that's
> unfixable on our end.  We could recommend that people take a closer
> look at what AV software they have installed and maybe try some other
> one.

It may well be, but we can at least attempt to mitigate it, no?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-23 15:37:04
Message-ID: 21786.1282577824@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Mon, Aug 23, 2010 at 17:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I would be inclined to write this off as Windows randomness that's
>> unfixable on our end. We could recommend that people take a closer
>> look at what AV software they have installed and maybe try some other
>> one.

> It may well be, but we can at least attempt to mitigate it, no?

I'm not excited about a "mitigation" approach that introduces new
data-loss hazards of its very own. That doesn't meet the Less Evil
standard in my eyes.

[ thinks for a bit... ] Although maybe it'd be all right to piggyback
on the dead-man-switch code that already exists in pmsignal.c. If the
child process hasn't got as far as doing MarkPostmasterChildActive,
then in principle it should be okay to assume it hasn't touched shared
memory. This really is independent of what exit code it returned.
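
A rough model of that idea, assuming a per-child slot that the
postmaster reserves before starting the child and that the child flips
to "active" once it is running. Only MarkPostmasterChildActive and
pmsignal.c come from the message above; SlotState, ChildSlots and
needs_crash_recovery are hypothetical names, and this Python sketch is
not the actual pmsignal.c logic.

```python
from enum import Enum


class SlotState(Enum):
    UNUSED = 0
    ASSIGNED = 1  # postmaster reserved the slot before starting the child
    ACTIVE = 2    # child announced itself (cf. MarkPostmasterChildActive)


class ChildSlots:
    def __init__(self, size: int) -> None:
        self.flags = [SlotState.UNUSED] * size

    def assign(self) -> int:
        """Postmaster side: reserve a free slot for a child about to start."""
        slot = self.flags.index(SlotState.UNUSED)
        self.flags[slot] = SlotState.ASSIGNED
        return slot

    def mark_active(self, slot: int) -> None:
        """Child side: called once the child starts using shared memory."""
        self.flags[slot] = SlotState.ACTIVE

    def release(self, slot: int) -> bool:
        """Postmaster side: free the slot; True if the child never became active."""
        never_active = self.flags[slot] is SlotState.ASSIGNED
        self.flags[slot] = SlotState.UNUSED
        return never_active


def needs_crash_recovery(slots: ChildSlots, slot: int, exit_code: int) -> bool:
    never_active = slots.release(slot)
    # The "never active" test does not depend on which nonzero code was returned.
    return exit_code != 0 and not never_active
```

A child that dies while its slot is still merely assigned is treated as
harmless here, whatever nonzero code it returned.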

regards, tom lane


From: Cristian Bittel <cbittel(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-23 18:07:27
Message-ID: AANLkTimUiDEVrNg0AdiJ=x2R9Rrob+ju4+K7T5VGOGuB@mail.gmail.com

From the user's point of view, this could be a Windows or AV issue, but it
only stops the Postgres service; it does not affect or interfere with Windows
stability or AV stability. Instead, it affects your product. So if you can
improve the stability of the service (and, above all, data integrity), it
would be a benefit for everyone.

I found the same behavior in the Postgres service when closing an MSTSC
session without any AV installed. After some months of Postgres crashes, the
administrators installed Kaspersky for Servers AV, and the crashes are still
there.

Cristian.

2010/8/23 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > On Mon, Aug 23, 2010 at 17:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I would be inclined to write this off as Windows randomness that's
> >> unfixable on our end. We could recommend that people take a closer
> >> look at what AV software they have installed and maybe try some other
> >> one.
>
> > It may well be, but we can at least attempt to mitigate it, no?
>
> I'm not excited about a "mitigation" approach that introduces new
> data-loss hazards of its very own. That doesn't meet the Less Evil
> standard in my eyes.
>
> [ thinks for a bit... ] Although maybe it'd be all right to piggyback
> on the dead-man-switch code that already exists in pmsignal.c. If the
> child process hasn't got as far as doing MarkPostmasterChildActive,
> then in principle it should be okay to assume it hasn't touched shared
> memory. This really is independent of what exit code it returned.
>
> regards, tom lane
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-23 18:32:47
Message-ID: AANLkTinnCZYS5Cz_Pn2ht6_ovMFVr-69G_fJWw2WP7VP@mail.gmail.com

On Mon, Aug 23, 2010 at 11:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Mon, Aug 23, 2010 at 17:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I would be inclined to write this off as Windows randomness that's
>>> unfixable on our end.  We could recommend that people take a closer
>>> look at what AV software they have installed and maybe try some other
>>> one.
>
>> It may well be, but we can at least attempt to mitigate it, no?
>
> I'm not excited about a "mitigation" approach that introduces new
> data-loss hazards of its very own.  That doesn't meet the Less Evil
> standard in my eyes.
>
> [ thinks for a bit... ]  Although maybe it'd be all right to piggyback
> on the dead-man-switch code that already exists in pmsignal.c.  If the
> child process hasn't got as far as doing MarkPostmasterChildActive,
> then in principle it should be okay to assume it hasn't touched shared
> memory.  This really is independent of what exit code it returned.

I'm confused. That seems like it would be LESS safe than the proposed
approach of taking a mutex just before mapping shared memory. There
is some finite amount of code that executes after shared memory is
mapped and before MarkPostmasterChildActive executes; the advantage of
the mutex is that it can be taken BEFORE shared memory is mapped. On
the other hand, if you think it's safe enough, it would certainly be
nice to use an existing mechanism rather than inventing something
totally new.
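
The difference between the two checks can be made concrete with a toy
timeline of child startup. The stage names and helper functions below
are hypothetical, and the ordering (mutex, then shared-memory mapping,
then marking active) is taken from the discussion in this thread rather
than from the PostgreSQL source.

```python
# Child startup stages in order; names are illustrative only.
STAGES = ["process_created", "mutex_taken", "shared_memory_mapped", "marked_active"]


def stages_completed(died_after: str) -> set:
    """All stages the child finished before dying, inclusive of died_after."""
    return set(STAGES[: STAGES.index(died_after) + 1])


def mutex_says_harmless(died_after: str) -> bool:
    return "mutex_taken" not in stages_completed(died_after)


def dead_man_switch_says_harmless(died_after: str) -> bool:
    return "marked_active" not in stages_completed(died_after)


def disagreement_window() -> list:
    """Stages where the two checks give different answers."""
    return [
        s for s in STAGES
        if mutex_says_harmless(s) != dead_man_switch_says_harmless(s)
    ]


def risky_stages() -> list:
    """Stages where shared memory is mapped yet the dead-man switch says harmless."""
    return [
        s for s in STAGES
        if "shared_memory_mapped" in stages_completed(s)
        and dead_man_switch_says_harmless(s)
    ]
```

In this model the only risky stage is the one between mapping shared
memory and marking the child active, which is the finite amount of code
mentioned above.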

I agree that the exit code is irrelevant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 12:57:34
Message-ID: 201008241257.o7OCvYt12456@momjian.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas wrote:
> [moving to -hackers]
>
> On Thu, Aug 19, 2010 at 9:43 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I suspect this is the same problem as bug #4897, and probably also the
> > same problem as this:
> > http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php
> >
> > and maybe also this and this:
> > http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
> > http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php
> >
> > Unfortunately, it seems that no one has been able to get a stack trace yet.
>
> Bruce pointed out yet another report of this problem to me:
>
> http://archives.postgresql.org/pgsql-general/2010-08/msg00550.php
>
> After some discussion with Magnus, I think what is going on here is
> that the postmaster kicks off a new child process, which terminates
> before it actually starts running our code, either in OS-supplied code
> or some sort of "filter" like anti-spam or anti-virus software. It's
> presumably NOT dying in our code because - at least AFAICS - we don't
> exit(128) anywhere. One way we could possibly improve the situation
> is to not treat this as a child crash - that is, don't do a
> crash-and-restart cycle; just treat that backend as having done
> elog(FATAL). The trick is that you need a reliable way to distinguish
> between a regular child crash and an "early" child crash. Magnus
> suggested perhaps we could create a mutex that the child grabs before
> mapping shared memory; the postmaster could check whether the mutex
> had been taken. If so, we handle the crash normally; if not, we just
> chalk it up to experience and continue on.
>
> This isn't really a "fix" for the bug in the sense that the nicest
> thing of all would be to prevent the child from exiting abnormally in
> the first place. But it's far from clear that we can control that.

This URL has some interesting details on our problem:

http://stackoverflow.com/questions/139090/getexitcodeprocess-returns-128

Error code 128 is identified as:

error code 128 ERROR_WAIT_NO_CHILDREN 128 0x80 There are no child
processes to wait for

and the suggested cause is:

Have a look at Desktop Heap memory.

Essentially the desktop heap issue comes down to exhausted resources (eg
starting too many processes). When your app runs out of these resources,
one of the symptoms is that you won't be able to start a new process,
and the call to CreateProcess will fail with code 128.

My guess is that at the time of CreateProcess(), there is enough desktop
heap memory, but at some later time, perhaps caused by a logout, there
isn't and the process never gets started.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 13:38:43
Message-ID: AANLkTikaaAR_g45N8xA5OHN9PjZUa+7AziQJotfvyqvJ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 8:57 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> [moving to -hackers]
>>
>> On Thu, Aug 19, 2010 at 9:43 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> > I suspect this is the same problem as bug #4897, and probably also the
>> > same problem as this:
>> > http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php
>> >
>> > and maybe also this and this:
>> > http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
>> > http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php
>> >
>> > Unfortunately, it seems that no one has been able to get a stack trace yet.
>>
>> Bruce pointed out yet another report of this problem to me:
>>
>> http://archives.postgresql.org/pgsql-general/2010-08/msg00550.php
>>
>> After some discussion with Magnus, I think what is going on here is
>> that the postmaster kicks off a new child process, which terminates
>> before it actually starts running our code, either in OS-supplied code
>> or some sort of "filter" like anti-spam or anti-virus software.  It's
>> presumably NOT dying in our code because - at least AFAICS - we don't
>> exit(128) anywhere.  One way we could possibly improve the situation
>> is to not treat this as a child crash - that is, don't do a
>> crash-and-restart cycle; just treat that backend as having done
>> elog(FATAL).  The trick is that you need a reliable way to distinguish
>> between a regular child crash and an "early" child crash.  Magnus
>> suggested perhaps we could create a mutex that the child grabs before
>> mapping shared memory; the postmaster could check whether the mutex
>> had been taken.  If so, we handle the crash normally; if not, we just
>> chalk it up to experience and continue on.
>>
>> This isn't really a "fix" for the bug in the sense that the nicest
>> thing of all would be to prevent the child from exiting abnormally in
>> the first place.  But it's far from clear that we can control that.
>
> This URL has some interesting details on our problem:
>
>        http://stackoverflow.com/questions/139090/getexitcodeprocess-returns-128
>
> Error code 128 is identified as:
>
>        error code 128 ERROR_WAIT_NO_CHILDREN 128 0x80 There are no child
>        processes to wait for
>
> and the suggested cause is:
>
>        Have a look at Desktop Heap memory.
>
>        Essentially the desktop heap issue comes down to exhausted resources (eg
>        starting too many processes). When your app runs out of these resources,
>        one of the symptoms is that you won't be able to start a new process,
>        and the call to CreateProcess will fail with code 128.
>
> My guess is that at the time of CreateProcess(), there is enough desktop
> heap memory, but at some later time, perhaps caused by a logout, there
> isn't and the process never gets started.

Yeah, that seems very plausible, although exactly how to verify I don't know.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 13:43:00
Message-ID: 201008241343.o7ODh0N21461@momjian.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas wrote:
> >> This isn't really a "fix" for the bug in the sense that the nicest
> >> thing of all would be to prevent the child from exiting abnormally in
> >> the first place.  But it's far from clear that we can control that.
> >
> > This URL has some interesting details on our problem:
> >
> >        http://stackoverflow.com/questions/139090/getexitcodeprocess-returns-128
> >
> > Error code 128 is identified as:
> >
> >        error code 128 ERROR_WAIT_NO_CHILDREN 128 0x80 There are no child
> >        processes to wait for
> >
> > and the suggested cause is:
> >
> >        Have a look at Desktop Heap memory.
> >
> >        Essentially the desktop heap issue comes down to exhausted resources (eg
> >        starting too many processes). When your app runs out of these resources,
> >        one of the symptoms is that you won't be able to start a new process,
> >        and the call to CreateProcess will fail with code 128.
> >
> > My guess is that at the time of CreateProcess(), there is enough desktop
> > heap memory, but at some later time, perhaps caused by a logout, there
> > isn't and the process never gets started.
>
> Yeah, that seems very plausible, although exactly how to verify I don't know.

And here is confirmation from the Microsoft web site:

http://support.microsoft.com/kb/156484

Cmd.exe, Perl.exe, or other console-mode applications may fail to
initialize properly and terminate prematurely when launched by a service
using the CreateProcess() or CreateProcessAsUser() APIs. The calling
process has no way of knowing that the launched console-mode application
has terminated prematurely.

In some instances, calling GetExitCode() against the failed process
indicates the following exit code:
128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
...
Internet Information Server (IIS) may exhibit this problem
intermittently when processing CGI or Perl scripts. In this case the
browser returns the following error when executing CGI scripts:

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 13:58:52
Message-ID: 28002.1282658332@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Robert Haas wrote:
>> Yeah, that seems very plausible, although exactly how to verify I don't know.

> And here is confirmation from the Microsoft web site:

> In some instances, calling GetExitCode() against the failed process
> indicates the following exit code:
> 128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.

Given the existence of the deadman switch mechanism (which I hadn't
remembered when this thread started), I'm coming around to the idea that
we could just treat exit(128) as nonfatal on Windows. If for some
reason the child hadn't died instantly at startup, the deadman switch
would distinguish that from the case described here.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 14:02:41
Message-ID: 201008241402.o7OE2ft26888@momjian.us
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Robert Haas wrote:
> >> Yeah, that seems very plausible, although exactly how to verify I don't know.
>
> > And here is confirmation from the Microsoft web site:
>
> > In some instances, calling GetExitCode() against the failed process
> > indicates the following exit code:
> > 128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>
> Given the existence of the deadman switch mechanism (which I hadn't
> remembered when this thread started), I'm coming around to the idea that
> we could just treat exit(128) as nonfatal on Windows. If for some
> reason the child hadn't died instantly at startup, the deadman switch
> would distinguish that from the case described here.

Agreed. My guess is that there is some kind of Win32 OS race condition
in allocating desktop heap memory, and that sometimes with concurrent
CreateProcess() calls, a process gets started but can't complete its
creation.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 14:03:48
Message-ID: 201008241403.o7OE3mL27074@momjian.us
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Robert Haas wrote:
> >> Yeah, that seems very plausible, although exactly how to verify I don't know.
>
> > And here is confirmation from the Microsoft web site:
>
> > In some instances, calling GetExitCode() against the failed process
> > indicates the following exit code:
> > 128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>
> Given the existence of the deadman switch mechanism (which I hadn't
> remembered when this thread started), I'm coming around to the idea that
> we could just treat exit(128) as nonfatal on Windows. If for some
> reason the child hadn't died instantly at startup, the deadman switch
> would distinguish that from the case described here.

Here is a more detailed explanation of the failure and its relation to
desktop heap:

http://kbalertz.com/Feedback.aspx?kbNumber=184802

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 19:10:25
Message-ID: AANLkTik9JaGK1AysdqTkx=iNiVcaEBy4-CBS8P4BYXza@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 15:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Robert Haas wrote:
>>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>
>> And here is confirmation from the Microsoft web site:
>
>>       In some instances, calling GetExitCode() against the failed process
>>       indicates the following exit code:
>>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>
> Given the existence of the deadman switch mechanism (which I hadn't
> remembered when this thread started), I'm coming around to the idea that
> we could just treat exit(128) as nonfatal on Windows.  If for some
> reason the child hadn't died instantly at startup, the deadman switch
> would distinguish that from the case described here.

Just because I had written it before you posted that, here's how the
win32-specific-set-a-flag-when-we're-in-control thing would look. But
if we're convinced that just ignoring error 128 is safe, then that's
obviously a simpler patch..

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
win32_early_death.patch application/octet-stream 8.8 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 19:14:14
Message-ID: 201008241914.o7OJEEf18070@momjian.us
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander wrote:
> On Tue, Aug 24, 2010 at 15:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> >> Robert Haas wrote:
> >>> Yeah, that seems very plausible, although exactly how to verify I don't know.
> >
> >> And here is confirmation from the Microsoft web site:
> >
> >>       In some instances, calling GetExitCode() against the failed process
> >>       indicates the following exit code:
> >>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
> >
> > Given the existence of the deadman switch mechanism (which I hadn't
> > remembered when this thread started), I'm coming around to the idea that
> > we could just treat exit(128) as nonfatal on Windows.  If for some
> > reason the child hadn't died instantly at startup, the deadman switch
> > would distinguish that from the case described here.
>
> Just because I had written it before you posted that, here's how the
> win32-specific-set-a-flag-when-we're-in-control thing would look. But
> if we're convinced that just ignoring error 128 is safe, then that's
> obviously a simpler patch..

Can we please link to one of those URLs I mentioned so we have
definitive information on what is happening? I think the Microsoft URL is
best:

http://support.microsoft.com/kb/156484

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 19:16:29
Message-ID: AANLkTikuW1=CLO8Jch2ERKPMkScJAZbASHkHAgxhHbU5@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 21:14, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Magnus Hagander wrote:
>> On Tue, Aug 24, 2010 at 15:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> >> Robert Haas wrote:
>> >>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>> >
>> >> And here is confirmation from the Microsoft web site:
>> >
>> >>       In some instances, calling GetExitCode() against the failed process
>> >>       indicates the following exit code:
>> >>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>> >
>> > Given the existence of the deadman switch mechanism (which I hadn't
>> > remembered when this thread started), I'm coming around to the idea that
>> > we could just treat exit(128) as nonfatal on Windows.  If for some
>> > reason the child hadn't died instantly at startup, the deadman switch
>> > would distinguish that from the case described here.
>>
>> Just because I had written it before you posted that, here's how the
>> win32-specific-set-a-flag-when-we're-in-control thing would look. But
>> if we're convinced that just ignoring error 128 is safe, then that's
>> obviously a simpler patch..
>
> Can we please link to one of those URLs I mentioned so we have
> definitive information on what is happening?  I think the Microsoft URL is
> best:
>
>        http://support.microsoft.com/kb/156484

That URL is specifically labeled to only be valid for NT4.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 19:39:13
Message-ID: AANLkTikROhwVsyG9VBD_8-zx2PBoi4jLK7GkJR3FXBU5@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 3:10 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Aug 24, 2010 at 15:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>> Robert Haas wrote:
>>>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>>
>>> And here is confirmation from the Microsoft web site:
>>
>>>       In some instances, calling GetExitCode() against the failed process
>>>       indicates the following exit code:
>>>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>>
>> Given the existence of the deadman switch mechanism (which I hadn't
>> remembered when this thread started), I'm coming around to the idea that
>> we could just treat exit(128) as nonfatal on Windows.  If for some
>> reason the child hadn't died instantly at startup, the deadman switch
>> would distinguish that from the case described here.
>
> Just because I had written it before you posted that, here's how the
> win32-specific-set-a-flag-when-we're-in-control thing would look. But
> if we're convinced that just ignoring error 128 is safe, then that's
> obviously a simpler patch..

So, if we do this, what will happen to the client connection that was
due to be handled by the backend being spawned? Is this going to lead
to extra fds accumulating or any such thing?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 19:40:59
Message-ID: AANLkTimTZiJjS24Vmz9YtEbxVLCQu8wsKQx2HMJG_DXC@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 21:39, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 24, 2010 at 3:10 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Tue, Aug 24, 2010 at 15:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>>> Robert Haas wrote:
>>>>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>>>
>>>> And here is confirmation from the Microsoft web site:
>>>
>>>>       In some instances, calling GetExitCode() against the failed process
>>>>       indicates the following exit code:
>>>>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>>>
>>> Given the existence of the deadman switch mechanism (which I hadn't
>>> remembered when this thread started), I'm coming around to the idea that
>>> we could just treat exit(128) as nonfatal on Windows.  If for some
>>> reason the child hadn't died instantly at startup, the deadman switch
>>> would distinguish that from the case described here.
>>
>> Just because I had written it before you posted that, here's how the
>> win32-specific-set-a-flag-when-we're-in-control thing would look. But
>> if we're convinced that just ignoring error 128 is safe, then that's
>> obviously a simpler patch..
>
> So, if we do this, what will happen to the client connection that was
> due to be handled by the backend being spawned?  Is this going to lead
> to extra fds accumulating or any such thing?

I don't see why. The process goes away, and with it goes all the
handles. And the postmaster still closes all sockets and handles the
same way it did before.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 20:53:35
Message-ID: AANLkTinrLrFfdcCi_0EPTjyXuZG-DPiK4z6D3sEfcvOw@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Robert Haas wrote:
>>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>
>> And here is confirmation from the Microsoft web site:
>
>>       In some instances, calling GetExitCode() against the failed process
>>       indicates the following exit code:
>>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>
> Given the existence of the deadman switch mechanism (which I hadn't
> remembered when this thread started), I'm coming around to the idea that
> we could just treat exit(128) as nonfatal on Windows.  If for some
> reason the child hadn't died instantly at startup, the deadman switch
> would distinguish that from the case described here.

So the options are:

(1) If running on Windows and the exit code is 128 and the deadman
switch is not engaged, don't crash-and-restart.
(2) If running on Windows, create a mutex in the parent process and
take it in the child; if the mutex has not been taken, don't
crash-and-restart.

There is some amount of user code (I'm not sure precisely how much)
that runs after shared memory is mapped and before the deadman switch
is engaged. If we go with option #1, it would probably behoove us to
try to minimize the amount of such code (at least in HEAD). There is
probably not a great deal of danger that we could manage to scribble
on shared memory and then exit normally (rather than via signal),
never mind the need to exit with exactly 128. But "not a great deal"
is not the same as "none". If we go with option #2, the principal
danger seems to be that the code Magnus wrote will turn out to be less
robust than we might hope; for example, it might not work on all
versions of Windows, or be prone to some other installation-dependent
mischief.
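
For reference, the two options boil down to the following predicates.
This is only a sketch of the decision each option would make in the
postmaster; the function and parameter names are hypothetical, and the
inputs (platform, exit code, dead-man-switch state, mutex state) are
assumed to be available by whatever means the real patch would use.

```python
# Exit code reported for the early child deaths discussed in this thread.
EARLY_DEATH_EXIT_CODE = 128


def option1_skip_restart(on_windows: bool, exit_code: int,
                         deadman_engaged: bool) -> bool:
    """Option (1): ignore exit code 128 on Windows unless the dead-man
    switch shows the child had already become active."""
    return (on_windows
            and exit_code == EARLY_DEATH_EXIT_CODE
            and not deadman_engaged)


def option2_skip_restart(on_windows: bool, mutex_taken: bool) -> bool:
    """Option (2): ignore the death on Windows if the child never took
    the mutex, regardless of exit code."""
    return on_windows and not mutex_taken


# The case from the bug report: Windows, exit code 128, child never active.
print(option1_skip_restart(True, 128, False))
print(option2_skip_restart(True, False))
```

The difference is visible in the signatures: option (1) depends on the
exit code and on an existing mechanism, while option (2) depends only
on new Windows-specific state.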

Another question is how far either of these fixes could be
back-patched. I believe the dead-man switch only exists as far back
as 8.4, but the original commit message mentioned the possibility of
eventually back-patching it further:

Although this problem is of long standing, the lack of field complaints
seems to mean it's not critical enough to risk back-patching; at least
not till we get some more testing of this mechanism.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 21:11:38
Message-ID: 4901.1282684298@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> There is some amount of user code (I'm not sure precisely how much)
> that runs after shared memory is mapped and before the deadman switch
> is engaged.

Er ... what would you define as "user code"?

The deadman switch is engaged at the point where we create a PGPROC.
Before that, it's entirely impossible to take either LWLocks or
heavyweight locks, which means that practically any access to shared
memory would be illegal anyway. If there's anything very interesting
going on in that stretch, I'd be surprised.
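
As a rough illustration of the idea, here is a simplified model of a
per-child slot table. It is not the actual pmsignal.c code: the class
and method names are hypothetical, and the assumption is that the
postmaster assigns a slot before spawning, the child marks it active
once it has a PGPROC, and the postmaster checks the state when the
child exits.

```python
# Slot states in this toy model.
UNUSED, ASSIGNED, ACTIVE = range(3)


class ChildSlots:
    """Toy dead-man switch: one state flag per potential child."""

    def __init__(self, nslots: int):
        self.flags = [UNUSED] * nslots

    def assign(self) -> int:
        """Postmaster side: reserve a free slot before spawning a child."""
        for i, flag in enumerate(self.flags):
            if flag == UNUSED:
                self.flags[i] = ASSIGNED
                return i
        raise RuntimeError("no free child slots")

    def mark_active(self, slot: int) -> None:
        """Child side: called once the child is about to use shared memory."""
        assert self.flags[slot] == ASSIGNED
        self.flags[slot] = ACTIVE

    def release(self, slot: int) -> bool:
        """Postmaster side, at child exit: free the slot and report
        whether the child died without ever becoming active."""
        never_active = self.flags[slot] == ASSIGNED
        self.flags[slot] = UNUSED
        return never_active


slots = ChildSlots(4)
early = slots.assign()      # child that dies before becoming active
normal = slots.assign()     # child that gets as far as a PGPROC
slots.mark_active(normal)
print(slots.release(early), slots.release(normal))
```

A child that exits while its slot is still merely assigned cannot have
taken any locks in this model, which is the property the argument above
relies on.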

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-25 00:17:15
Message-ID: AANLkTikXwsBsTgAX-9gtnjtW6qRT2dMAv3+Qe3eTPTCE@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 5:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> There is some amount of user code (I'm not sure precisely how much)
>> that runs after shared memory is mapped and before the deadman switch
>> is engaged.
>
> Er ... what would you define as "user code"?

Our code, as opposed to the failure-inducing boatload of crap injected
by the operating system.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: David Fetter <david(at)fetter(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-25 01:18:46
Message-ID: 20100825011846.GA13478@fetter.org
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 08:17:15PM -0400, Robert Haas wrote:
> On Tue, Aug 24, 2010 at 5:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> There is some amount of user code (I'm not sure precisely how
> >> much) that runs after shared memory is mapped and before the
> >> deadman switch is engaged.
> >
> > Er ... what would you define as "user code"?
>
> Our code, as opposed to the failure-inducing boatload of crap
> injected by the operating system.

Don't hold back. Tell us how you *really* feel ;)

Cheers,
David (who thinks Robert's view of that platform may be a good deal
too sunny)
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-26 18:47:36
Message-ID: AANLkTi=r1NKx6EuX4C92snJ0WHWTTiwdMVx6T_sfaG5Q@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Robert Haas wrote:
>>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>
>> And here is confirmation from the Microsoft web site:
>
>>       In some instances, calling GetExitCode() against the failed process
>>       indicates the following exit code:
>>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>
> Given the existence of the deadman switch mechanism (which I hadn't
> remembered when this thread started), I'm coming around to the idea that
> we could just treat exit(128) as nonfatal on Windows.  If for some
> reason the child hadn't died instantly at startup, the deadman switch
> would distinguish that from the case described here.

So do you want to code this up?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-26 18:56:38
Message-ID: 26313.1282848998@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Given the existence of the deadman switch mechanism (which I hadn't
>> remembered when this thread started), I'm coming around to the idea that
>> we could just treat exit(128) as nonfatal on Windows. If for some
>> reason the child hadn't died instantly at startup, the deadman switch
>> would distinguish that from the case described here.

> So do you want to code this up?

Who, me? I don't do Windows --- I'd have no way to test it.

regards, tom lane


From: Cristian Bittel <cbittel(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-26 20:59:59
Message-ID: AANLkTikOPEGrF6dAZkYx1=vjmu2oUdyikg=ydXOnr0Sx@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

I still believe this "exit code 128" is related to pgAdmin being open while a
Remote Desktop session is closed. I have a Windows login which is not an
administrator, just an unprivileged user; it cannot start or stop services,
only monitor. With a pgAdmin window open inside my disconnected session, if I
then "close" that disconnected session as Administrator, Postgres exits
with code 128.

Did you reproduce this behavior?

Cristian.

2010/8/26 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Given the existence of the deadman switch mechanism (which I hadn't
> >> remembered when this thread started), I'm coming around to the idea that
> >> we could just treat exit(128) as nonfatal on Windows. If for some
> >> reason the child hadn't died instantly at startup, the deadman switch
> >> would distinguish that from the case described here.
>
> > So do you want to code this up?
>
> Who, me? I don't do Windows --- I'd have no way to test it.
>
> regards, tom lane
>


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Cristian Bittel <cbittel(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-29 11:05:17
Message-ID: AANLkTinqunS_95Puq_3pDBXf_BKM0xxRbTVCVLUhfx16@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Aug 26, 2010 at 22:59, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
> I still believe this "exit code 128" is related to pgAdmin being open while a
> Remote Desktop session is closed. I have a Windows login which is not an
> administrator, just an unprivileged user; it cannot start or stop services,
> only monitor. With a pgAdmin window open inside my disconnected session, if I
> then "close" that disconnected session as Administrator, Postgres exits
> with code 128.

If the closing of a session on the remote desktop can affect a
*service* then frankly that sounds like a serious isolation bug in
Windows itself. The postmaster grabs the handle of the process when
it's started and waits on that - that should never be affected by
something in a different session.
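
To make the mechanism described above concrete, here is a minimal, platform-neutral sketch of a parent that keeps the handle it got when it created a child and later reads the child's exit status from that handle. This is not PostgreSQL code: the postmaster does the equivalent in C against Windows process handles, and the `spawn_and_wait` helper and the child command are made up purely for illustration.

```python
import subprocess
import sys


def spawn_and_wait(exit_code: int) -> int:
    """Start a child, keep its handle, and return the exit code it reports."""
    # Popen keeps the OS-level handle to this specific child process.
    child = subprocess.Popen(
        [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    )
    # wait() blocks on that one handle, not on "any child" of the parent.
    return child.wait()


if __name__ == "__main__":
    # The child exits with 128 on its own; the parent merely observes it.
    print(spawn_and_wait(128))
```

The point of the sketch is that the exit code the parent sees is whatever the operating system records for that handle, so a status of 128 cannot by itself tell the parent whether the child chose to exit that way or was torn down from outside.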

I think it's more likely that Windows just loses track when you
terminate a lot of processes at once, and randomly kills off something
- or at least *indicates* that something has been killed off.

> Did you reproduce this behavior?

No, AFAIK nobody has managed to reproduce this behavior in any kind of
consistent way. It's certainly been seen more than once in many
places, but not consistently reproducible.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-31 13:18:15
Message-ID: AANLkTimjBjegGH3KSihTKuQcwEUL6n7keG+hfMfLsGmx@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Sun, Aug 29, 2010 at 12:05 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Thu, Aug 26, 2010 at 22:59, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
>> I still believe this "exit code 128" is related to pgAdmin being open while a
>> Remote Desktop session is closed. I have a Windows login which is not an
>> administrator, just an unprivileged user; it cannot start or stop services,
>> only monitor. With a pgAdmin window open inside my disconnected session, if I
>> then "close" that disconnected session as Administrator, Postgres exits
>> with code 128.
>
> If the closing of a session on the remote desktop can affect a
> *service* then frankly that sounds like a serious isolation bug in
> Windows itself. The postmaster grabs the handle of the process when
> it's started and waits on that - that should never be affected by
> something in a different session.
>
> I think it's more likely that Windows just loses track when you
> terminate a lot of processes at once, and randomly kills off something
> - or at least *indicates* that something has been killed off.
>
>> Did you reproduce this behavior?
>
> No, AFAIK nobody has managed to reproduce this behavior in any kind of
> consistent way. It's certainly been seen more than once in many
> places, but not consistently reproducible.

This behaviour, no - but desktop heap exhaustion is very easy to
reproduce. That's because the heap usage is caused by user32.dll which
uses a consistent amount with each process started, which is allocated
as the process is created. When I was working on the issue a couple of
years ago, it was entirely predictable - user32.dll allocates N bytes
and as soon as N * numbackends exceeds the allocated heap size, we
fall over.
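
The arithmetic in the paragraph above can be sketched as follows. The function names and the sizes used in the example are hypothetical, chosen only to show the shape of the calculation; the thread does not state the actual per-process allocation N or the heap size on the affected machine.

```python
def max_backends(heap_bytes: int, per_process_bytes: int) -> int:
    """Largest process count whose combined allocation still fits in the heap."""
    if per_process_bytes <= 0:
        raise ValueError("per_process_bytes must be positive")
    return heap_bytes // per_process_bytes


def heap_exhausted(heap_bytes: int, per_process_bytes: int, numbackends: int) -> bool:
    """True once N * numbackends exceeds the allocated heap size."""
    return per_process_bytes * numbackends > heap_bytes


if __name__ == "__main__":
    heap = 512 * 1024  # assumed heap size, for illustration only
    n = 4 * 1024       # assumed per-process allocation N, for illustration only
    print(max_backends(heap, n))
    print(heap_exhausted(heap, n, max_backends(heap, n) + 1))
```

Because the per-process cost is constant, the failure point is a fixed process count, which is why this kind of exhaustion is described as predictable.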

It shouldn't matter as desktop heap is allocated on a per-session
basis, but are you logging on using the service account to run your
admin tasks Cristian? If so, do you see the problem if you login
interactively using a different account?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Cristian Bittel <cbittel(at)gmail(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-31 14:40:33
Message-ID: AANLkTi=-H6Ud8es4Q-kTAECf2VpbWKxvjsTGYz5eawRa@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

I am the "remote" support guy for a web application (Apache+PHP+Pg; Postgres
is isolated on one server and Apache runs on another) installed at our
client's site. Our client is the Administrator user on the Windows server; I
have only a limited-privilege Windows user for monitoring. I have my own
"support" superuser (not the "postgres" user) in the Postgres database to
monitor status and logs and to run statistics queries.

I can log in to the Windows server only through Remote Desktop. My interactive
user cannot start or stop the PostgreSQL service or any other service; only
Administrator users can do that.

From inside my unprivileged session on the Windows server I can open
pgAdmin and connect to the Postgres service. When I leave pgAdmin open and
connected to the Postgres service inside the Windows session (whether the
session is connected or disconnected) and I or someone else (an Administrator)
"closes" my session, that is when the PostgreSQL service crashes. If I close
pgAdmin normally inside the remote session, using the "X" button, File>Exit or
"Ctrl+Q", the PostgreSQL service is not affected.

This is the main reason I think the cause is pgAdmin.exe: when it is forcibly
shut down by the termination of the Windows session, it sends an abnormal
signal to the PostgreSQL service.

Whatever abnormal signal a forced pgAdmin shutdown may be sending to the
PostgreSQL service, the service itself could also handle that case with any of
the approaches you are discussing for ignoring that signal.

To Dave's question: this behavior occurs in all interactive Windows Server
sessions, whether for Administrators or unprivileged users, but it is
related to closing the interactive Windows session while a pgAdmin window is
open and connected to the service. Nobody logs on to Windows using the
"postgres" service user.

Regards,

Cristian.

2010/8/31 Dave Page <dpage(at)pgadmin(dot)org>

> On Sun, Aug 29, 2010 at 12:05 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > On Thu, Aug 26, 2010 at 22:59, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
> >> I still believe this "exit code 128" is related to pgAdmin being open while a
> >> Remote Desktop session is closed. I have a Windows login which is not an
> >> administrator, just an unprivileged user; it cannot start or stop services,
> >> only monitor. With a pgAdmin window open inside my disconnected session, if I
> >> then "close" that disconnected session as Administrator, Postgres exits
> >> with code 128.
> >
> > If the closing of a session on the remote desktop can affect a
> > *service* then frankly that sounds like a serious isolation bug in
> > Windows itself. The postmaster grabs the handle of the process when
> > it's started and waits on that - that should never be affected by
> > something in a different session.
> >
> > I think it's more likely that Windows just loses track when you
> > terminate a lot of processes at once, and randomly kills off something
> > - or at least *indicates* that something has been killed off.
> >
> >> Did you reproduce this behavior?
> >
> > No, AFAIK nobody has managed to reproduce this behavior in any kind of
> > consistent way. It's certainly been seen more than once in many
> > places, but not consistently reproducible.
>
> This behaviour, no - but desktop heap exhaustion is very easy to
> reproduce. That's because the heap usage is caused by user32.dll which
> uses a consistent amount with each process started, which is allocated
> as the process is created. When I was working on the issue a couple of
> years ago, it was entirely predictable - user32.dll allocates N bytes
> and as soon as N * numbackends exceeds the allocated heap size, we
> fall over.
>
> It shouldn't matter as desktop heap is allocated on a per-session
> basis, but are you logging on using the service account to run your
> admin tasks Cristian? If so, do you see the problem if you login
> interactively using a different account?
>
> --
> Dave Page
> Blog: http://pgsnake.blogspot.com
> Twitter: @pgsnake
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Cristian Bittel <cbittel(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-31 15:09:58
Message-ID: AANLkTi=bJeqsj4arc7kL3zbDN0PVvuOFruw+GUa7jPTQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 31, 2010 at 3:40 PM, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
> To Dave's question: this behavior occurs in all interactive Windows Server
> sessions, whether for Administrators or unprivileged users, but it is
> related to closing the interactive Windows session while a pgAdmin window is
> open and connected to the service. Nobody logs on to Windows using the
> "postgres" service user.

Thanks Cristian.

Can you reproduce the problem if you use psql instead of pgAdmin? Both
use libpq to talk to the server, so if your theory is correct, I would
expect to see the same crash. It's hard to see what would bring the
server down though...

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-31 15:27:35
Message-ID: 201008311527.o7VFRZr28078@momjian.us
Lists: pgsql-bugs pgsql-hackers

Dave Page wrote:
> On Tue, Aug 31, 2010 at 3:40 PM, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
> > To Dave's question: this behavior occurs in all interactive Windows Server
> > sessions, whether for Administrators or unprivileged users, but it is
> > related to closing the interactive Windows session while a pgAdmin window is
> > open and connected to the service. Nobody logs on to Windows using the
> > "postgres" service user.
>
> Thanks Cristian.
>
> Can you reproduce the problem if you use psql instead of pgAdmin? Both
> use libpq to talk to the server, so if your theory is correct, I would
> expect to see the same crash. It's hard to see what would bring the
> server down though...

We have already found that exceeding desktop heap might cause a
CreateProcess to return success but later fail with a return code of
128, which causes a server restart.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-31 15:30:01
Message-ID: AANLkTik3ChtTY2BMhLpDPZoq_Xb_EZgF4S2S-86ZXL-8@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Dave Page wrote:
>> On Tue, Aug 31, 2010 at 3:40 PM, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
>> > To Dave's question: this behavior occurs in all interactive Windows Server
>> > sessions, whether for Administrators or unprivileged users, but it is
>> > related to closing the interactive Windows session while a pgAdmin window is
>> > open and connected to the service. Nobody logs on to Windows using the
>> > "postgres" service user.
>>
>> Thanks Cristian.
>>
>> Can you reproduce the problem if you use psql instead of pgAdmin? Both
>> use libpq to talk to the server, so if your theory is correct, I would
>> expect to see the same crash. It's hard to see what would bring the
>> server down though...
>
> We have already found that exceeding desktop heap might cause a
> CreateProcess to return success but later fail with a return code of
> 128, which causes a server restart.

That doesn't mean that this is desktop heap exhaustion though - just
that it can cause the same effect.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-31 15:35:34
Message-ID: 201008311535.o7VFZYN29394@momjian.us
Lists: pgsql-bugs pgsql-hackers

Dave Page wrote:
> On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Dave Page wrote:
> >> On Tue, Aug 31, 2010 at 3:40 PM, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
> >> > To Dave's question: this behavior occurs in all interactive Windows Server
> >> > sessions, whether for Administrators or unprivileged users, but it is
> >> > related to closing the interactive Windows session while a pgAdmin window is
> >> > open and connected to the service. Nobody logs on to Windows using the
> >> > "postgres" service user.
> >>
> >> Thanks Cristian.
> >>
> >> Can you reproduce the problem if you use psql instead of pgAdmin? Both
> >> use libpq to talk to the server, so if your theory is correct, I would
> >> expect to see the same crash. It's hard to see what would bring the
> >> server down though...
> >
> > We have already found that exceeding desktop heap might cause a
> > CreateProcess to return success but later fail with a return code of
> > 128, which causes a server restart.
>
> That doesn't mean that this is desktop heap exhaustion though - just
> that it can cause the same effect.

Right, but it is the only possible server crash cause we have come up
with so far.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-31 15:50:38
Message-ID: AANLkTint9DHxWmVROYf755hqb-yBoiNR5c_aoYMjWLv+@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 31, 2010 at 4:35 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Dave Page wrote:
>> On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > We have already found that exceeding desktop heap might cause a
>> > CreateProcess to return success but later fail with a return code of
>> > 128, which causes a server restart.
>>
>> That doesn't mean that this is desktop heap exhaustion though - just
>> that it can cause the same effect.
>
> Right, but it is the only possible server crash cause we have come up
> with so far.

Understood - I'm just unconvinced it's the cause - aside from the
point I made earlier about heap exhaustion being very predictable and
reproducible (which this issue apparently is not), when the server is
run under the SCM, it creates a logon session for that service alone
which has its own heap allocation which is entirely independent of
the allocation used by any interactive logon sessions.

So unless there's a major isolation bug in Windows, any desktop heap
usage in an interactive session for one user should have zero effect
on a non-interactive session for another user.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Cristian Bittel <cbittel(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-31 15:59:55
Message-ID: 201008311559.o7VFxtJ03879@momjian.us
Lists: pgsql-bugs pgsql-hackers

Dave Page wrote:
> On Tue, Aug 31, 2010 at 4:35 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Dave Page wrote:
> >> On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >> > We have already found that exceeding desktop heap might cause a
> >> > CreateProcess to return success but later fail with a return code of
> >> > 128, which causes a server restart.
> >>
> >> That doesn't mean that this is desktop heap exhaustion though - just
> >> that it can cause the same effect.
> >
> > Right, but it is the only possible server crash cause we have come up
> > with so far.
>
> Understood - I'm just unconvinced it's the cause - aside from the
> point I made earlier about heap exhaustion being very predictable and
> reproducible (which this issue apparently is not), when the server is
> run under the SCM, it creates a logon session for that service alone
> which has its own heap allocation which is entirely independent of
> the allocation used by any interactive logon sessions.
>
> So unless there's a major isolation bug in Windows, any desktop heap
> usage in an interactive session for one user should have zero effect
> on a non-interactive session for another user.

Well, the only description that we have ever heard that makes sense is
some kind of heap exhaustion, perhaps triggered by a Windows bug that
doesn't properly track heap allocations sometimes.

Of course, the cause might be aliens, but we don't have any evidence of
that either. :-|

What we do know is that CreateProcess is returning success, and the
child is exiting with 128 no_such_child, and that logging out can
trigger it sometimes.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Cristian Bittel <cbittel(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-01 14:49:52
Message-ID: AANLkTinuwmzyf2UwF0RFkVyFdVaX-HaunJmRk6=Z2axg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Maybe the issue could be avoided, for the moment, by modifying the shared heap
for sessions on Windows. But I have no real idea how much to increase
or decrease the values. Trial and error? Nothing inside the open Windows
sessions warns of heap exhaustion, so it is unpredictable how much
to change the values before the next PostgreSQL service crash...
32-bits: http://support.microsoft.com/kb/184802
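
For readers who have not seen the setting that the KB article above covers: the desktop heap sizes live in a `SharedSection=` entry inside the Windows subsystem startup string in the registry. The sketch below only parses such a string; it does not read or change the registry. The sample string and the helper name `parse_shared_section` are illustrative assumptions, and the values on any given server may differ.

```python
import re


def parse_shared_section(subsystems_value: str) -> dict:
    """Extract the three SharedSection sizes (in KB) from the subsystem string."""
    match = re.search(r"SharedSection=(\d+),(\d+),(\d+)", subsystems_value)
    if match is None:
        raise ValueError("no three-field SharedSection setting found")
    system_wide, interactive, non_interactive = (int(g) for g in match.groups())
    return {
        "system_wide_kb": system_wide,
        "interactive_desktop_kb": interactive,
        # Services normally run on a non-interactive desktop, so this is the
        # field that would matter for a PostgreSQL service.
        "non_interactive_desktop_kb": non_interactive,
    }


if __name__ == "__main__":
    # Assumed sample value, for illustration only.
    sample = (
        r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows "
        r"SharedSection=1024,3072,512 Windows=On SubSystemType=Windows"
    )
    print(parse_shared_section(sample))
```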

There are several reports of other services showing the same behavior,
including exit code 128, with a workaround of increasing the heap on old
Windows versions, but exit code 128 seems to apply to Windows 2003 Server x64
as well. It seems to be improved in Windows 2008, where the heap is not fixed.
https://fogbugz.bitvise.com/default.asp?WinSSHD.1.12888.2
http://support.microsoft.com/kb/824422

2010/8/31 Bruce Momjian <bruce(at)momjian(dot)us>

> Dave Page wrote:
> > On Tue, Aug 31, 2010 at 4:35 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > Dave Page wrote:
> > >> On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian <bruce(at)momjian(dot)us>
> wrote:
> > >> > We have already found that exceeding desktop heap might cause a
> > >> > CreateProcess to return success but later fail with a return code of
> > >> > 128, which causes a server restart.
> > >>
> > >> That doesn't mean that this is desktop heap exhaustion though - just
> > >> that it can cause the same effect.
> > >
> > > Right, but it is the only possible server crash cause we have come up
> > > with so far.
> >
> > Understood - I'm just unconvinced it's the cause - aside from the
> > point I made earlier about heap exhaustion being very predictable and
> > reproducible (which this issue apparently is not), when the server is
> > run under the SCM, it creates a logon session for that service alone
> > which has its own heap allocation which is entirely independent of
> > the allocation used by any interactive logon sessions.
> >
> > So unless there's a major isolation bug in Windows, any desktop heap
> > usage in an interactive session for one user should have zero effect
> > on a non-interactive session for another user.
>
> Well, the only description that we have ever heard that makes sense is
> some kind of heap exhaustion, perhaps triggered by a Windows bug that
> doesn't properly track heap allocations sometimes.
>
> Of course, the cause might be aliens, but we don't have any evidence of
> that either. :-|
>
> What we do know is that CreateProcess is returning success, and the
> child is exiting with 128 no_such_child, and that logging out can
> trigger it sometimes.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + It's impossible for everything to be true. +
>


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Cristian Bittel <cbittel(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-01 15:13:42
Message-ID: AANLkTimQ688ZHfZ9F=b2g6GzG=sULbTKtanREHBXmf=D@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, Sep 1, 2010 at 3:49 PM, Cristian Bittel <cbittel(at)gmail(dot)com> wrote:
> Maybe the issue could be avoided, for the moment, by modifying the shared heap
> for sessions on Windows. But I have no real idea how much to increase
> or decrease the values. Trial and error? Nothing inside the open Windows
> sessions warns of heap exhaustion, so it is unpredictable how much
> to change the values before the next PostgreSQL service crash...
> 32-bits: http://support.microsoft.com/kb/184802
>
> There are several reports of other services showing the same behavior,
> including exit code 128, with a workaround of increasing the heap on old
> Windows versions, but exit code 128 seems to apply to Windows 2003 Server x64
> as well. It seems to be improved in Windows 2008, where the heap is not fixed.
> https://fogbugz.bitvise.com/default.asp?WinSSHD.1.12888.2
> http://support.microsoft.com/kb/824422

Given the unpredictability, if this is connected to desktop heap I
don't think it's running out of per-session memory, so much as the
system-wide heap (which, afaict, is fixed at 48MB). That might explain
why a desktop session could affect other sessions.

Is this a terminal server, with lots of interactive users? Can you
check the heap usage using the desktop heap monitor:
http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 17:35:59
Message-ID: AANLkTikihcFNVDBtnPXsE7RaXaa5gweHCvdGpqG8uC4t@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 15:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Robert Haas wrote:
>>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>
>> And here is confirmation from the Microsoft web site:
>
>>       In some instances, calling GetExitCode() against the failed process
>>       indicates the following exit code:
>>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>
> Given the existence of the deadman switch mechanism (which I hadn't
> remembered when this thread started), I'm coming around to the idea that
> we could just treat exit(128) as nonfatal on Windows.  If for some
> reason the child hadn't died instantly at startup, the deadman switch
> would distinguish that from the case described here.

Just to be clear, do you mean something as simple as this?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
win32_128.patch application/octet-stream 833 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 17:48:42
Message-ID: 2521.1284054522@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Aug 24, 2010 at 15:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Given the existence of the deadman switch mechanism (which I hadn't
>> remembered when this thread started), I'm coming around to the idea that
>> we could just treat exit(128) as nonfatal on Windows. If for some
>> reason the child hadn't died instantly at startup, the deadman switch
>> would distinguish that from the case described here.

> Just to be clear, do you mean something as simple as this?

That seems like a rather klugy place and way to insert the fix. One
complaint about it is that the notice won't get logged nicely. It'd be
better if the main reaper() code was responsible for ignoring 128 so
that it could log the fact that it'd done so in the regular postmaster
log.

Another issue is that "nonfatal" doesn't mean "successful". In
particular, if this happened for the startup process, or probably some
other cases, taking the exit code as 0 would cause seriously wrong
things to happen.

On balance I think I'd suggest an #ifdef WIN32 in CleanupBackend that
made it accept 128 as a "normal exit" case. That would allow normal
processing to continue only when this happens to a regular backend,
which is probably sufficient for the purpose.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 17:57:35
Message-ID: AANLkTimVY9r6UnEKXjrf6xRZbq63ScvLcdWp74W1FHmK@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 9, 2010 at 19:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Tue, Aug 24, 2010 at 15:58, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Given the existence of the deadman switch mechanism (which I hadn't
>>> remembered when this thread started), I'm coming around to the idea that
>>> we could just treat exit(128) as nonfatal on Windows.  If for some
>>> reason the child hadn't died instantly at startup, the deadman switch
>>> would distinguish that from the case described here.
>
>> Just to be clear, do you mean something as simple as this?
>
> That seems like a rather klugy place and way to insert the fix.  One
> complaint about it is that the notice won't get logged nicely.  It'd be
> better if the main reaper() code was responsible for ignoring 128 so
> that it could log the fact that it'd done so in the regular postmaster
> log.

Agreed - I just wanted to throw it in somewhere for testing. Should've
mentioned that.

> Another issue is that "nonfatal" doesn't mean "successful".  In
> particular, if this happened for the startup process, or probably some
> other cases, taking the exit code as 0 would cause seriously wrong
> things to happen.
>
> On balance I think I'd suggest an #ifdef WIN32 in CleanupBackend that
> made it accept 128 as a "normal exit" case.  That would allow normal
> processing to continue only when this happens to a regular backend,
> which is probably sufficient for the purpose.

Seems reasonable. I'll whack it around for that - see attached.

Dave has a reasonably reproducible test environment. Unfortunately it's
on 8.3, so this patch will be completely unsafe there (it doesn't have
the deadman switch). But hopefully it can be used to see whether it fixes
this problem (while introducing others).

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
win32_128.patch application/octet-stream 753 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 18:23:42
Message-ID: 3236.1284056622@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Thu, Sep 9, 2010 at 19:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On balance I think I'd suggest an #ifdef WIN32 in CleanupBackend that
>> made it accept 128 as a "normal exit" case.

> Seems reasonable. I'll whack it around for that - see attached.

Hm, still doesn't log, which I think it should, even for testing
purposes (how will you know the case occurred?). Maybe like this:

     /*
      * If a backend dies in an ugly way then we must signal all other backends
      * to quickdie.  If exit status is zero (normal) or one (FATAL exit), we
      * assume everything is all right and proceed to remove the backend from
      * the active backend list.
+     *
+     * On Windows, also treat ERROR_WAIT_NO_CHILDREN (128) as a nonfatal
+     * case, since that sometimes happens under load.
      */
+#ifdef WIN32
+    if (exitstatus == ERROR_WAIT_NO_CHILDREN)
+    {
+        LogChildExit(LOG, _("server process"), pid, exitstatus);
+        exitstatus = 0;
+    }
+#endif
+
     if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
     {
         HandleChildCrash(pid, exitstatus, _("server process"));
         return;
     }

> Dave has a reasonably reproducible test environment. Unfortunately it's
> on 8.3, so this patch will be completely unsafe there (it doesn't have
> the deadman switch). But hopefully it can be used to see whether it fixes
> this problem (while introducing others).

Sounds like a plan.

We're not so worried about this case that we'd want to backport the
deadman switch into 8.3 or 8.2 to have a fix there, are we?

regards, tom lane
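
The check proposed in the message above can be tried outside the postmaster. What follows is a minimal, self-contained C model of it, not the actual postmaster code. The names classify_backend_exit, ChildExitKind and SKETCH_ERROR_WAIT_NO_CHILDREN are hypothetical, the on_windows parameter stands in for the #ifdef WIN32, and the exit status is treated as a raw exit code instead of going through the EXIT_STATUS_0/EXIT_STATUS_1 macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in: Windows defines ERROR_WAIT_NO_CHILDREN as 128. */
#define SKETCH_ERROR_WAIT_NO_CHILDREN 128

typedef enum
{
    CHILD_EXIT_NORMAL,          /* remove the backend from the active list */
    CHILD_EXIT_CRASH            /* signal all other backends to quickdie */
} ChildExitKind;

/*
 * Classify a regular backend's exit status.  "on_windows" replaces the
 * #ifdef WIN32 of the proposal so that both paths can be exercised on any
 * platform.  "*logged" reports whether the 128 case was hit and logged.
 */
static ChildExitKind
classify_backend_exit(int exitstatus, bool on_windows, bool *logged)
{
    assert(logged != NULL);
    *logged = false;

    if (on_windows && exitstatus == SKETCH_ERROR_WAIT_NO_CHILDREN)
    {
        /* Log first, so the occurrence is visible, then treat as exit 0. */
        fprintf(stderr, "LOG: server process exited with exit code %d\n",
                exitstatus);
        *logged = true;
        exitstatus = 0;
    }

    /* Zero (normal) and one (FATAL exit) are the only non-crash codes. */
    if (exitstatus != 0 && exitstatus != 1)
        return CHILD_EXIT_CRASH;
    return CHILD_EXIT_NORMAL;
}
```

In this model, exit code 128 is a crash everywhere except on Windows, where it is logged and then handled like a normal exit. Because the check lives in the backend cleanup path only, other child processes (the startup process, for instance) would still see 128 as a failure.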


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 18:27:32
Message-ID: AANLkTinnaCT7YUntGot_6vJT9E99pPNm+ytSV0v7DbOz@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 9, 2010 at 2:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> We're not so worried about this case that we'd want to backport the
> deadman switch into 8.3 or 8.2 to have a fix there, are we?

I think we should consider backporting the deadman switch to 8.3 and 8.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 19:00:04
Message-ID: 4072.1284058804@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Sep 9, 2010 at 2:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> We're not so worried about this case that we'd want to backport the
>> deadman switch into 8.3 or 8.2 to have a fix there, are we?

> I think we should consider backporting the deadman switch to 8.3 and 8.2.

[ raised eyebrow... ] Weren't you the one just lecturing me about
minimizing changes in back branches?

That was a fairly large patch, and I *don't* want to back-port it.
The thrust of my question was more along the lines of whether we should
look for a different solution to the current problem, so that we would
have something that could be back-ported into 8.2 and 8.3. Personally
I'm satisfied with only fixing it in 8.4 and up, but then again I don't
use Windows.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 19:02:44
Message-ID: AANLkTimFL+X+9ywmyybigz+4uea4XxASxeO0A3TWAYZm@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 9, 2010 at 21:00, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Sep 9, 2010 at 2:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> We're not so worried about this case that we'd want to backport the
>>> deadman switch into 8.3 or 8.2 to have a fix there, are we?
>
>> I think we should consider backporting the deadman switch to 8.3 and 8.2.
>
> [ raised eyebrow... ]  Weren't you the one just lecturing me about
> minimizing changes in back branches?
>
> That was a fairly large patch, and I *don't* want to back-port it.
> The thrust of my question was more along the lines of whether we should
> look for a different solution to the current problem, so that we would
> have something that could be back-ported into 8.2 and 8.3.  Personally
> I'm satisfied with only fixing it in 8.4 and up, but then again I don't
> use Windows.

Once we've shown that it works, I think we should look at doing
something for <= 8.3 as well.

How about something along the lines of my previous patch (with the
event) for 8.2 and 8.3, and then this simplified one for 8.4+?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 19:08:40
Message-ID: 4242.1284059320@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Thu, Sep 9, 2010 at 21:00, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The thrust of my question was more along the lines of whether we should
>> look for a different solution to the current problem, so that we would
>> have something that could be back-ported into 8.2 and 8.3. Personally
>> I'm satisfied with only fixing it in 8.4 and up, but then again I don't
>> use Windows.

> Once we've shown that it works, I think we should look at doing
> something for <= 8.3 as well.

> How about something along the lines of my previous patch (with the
> event) for 8.2 and 8.3, and then this simplified one for 8.4+?

Actually, I was just wondering how much we really need the dead-man
switch for this patch. If we don't have it, then what we risk is that
exit(128) will be taken as successful exit when it shouldn't be. But
how likely is it that such a call will ever be made? I think accepting
that small risk might be reasonable in the old branches. It's not like
the other possible fixes are zero-risk in themselves; especially not
patches that are only meant for the old branches and will never get
testing in HEAD.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 19:26:48
Message-ID: AANLkTi=8R0x72Ly6+JYvOK0bfGf-kPoGqfegEvwnuYnr@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 9, 2010 at 3:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Sep 9, 2010 at 2:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> We're not so worried about this case that we'd want to backport the
>>> deadman switch into 8.3 or 8.2 to have a fix there, are we?
>
>> I think we should consider backporting the deadman switch to 8.3 and 8.2.
>
> [ raised eyebrow... ]  Weren't you the one just lecturing me about
> minimizing changes in back branches?

They call me Professor Haas?

I believe the specific nature of my complaint was that we should only
back-patch important bug or security fixes. I think that there is a
credible argument that unnecessary database PANICs fall into that
category and wonky whitespace in the ps output does not. YMMV, of
course.

> That was a fairly large patch, and I *don't* want to back-port it.
> The thrust of my question was more along the lines of whether we should
> look for a different solution to the current problem, so that we would
> have something that could be back-ported into 8.2 and 8.3.  Personally
> I'm satisfied with only fixing it in 8.4 and up, but then again I don't
> use Windows.

I'm a bit surprised that you don't think this is back-patchable
material, considering the last paragraph of the commit message, which
seems to imply that you at least gave the matter some brief
consideration before deciding against it:

Although this problem is of long standing, the lack of field complaints
seems to mean it's not critical enough to risk back-patching; at least
not till we get some more testing of this mechanism.

We certainly now have MANY documented field complaints at least of the
exit-128-on-Windows problem, if not the more general
backend-exits-without-going-through-the-normal-cleanup-path problem.
Having said that, I'd be just as happy to go back to Magnus's original
solution, which didn't depend on the dead-man switch anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 19:28:10
Message-ID: 4587.1284060490@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> We certainly now have MANY documented field complaints at least of the
> exit-128-on-Windows problem, if not the more general
> backend-exits-without-going-through-the-normal-cleanup-path problem.

Right, which is why I still don't care to risk back-porting a fix for
the latter.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 19:37:11
Message-ID: AANLkTimQg=MmL5dbRwQNB15Zt=SO6prJ1cDwuKOYKCGi@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 9, 2010 at 3:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> We certainly now have MANY documented field complaints at least of the
>> exit-128-on-Windows problem, if not the more general
>> backend-exits-without-going-through-the-normal-cleanup-path problem.
>
> Right, which is why I still don't care to risk back-porting a fix for
> the latter.

It's hard to say what the safest option is, I think. There seem to be
basically three proposals on the table:

1. Back-port the dead-man switch, and ignore exit 128.
2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
3. Revert to Magnus's original solution.

Each of these has advantages and disadvantages. The advantage of #1
is that it is safer than #2, and that is usually something we prize
fairly highly. The disadvantage of #1 is that it involves
back-porting the dead-man switch, but on the flip side that code has
been out in the field for over a year now in 8.4, and AFAIK we haven't
had any trouble with it. Solution #3 should be approximately as safe as
solution #1, and has the advantage of touching less code in the back
branches, but on the other hand it is also NEW code. So I think it's
arguable which is the best solution. I think I like option #2 least
among those choices, but it's a tough call.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
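
The difference between options #1 and #2 can be illustrated with a small C model. This is a simplified sketch under stated assumptions, not the 8.4 implementation: it assumes the dead-man switch amounts to a per-child slot that the child marks active when it attaches and marks inactive again only on a clean exit, which the postmaster inspects after the child is gone. SlotState, child_attach, child_clean_exit, release_slot and exit_can_be_ignored are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-child slot states for the model. */
typedef enum
{
    SLOT_UNUSED,                /* no child owns the slot */
    SLOT_ASSIGNED,              /* child forked, not (or no longer) attached */
    SLOT_ACTIVE                 /* child attached and has not cleaned up */
} SlotState;

/* Child side: called once the child has attached to shared state. */
static void
child_attach(SlotState *slot)
{
    assert(*slot == SLOT_ASSIGNED);
    *slot = SLOT_ACTIVE;
}

/* Child side: called only on an orderly exit. */
static void
child_clean_exit(SlotState *slot)
{
    assert(*slot == SLOT_ACTIVE);
    *slot = SLOT_ASSIGNED;
}

/* Parent side: free the slot and report whether the child left it clean. */
static bool
release_slot(SlotState *slot)
{
    bool clean = (*slot == SLOT_ASSIGNED);

    *slot = SLOT_UNUSED;
    return clean;
}

/*
 * Decide whether a child exit may be treated as harmless.  With the
 * dead-man switch (option #1), exit code 128 is ignored only if the slot
 * shows the child never attached or cleaned up after itself.  Without it
 * (option #2), the exit code is all the parent has to go on.
 */
static bool
exit_can_be_ignored(int exitstatus, SlotState *slot, bool have_deadman)
{
    bool status_ok = (exitstatus == 0 || exitstatus == 1 || exitstatus == 128);
    bool slot_clean = have_deadman ? release_slot(slot) : true;

    return status_ok && slot_clean;
}
```

In this model, a child that dies with 128 before attaching is ignored under both options. A child that attaches and then exits with 128 without cleaning up is caught only when the dead-man check is present, which is the residual risk being weighed for the back branches.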


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 20:09:01
Message-ID: 5282.1284062941@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It's hard to say what the safest option is, I think. There seem to be
> basically three proposals on the table:

> 1. Back-port the dead-man switch, and ignore exit 128.
> 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
> 3. Revert to Magnus's original solution.

> Each of these has advantages and disadvantages. The advantage of #1
> is that it is safer than #2, and that is usually something we prize
> fairly highly. The disadvantage of #1 is that it involves
> back-porting the dead-man switch, but on the flip side that code has
> been out in the field for over a year now in 8.4, and AFAIK we haven't
> had any trouble with it. Solution #3 should be approximately as safe as
> solution #1, and has the advantage of touching less code in the back
> branches, but on the other hand it is also NEW code. So I think it's
> arguable which is the best solution. I think I like option #2 least
> among those choices, but it's a tough call.

Well, I don't want to use Magnus' original solution in 8.4 or up,
so I don't like #3 much: it's not only new code but code which would
get very limited testing. And I don't believe that the risk of
unexpected use of exit(128) is large enough to make #1 preferable to #2.
YMMV.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-09 20:16:30
Message-ID: AANLkTikuNe5vrsR+=HrMtFqaW=3Z7kcWqkKGEuw3aQoR@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 9, 2010 at 22:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> It's hard to say what the safest option is, I think.  There seem to be
>> basically three proposals on the table:
>
>> 1. Back-port the dead-man switch, and ignore exit 128.
>> 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
>> 3. Revert to Magnus's original solution.
>
>> Each of these has advantages and disadvantages.  The advantage of #1
>> is that it is safer than #2, and that is usually something we prize
>> fairly highly.  The disadvantage of #1 is that it involves
>> back-porting the dead-man switch, but on the flip side that code has
>> been out in the field for over a year now in 8.4, and AFAIK we haven't
>> had any trouble with it.  Solution #3 should be approximately as safe as
>> solution #1, and has the advantage of touching less code in the back
>> branches, but on the other hand it is also NEW code.  So I think it's
>> arguable which is the best solution.  I think I like option #2 least
>> among those choices, but it's a tough call.
>
> Well, I don't want to use Magnus' original solution in 8.4 or up,
> so I don't like #3 much: it's not only new code but code which would
> get very limited testing.  And I don't believe that the risk of
> unexpected use of exit(128) is large enough to make #1 preferable to #2.
> YMMV.

I agree on option #3 not being good - that'd basically be dead-end
code in backbranches only, and it's significantly different.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-10 01:12:05
Message-ID: 201009100112.o8A1C5Y20114@momjian.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas wrote:
> On Thu, Sep 9, 2010 at 3:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> We certainly now have MANY documented field complaints at least of the
> >> exit-128-on-Windows problem, if not the more general
> >> backend-exits-without-going-through-the-normal-cleanup-path problem.
> >
> > Right, which is why I still don't care to risk back-porting a fix for
> > the latter.
>
> It's hard to say what the safest option is, I think. There seem to be
> basically three proposals on the table:
>
> 1. Back-port the dead-man switch, and ignore exit 128.
> 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
> 3. Revert to Magnus's original solution.
>
> Each of these has advantages and disadvantages. The advantage of #1
> is that it is safer than #2, and that is usually something we prize
> fairly highly. The disadvantage of #1 is that it involves
> back-porting the dead-man switch, but on the flip side that code has
> been out in the field for over a year now in 8.4, and AFAIK we haven't
> had any trouble with it. Solution #3 should be approximately as safe as
> solution #1, and has the advantage of touching less code in the back
> branches, but on the other hand it is also NEW code. So I think it's
> arguable which is the best solution. I think I like option #2 least
> among those choices, but it's a tough call.

Well, the dead-man timer is for all platforms, while the 128 return
failure is Win32-only. I don't see why applying the dead-man timer
makes sense when it might destabilize all platforms and the bug is
just on Win32, and I don't think using defines to make the dead-man
timer Win32-only makes sense either.

I think we have clear enough evidence that 128 on Win32 means
no-such-child and we can be sure the child never got started on that
platform.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-10 07:45:00
Message-ID: AANLkTi=L-kLrXWtETCYxm8XtwkzORnH49hfoDu1rpH5c@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 9, 2010 at 20:23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Thu, Sep 9, 2010 at 19:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> On balance I think I'd suggest an #ifdef WIN32 in CleanupBackend that
>>> made it accept 128 as a "normal exit" case.
>
>> Seems reasonable. I'll whack it around for that - see attached.
>
> Hm, still doesn't log, which I think it should, even for testing
> purposes (how will you know the case occurred?).  Maybe like this:

Agreed, that's better.

>> Dave has a reasonably reproducible test environment. Unfortunately it's
>> on 8.3, so this patch will be completely unsafe there (it doesn't have
>> the deadman switch). But hopefully it can be used to see whether it fixes
>> this problem (while introducing others).
>
> Sounds like a plan.

Patch is with Dave for testing now :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-10 07:46:23
Message-ID: AANLkTikBzy3ZcHfj-KW=0Vp9eGc2-4gDvc9fMWpfhkVn@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Fri, Sep 10, 2010 at 03:12, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> On Thu, Sep 9, 2010 at 3:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> >> We certainly now have MANY documented field complaints at least of the
>> >> exit-128-on-Windows problem, if not the more general
>> >> backend-exits-without-going-through-the-normal-cleanup-path problem.
>> >
>> > Right, which is why I still don't care to risk back-porting a fix for
>> > the latter.
>>
>> It's hard to say what the safest option is, I think.  There seem to be
>> basically three proposals on the table:
>>
>> 1. Back-port the dead-man switch, and ignore exit 128.
>> 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
>> 3. Revert to Magnus's original solution.
>>
>> Each of these has advantages and disadvantages.  The advantage of #1
>> is that it is safer than #2, and that is usually something we prize
>> fairly highly.  The disadvantage of #1 is that it involves
>> back-porting the dead-man switch, but on the flip side that code has
>> been out in the field for over a year now in 8.4, and AFAIK we haven't
>> had any trouble with it.  Solution #3 should be approximately as safe as
>> solution #1, and has the advantage of touching less code in the back
>> branches, but on the other hand it is also NEW code.  So I think it's
>> arguable which is the best solution.  I think I like option #2 least
>> among those choices, but it's a tough call.
>
> Well, the dead-man timer is for all platforms, while the 128 return
> failure is Win32-only, so I don't see why applying the dead-man timer
> makes sense when it might destabilize all platforms, when the bug is
> just on Win32, and I don't think using defines to make the dead-man
> timer Win32-only makes sense.

Yes, that's the problem, really.

> I think we have clear enough evidence that 128 on Win32 means
> no-such-child and we can be sure the child never got started on that
> platform.

We have evidence that 128 occurs in this case. I don't think we have
evidence that there is no other case when this can happen, and we need
to investigate that further to be *sure*.

We can safely say that *we* never do exit(128). What if a third party
library does it? Or the operating system itself? For the OS we can
check it, but do we care about third party libraries?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-10 12:45:46
Message-ID: 201009101245.o8ACjkD21094@momjian.us
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander wrote:
> > I think we have clear enough evidence that 128 on Win32 means
> > no-such-child and we can be sure the child never got started on that
> > platform.
>
> We have evidence that 128 occurs in this case. I don't think we have
> evidence that there is no other case when this can happen, and we need
> to investigate that further to be *sure*.
>
> We can safely say that *we* never do exit(128). What if a third party
> library does it? Or the operating system itself? For the OS we can
> check it, but do we care about third party libraries?

Good question. Unix wait() splits apart the return code so you can tell
which part is the process exit code and which part is extra:

WEXITSTATUS(status)
If WIFEXITED(status) is true, evaluates to the low-order 8 bits
of the argument passed to _exit(2) or exit(3) by the child.

but we don't have that split on Win32, so you are right that anything
could return 128 from the process. Of course, it could also return 0,
but I would hope that nothing does that as an error return.

I am not sure how clear it is on Win32 that 128 is a special return
code.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
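
The Unix-side split described above can be seen in a short POSIX C sketch. It assumes a POSIX system (it does not build on Win32), and ChildOutcome, decode_wait_status and run_child are hypothetical helper names introduced for illustration. The point is that a child calling _exit(128) and a child killed by a signal yield distinguishable wait statuses.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Decoded view of a wait() status (hypothetical helper type). */
typedef struct
{
    bool exited;                /* child called exit()/_exit() */
    int  exit_code;             /* valid only if exited */
    bool signaled;              /* child was terminated by a signal */
    int  signal_no;             /* valid only if signaled */
} ChildOutcome;

/* Split a raw wait status into its "exit code" and "signal" parts. */
static ChildOutcome
decode_wait_status(int status)
{
    ChildOutcome out = {false, -1, false, 0};

    if (WIFEXITED(status))
    {
        out.exited = true;
        out.exit_code = WEXITSTATUS(status);
    }
    else if (WIFSIGNALED(status))
    {
        out.signaled = true;
        out.signal_no = WTERMSIG(status);
    }
    return out;
}

/*
 * Fork a child that either raises kill_signal (if nonzero) or calls
 * _exit(exit_code), then wait for it and decode the result.
 */
static ChildOutcome
run_child(int exit_code, int kill_signal)
{
    ChildOutcome failed = {false, -1, false, 0};
    int          status = 0;
    pid_t        pid;

    assert(exit_code >= 0 && exit_code <= 255);

    pid = fork();
    if (pid < 0)
        return failed;
    if (pid == 0)
    {
        if (kill_signal != 0)
            raise(kill_signal);
        _exit(exit_code);
    }

    /* Retry if the wait is interrupted by a signal in the parent. */
    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return failed;
    }
    return decode_wait_status(status);
}
```

Calling run_child(128, 0) reports an ordinary exit with code 128, while run_child(0, SIGKILL) reports termination by signal. On Win32 the parent only gets a single exit-code value, so no such distinction is available there.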


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-15 08:03:22
Message-ID: AANLkTimCTkNKKrHCd3Ot6kAsrSS7SeDpOTcaLsEP7i+M@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Fri, Sep 10, 2010 at 1:45 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> I am not sure how clear it is on Win32 that 128 is a special return
> code.

I asked Microsoft platform support (roughly) that question. Here's the response:

=====
From NTSTATUS.H
//
// The success status codes 128 - 191 are reserved for wait completion
// status with an abandoned mutant object.
//
#define STATUS_ABANDONED ((NTSTATUS)0x00000080L)

//
// MessageId: STATUS_ABANDONED_WAIT_0
//
// MessageText:
//
// STATUS_ABANDONED_WAIT_0
//
#define STATUS_ABANDONED_WAIT_0 ((NTSTATUS)0x00000080L) // winnt

I believe what you are seeing is an abandoned wait on a mutant which
is the same as a mutex. Therefore this error will be set whenever a
mutex is abandoned.

Per Concurrent Programming on Windows
An abandoned mutex is a mutex kernel object that was not correctly
released before its owning thread terminated. This can happen for any
number of reasons.

He goes on to discuss the case of a thread waiting on a global mutex
that will get this error when it is awakened from a wait and the mutex
had been abandoned by the previous owner. This is a difficult
situation to recover from as you are not sure about the shared state
that was being protected by the mutex. It

Therefore I cannot give you specific areas where this will happen. Of
course when systems are low on resources or they are completely
depleted (100% CPU) things will stop working.
=====
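
To make the arithmetic in that header excerpt concrete: 0x00000080 is
128 in decimal, and the header comment reserves 128 through 191 for
abandoned-wait statuses. A minimal sketch, assuming we redefine the
constant locally so it compiles without the Windows headers; the upper
bound macro and the function name are hypothetical, taken only from the
"128 - 191" comment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Value copied from the NTSTATUS.H excerpt quoted above, redefined
 * locally so this builds on any platform.
 */
#define STATUS_ABANDONED_WAIT_0   UINT32_C(0x00000080)

/* Assumption: 191 is the top of the "128 - 191" range in the comment. */
#define STATUS_ABANDONED_WAIT_MAX UINT32_C(191)

/* Compile-time check that the hex constant really is decimal 128. */
static_assert(STATUS_ABANDONED_WAIT_0 == 128, "0x80 is 128 in decimal");

/*
 * Returns true if a status value falls in the range the header comment
 * reserves for wait completion with an abandoned mutant (mutex).
 */
static bool
is_abandoned_wait_status(uint32_t status)
{
    return status >= STATUS_ABANDONED_WAIT_0 &&
           status <= STATUS_ABANDONED_WAIT_MAX;
}
```

This only classifies a number. It says nothing about whether a process
exit code of 128 really originated from an abandoned wait, which is
exactly the uncertainty discussed in this thread.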

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-15 17:25:41
Message-ID: AANLkTinau81rOK=acDHB-FmpmbKRyBanj-OaCEqmwP9H@mail.gmail.com

On Wed, Sep 15, 2010 at 4:03 AM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
> Therefore I cannot give you specific areas where this will happen.  Of
> course when systems are low on resources or they are completely
> depleted (100% CPU) things will stop working

Of course. As we all know, degrading gracefully under load is an
unachievable goal.

Anyway, this more or less confirms what I was kind of suspecting all
along: it's hopeless to try to avoid these exit(128) events, so we
just need to look for ways to minimize the impact as much as possible
(i.e. avoid a database PANIC where possible).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-16 17:05:30
Message-ID: AANLkTi=i-9mObOEzZ=k7xbS0PzJiRcETe0fHMOPrXr3N@mail.gmail.com

On Wed, Sep 15, 2010 at 19:25, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Sep 15, 2010 at 4:03 AM, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>> Therefore I cannot give you specific areas where this will happen.  Of
>> course when systems are low on resources or they are completely
>> depleted (100% CPU) things will stop working
>
> Of course.  As we all know, degrading gracefully under load is an
> unachievable goal.
>
> Anyway, this more or less confirms what I was kind of suspecting all
> along: it's hopeless to try to avoid these exit(128) events, so we
> just need to look for ways to minimize the impact as much as possible
> (i.e. avoid a database PANIC where possible).

So, it's been tested by at least one EDB customer with success.

Do we want to sneak this in before we release 9.0.0?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-16 17:22:20
Message-ID: 201009161722.o8GHMK721911@momjian.us

Dave Page wrote:
> On Fri, Sep 10, 2010 at 1:45 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >
> > I am not sure how clear it is on Win32 that 128 is a special return
> > code.
>
> I asked Microsoft platform support (roughly) that question. Here's the response:

I assume we are going to summarize this in a C comment when we ignore
128 return codes.

Can we assume all the mutexes will be cleaned up from a 128-exit?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-16 17:30:42
Message-ID: 28242.1284658242@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> So, it's been tested by at least one EDB customer with success.

> Do we want to sneak this in before we release 9.0.0?

I think we had consensus on applying the simple fix as far back as we
have the deadman switch code. If you can get it done in the next
few hours, go ahead.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-16 17:33:21
Message-ID: 28292.1284658401@sss.pgh.pa.us

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Can we assume all the mutexes will be cleaned up from a 128-exit?

In the deadman-switch case I think we're safe enough. I'm not convinced
at the moment that ignoring the error would be safe without that.
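
The decision being weighed here can be sketched roughly as follows.
This is not the committed patch. Every name below is hypothetical, and
the sketch assumes that the dead-man switch lets the postmaster tell
whether the exiting child had attached to shared memory. It also
simplifies by treating every other nonzero exit code as a crash.

```c
#include <stdbool.h>

typedef enum
{
    CHILD_EXIT_OK,          /* exit code 0: nothing to do */
    CHILD_EXIT_IGNORED,     /* log it, but skip crash recovery */
    CHILD_EXIT_CRASH        /* terminate other backends and reinitialize */
} ChildExitAction;

/* The exit code reported in this bug; seen on Win32 only. */
#define WIN32_SUSPECT_EXIT_CODE 128

/*
 * Decide what the parent should do about a child's exit code.
 *
 * have_deadman_switch: the branch has the dead-man switch code.
 * child_attached_shmem: (assumed) what the switch reports, i.e. whether
 *     the child got far enough to touch shared memory.  Only consulted
 *     when have_deadman_switch is true.
 */
static ChildExitAction
classify_child_exit(int exit_code, bool on_win32,
                    bool have_deadman_switch, bool child_attached_shmem)
{
    if (exit_code == 0)
        return CHILD_EXIT_OK;

    if (on_win32 && exit_code == WIN32_SUSPECT_EXIT_CODE)
    {
        /* With the switch, ignore 128 only if shared memory is untouched. */
        if (have_deadman_switch)
            return child_attached_shmem ? CHILD_EXIT_CRASH
                                        : CHILD_EXIT_IGNORED;

        /* Without the switch, 128 is ignored unconditionally. */
        return CHILD_EXIT_IGNORED;
    }

    return CHILD_EXIT_CRASH;
}
```

The last branch inside the Win32 case is the one without a safety net:
a child that exited with 128 after touching shared memory would go
unnoticed there.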

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-16 20:37:48
Message-ID: AANLkTikgxJwC1wXT0D8S=VGJA7AtC8T9DSr_PAJY0=BV@mail.gmail.com

On Thu, Sep 16, 2010 at 19:30, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> So, it's been tested by at least one EDB customer with success.
>
>> Do we want to sneak this in before we release 9.0.0?
>
> I think we had consensus on applying the simple fix as far back as we
> have the deadman switch code.  If you can get it done in the next
> few hours, go ahead.

Done.

Anybody with a win32 buildfarm member - if you can give it a kick to
make sure it does a run ASAP, please do so.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-16 21:29:38
Message-ID: 4C928C42.7070603@dunslane.net

On 09/16/2010 04:37 PM, Magnus Hagander wrote:
> On Thu, Sep 16, 2010 at 19:30, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Magnus Hagander<magnus(at)hagander(dot)net> writes:
>>> So, it's been tested by at least one EDB customer with success.
>>> Do we want to sneak this in before we release 9.0.0?
>> I think we had consensus on applying the simple fix as far back as we
>> have the deadman switch code. If you can get it done in the next
>> few hours, go ahead.
> Done.
>
> Anybody with a win32 buildfarm member - if you can give it a kick to
> make sure it does a run ASAP, please do so.
>
>

OK, I have started MSVC/9.0 (red_bat) running.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-17 00:25:30
Message-ID: 4C92B57A.3090704@dunslane.net

On 09/16/2010 05:29 PM, Andrew Dunstan wrote:
>
>
> On 09/16/2010 04:37 PM, Magnus Hagander wrote:
>> On Thu, Sep 16, 2010 at 19:30, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Magnus Hagander<magnus(at)hagander(dot)net> writes:
>>>> So, it's been tested by at least one EDB customer with success.
>>>> Do we want to sneak this in before we release 9.0.0?
>>> I think we had consensus on applying the simple fix as far back as we
>>> have the deadman switch code. If you can get it done in the next
>>> few hours, go ahead.
>> Done.
>>
>> Anybody with a win32 buildfarm member - if you can give it a kick to
>> make sure it does a run ASAP, please do so.
>>
>>
>
> OK, I have started MSVC/9.0 (red_bat) running.
>
>

Looks like we're green on 9.0 for both MinGW and MSVC.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-17 00:50:38
Message-ID: 12418.1284684638@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Looks like we're green on 9.0 for both MinGW and MSVC.

Would you kick brown_bat too so we can check the cygwin case?

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-17 02:00:26
Message-ID: 4C92CBBA.6080200@dunslane.net

On 09/16/2010 08:50 PM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> Looks like we're green on 9.0 for both MinGW and MSVC.
> Would you kick brown_bat too so we can check the cygwin case?

Done. Looks fine.

cheers

andrew


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-27 12:34:40
Message-ID: AANLkTi=6txnNA9e+ns4SfkRpTtK1_qxyDRd6Q2CcnO3W@mail.gmail.com

On Thu, Sep 9, 2010 at 9:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> It's hard to say what the safest option is, I think.  There seem to be
>> basically three proposals on the table:
>
>> 1. Back-port the dead-man switch, and ignore exit 128.
>> 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
>> 3. Revert to Magnus's original solution.
>
>> Each of these has advantages and disadvantages.  The advantage of #1
>> is that it is safer than #2, and that is usually something we prize
>> fairly highly.  The disadvantage of #1 is that it involves
>> back-porting the dead-man switch, but on the flip side that code has
>> been out in the field for over a year now in 8.4, and AFAIK we haven't
>> had any trouble with it.  Solution #3 should be approximately as safe as
>> solution #1, and has the advantage of touching less code in the back
>> branches, but on the other hand it is also NEW code.  So I think it's
>> arguable which is the best solution.  I think I like option #2 least
>> among those choices, but it's a tough call.
>
> Well, I don't want to use Magnus' original solution in 8.4 or up,
> so I don't like #3 much: it's not only new code but code which would
> get very limited testing.  And I don't believe that the risk of
> unexpected use of exit(128) is large enough to make #1 preferable to #2.
> YMMV.

So, can we go with #2 for the next point releases of <= 8.3? I
understand that our customer who has been testing that approach hasn't
seen any unexpected side-effects.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-29 13:45:15
Message-ID: AANLkTinm5=TY+vEcYKzdewFCtJ_jCj7LA3R8GSzNzk=n@mail.gmail.com

On Mon, Sep 27, 2010 at 14:34, Dave Page <dpage(at)pgadmin(dot)org> wrote:
> On Thu, Sep 9, 2010 at 9:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> It's hard to say what the safest option is, I think.  There seem to be
>>> basically three proposals on the table:
>>
>>> 1. Back-port the dead-man switch, and ignore exit 128.
>>> 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
>>> 3. Revert to Magnus's original solution.
>>
>>> Each of these has advantages and disadvantages.  The advantage of #1
>>> is that it is safer than #2, and that is usually something we prize
>>> fairly highly.  The disadvantage of #1 is that it involves
>>> back-porting the dead-man switch, but on the flip side that code has
>>> been out in the field for over a year now in 8.4, and AFAIK we haven't
>>> had any trouble with it.  Solution #3 should be approximately as safe as
>>> solution #1, and has the advantage of touching less code in the back
>>> branches, but on the other hand it is also NEW code.  So I think it's
>>> arguable which is the best solution.  I think I like option #2 least
>>> among those choices, but it's a tough call.
>>
>> Well, I don't want to use Magnus' original solution in 8.4 or up,
>> so I don't like #3 much: it's not only new code but code which would
>> get very limited testing.  And I don't believe that the risk of
>> unexpected use of exit(128) is large enough to make #1 preferable to #2.
>> YMMV.
>
> So, can we go with #2 for the next point releases of <= 8.3? I
> understand that our customer who has been testing that approach hasn't
> seen any unexpected side-effects.

Do we feel this is safe enough?

Also, just to be clear - they tested the "ignore 128 only" patch? Or
did they test the patch that also had some global events implementing
a "win32-only deadman switch"?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-29 13:54:23
Message-ID: AANLkTin=b_HTDC5Z+Gha46q8kMaM9qnLr3arO1ARO-SA@mail.gmail.com

On Wed, Sep 29, 2010 at 2:45 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Mon, Sep 27, 2010 at 14:34, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>> On Thu, Sep 9, 2010 at 9:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> It's hard to say what the safest option is, I think.  There seem to be
>>>> basically three proposals on the table:
>>>
>>>> 1. Back-port the dead-man switch, and ignore exit 128.
>>>> 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
>>>> 3. Revert to Magnus's original solution.
>>>
>>>> Each of these has advantages and disadvantages.  The advantage of #1
>>>> is that it is safer than #2, and that is usually something we prize
>>>> fairly highly.  The disadvantage of #1 is that it involves
>>>> back-porting the dead-man switch, but on the flip side that code has
>>>> been out in the field for over a year now in 8.4, and AFAIK we haven't
>>>> had any trouble with it.  Solution #3 should be approximately as safe as
>>>> solution #1, and has the advantage of touching less code in the back
>>>> branches, but on the other hand it is also NEW code.  So I think it's
>>>> arguable which is the best solution.  I think I like option #2 least
>>>> among those choices, but it's a tough call.
>>>
>>> Well, I don't want to use Magnus' original solution in 8.4 or up,
>>> so I don't like #3 much: it's not only new code but code which would
>>> get very limited testing.  And I don't believe that the risk of
>>> unexpected use of exit(128) is large enough to make #1 preferable to #2.
>>> YMMV.
>>
>> So, can we go with #2 for the next point releases of <= 8.3? I
>> understand that our customer who has been testing that approach hasn't
>> seen any unexpected side-effects.
>
> Do we feel this is safe enough?

I've yet to hear of a way a process can exit with a 128 that seems
like it could happen in our code.

> Also, just to be clear - they tested the "ignore 128 only" patch?

Yes.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-09-29 14:24:03
Message-ID: AANLkTi=6Y2LCG81ff8BovEa2DrB1PCzSvQp8a=-dJMJk@mail.gmail.com

On Wed, Sep 29, 2010 at 15:54, Dave Page <dpage(at)pgadmin(dot)org> wrote:
> On Wed, Sep 29, 2010 at 2:45 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Mon, Sep 27, 2010 at 14:34, Dave Page <dpage(at)pgadmin(dot)org> wrote:
>>> On Thu, Sep 9, 2010 at 9:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>>> It's hard to say what the safest option is, I think.  There seem to be
>>>>> basically three proposals on the table:
>>>>
>>>>> 1. Back-port the dead-man switch, and ignore exit 128.
>>>>> 2. Don't back-port the dead-man switch, but ignore exit 128 anyway.
>>>>> 3. Revert to Magnus's original solution.
>>>>
>>>>> Each of these has advantages and disadvantages.  The advantage of #1
>>>>> is that it is safer than #2, and that is usually something we prize
>>>>> fairly highly.  The disadvantage of #1 is that it involves
>>>>> back-porting the dead-man switch, but on the flip side that code has
>>>>> been out in the field for over a year now in 8.4, and AFAIK we haven't
>>>>> had any trouble with it.  Solution #3 should be approximately as safe as
>>>>> solution #1, and has the advantage of touching less code in the back
>>>>> branches, but on the other hand it is also NEW code.  So I think it's
>>>>> arguable which is the best solution.  I think I like option #2 least
>>>>> among those choices, but it's a tough call.
>>>>
>>>> Well, I don't want to use Magnus' original solution in 8.4 or up,
>>>> so I don't like #3 much: it's not only new code but code which would
>>>> get very limited testing.  And I don't believe that the risk of
>>>> unexpected use of exit(128) is large enough to make #1 preferable to #2.
>>>> YMMV.
>>>
>>> So, can we go with #2 for the next point releases of <= 8.3? I
>>> understand that our customer who has been testing that approach hasn't
>>> seen any unexpected side-effects.
>>
>> Do we feel this is safe enough?
>
> I've yet to hear of a way a process can exit with a 128 that seems
> like it could happen in our code.
>
>> Also, just to be clear - they tested the "ignore 128 only" patch?
>
> Yes.

Ok, applied. Please verify that it matches your expectations :D

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/