Re: Some 9.5beta2 backend processes not terminating properly?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Some 9.5beta2 backend processes not terminating properly?
Date: 2015-12-29 12:37:02
Message-ID: 20151229123702.wgeplwsrp6nyxxsb@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-12-29 12:41:40 +0200, Shay Rojansky wrote:
> > > The tests run for a couple of minutes and open and close some connections.
> > > With my pre-9.5 backends, the moment the test runner exits I can see that
> > > all backend processes exit immediately, and pg_stat_activity has no rows
> > > (except the querying one). With 9.5beta2, however, some backend processes
> > > continue to stay alive beyond the test runner, and pg_stat_activity
> > > contains extra rows (state idle, waiting false). This situation persists
> > > until I restart PostgreSQL.

Could you describe the workload a bit more? Is it highly concurrent? Do
you use optimized or debug builds? How long did you wait for the
backends to die? Is this all over localhost, over an external IP on the
same machine, or remote?

> Note that the number of backends that stay stuck after the tests is
> constant (always 12).

Can you increase the number of backends used in the test? And check
whether it's still 12?

> Here are stack dumps of the same process taken with both VS2015 Community
> and Process Explorer. I went over 4 processes and saw the same thing. Let
> me know what else I can provide to help.
>
> From VS2015 Community:
>
> Main Thread
> > ntdll.dll!NtWaitForMultipleObjects() Unknown
> KernelBase.dll!WaitForMultipleObjectsEx() Unknown
> KernelBase.dll!WaitForMultipleObjects() Unknown
> postgres.exe!WaitLatchOrSocket(volatile Latch * latch, int wakeEvents, unsigned __int64 sock, long timeout) Line 202 C
> postgres.exe!secure_read(Port * port, void * ptr, unsigned __int64 len) Line 151 C
> postgres.exe!pq_getbyte() Line 926 C
> postgres.exe!SocketBackend(StringInfoData * inBuf) Line 345 C
> postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username) Line 3984 C
> postgres.exe!BackendRun(Port * port) Line 4236 C
> postgres.exe!SubPostmasterMain(int argc, char * * argv) Line 4727 C
> postgres.exe!main(int argc, char * * argv) Line 211 C
> postgres.exe!__tmainCRTStartup() Line 626 C
> kernel32.dll!BaseThreadInitThunk() Unknown
> ntdll.dll!RtlUserThreadStart() Unknown

Hm. So we're waiting on the latch, and expecting to get an FD_CLOSE
event back because the socket is actually closed. That should always
happen in that path - a read through win32_latch.c doesn't show any
obvious problems. But then, I really don't know much about Windows
development.
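
To illustrate the behaviour being expected here, a minimal sketch in
Python on a local socket pair. This is not the win32_latch.c code and it
does not use FD_CLOSE; it only shows the general idea that a side blocked
waiting for input should be woken when its peer closes, and then see
end-of-file on the read. The function name wait_for_peer_close is made up
for this example.

```python
import select
import socket


def wait_for_peer_close(timeout=5.0):
    """Block until the peer closes its end, then return what the read yields."""
    # A connected pair of local sockets stands in for client and backend.
    backend_side, client_side = socket.socketpair()
    try:
        # The client goes away without sending anything.
        client_side.close()

        # The backend waits for the socket to become readable, as it would
        # while idle and waiting for the next command.
        readable, _, _ = select.select([backend_side], [], [], timeout)
        if not readable:
            return None  # wait timed out: the close was never noticed

        # A zero-length read means the peer has closed the connection.
        return backend_side.recv(1)
    finally:
        backend_side.close()


if __name__ == "__main__":
    data = wait_for_peer_close()
    print("peer closed" if data == b"" else "close not seen: %r" % (data,))
```

The stuck backends in the report look like the timeout branch above, with
the difference that they never stop waiting.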

How are your clients disconnecting? Possibly without properly
disconnecting?
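
For context on what "properly" means here: in the frontend/backend
protocol a client normally sends a Terminate message (the byte 'X'
followed by a four-byte length of 4) before closing the socket. A small
sketch of the two cases follows. The helper names are hypothetical, and
the code only builds and compares bytes; it does not talk to a server.

```python
import struct


def terminate_message():
    """Bytes of a Terminate message: type byte 'X', then Int32 length 4."""
    # The length field counts itself but not the type byte.
    return b"X" + struct.pack("!i", 4)


def disconnect_kind(final_message):
    """Classify a disconnect by the last message seen before the socket closed."""
    if final_message == terminate_message():
        return "clean"   # client said goodbye before closing
    return "abrupt"      # socket closed with no Terminate message


if __name__ == "__main__":
    print(disconnect_kind(terminate_message()))
    print(disconnect_kind(b""))
```

As far as I understand, the backend is expected to exit in both cases;
the abrupt one just relies on the socket close being noticed.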

Regards,

Andres
