Re: Some 9.5beta2 backend processes not terminating properly?

From: Shay Rojansky <roji(at)roji(dot)org>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some 9.5beta2 backend processes not terminating properly?
Date: 2015-12-30 17:01:10
Message-ID: CADT4RqBMPE_V=7DCtqkdQdzWyF-E-uV-jWTpuP8u7eOfXziOmA@mail.gmail.com
Lists: pgsql-hackers

OK, I finally found some time to dive into this.

The backends seem to hang when the client closes a socket without first
sending a Terminate message - some of the tests make this happen. I've
confirmed this happens with 9.5rc1 running on Windows (versions 10 and 7),
but this does not occur on Ubuntu 15.10. The client runs on Windows as well
(although I doubt that's important).

In case it helps, here's a gist
<https://gist.github.com/roji/33df4e818c5d64a607aa> with some .NET code
that uses Npgsql 3.0.4 to reproduce this.
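
For anyone without a .NET environment, the same condition can probably be
approximated at the wire level. The following is only a rough sketch and not
what Npgsql or the gist actually does: the helper names are made up for
illustration, and it assumes the server accepts the connection without a
password exchange (e.g. trust authentication on localhost). It builds a
protocol 3.0 StartupMessage, and then either sends a Terminate message ('X')
before closing the socket or simply closes it.

```python
import socket
import struct

# Protocol version 3.0 is encoded as (3 << 16) | 0 in the StartupMessage.
PROTOCOL_VERSION_3_0 = 196608


def build_startup_message(user: str, database: str) -> bytes:
    """Build a StartupMessage: Int32 length, Int32 version, key/value pairs."""
    params = b""
    for key, value in (("user", user), ("database", database)):
        params += key.encode() + b"\x00" + value.encode() + b"\x00"
    params += b"\x00"  # terminator after the last key/value pair
    body = struct.pack("!I", PROTOCOL_VERSION_3_0) + params
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


def build_terminate_message() -> bytes:
    """Build a Terminate message: type byte 'X' followed by Int32 length 4."""
    return b"X" + struct.pack("!I", 4)


def connect_and_close(host: str, port: int, user: str, database: str,
                      clean: bool) -> None:
    """Hypothetical helper: open a session, then close it cleanly or not.

    Assumes no password exchange is required (e.g. trust authentication).
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_startup_message(user, database))
        sock.recv(4096)  # read whatever the server sends back first
        if clean:
            sock.sendall(build_terminate_message())
        # Leaving the "with" block closes the socket; with clean=False the
        # backend never receives a Terminate message.
```

Calling connect_and_close("localhost", 5432, "postgres", "postgres",
clean=False) a few times and then looking at the process list should show
whether backends are left behind; the host, port, user and database here are
placeholders.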

If there's anything else I can do please let me know.

Shay

On Wed, Dec 30, 2015 at 5:32 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

>
>
> On Tue, Dec 29, 2015 at 7:04 PM, Shay Rojansky <roji(at)roji(dot)org> wrote:
>
>>> Could you describe the workload a bit more? Is this rather concurrent? Do
>>> you use optimized or debug builds? How long did you wait for the
>>> backends to die? Is this all over localhost, external ip but local,
>>> remotely?
>>>
>>
>> The workload is a rather diverse set of integration tests executed with
>> Npgsql. There's no concurrency whatsoever - tests are executed serially.
>> The backends stay alive indefinitely, until they are killed. All this is
>> over localhost with TCP. I can try other scenarios if that'll help.
>>
>>
>
> What procedure do you use to kill the backends? Normally, if we kill
> one via Task Manager using "End Process", it is treated as a backend
> crash: the server restarts and all other backends get disconnected.
>
>
>>> > Note that the number of backends that stay stuck after the tests is
>>> > constant (always 12).
>>>
>>> Can you increase the number of backends used in the test? And check
>>> whether it's still 12?
>>>
>>
>> Well, I ran the testsuite twice in parallel, and got... 23 backends stuck
>> at the end.
>>
>>
>>> How are your clients disconnecting? Possibly without properly
>>> disconnecting?
>>>
>>
>> That's possible, definitely in some of the test cases.
>>
>> What I can do is try to isolate things further by playing around with the
>> tests and trying to see if a more minimal repro can be done - I'll try
>> doing this later today or tomorrow. If anyone has any other specific tests
>> or checks I should do let me know.
>>
>
> I think we should first isolate whether the hung backends are caused
> by clients not disconnecting properly, or whether some other factor is
> involved as well. You could kill or disconnect sessions connected via
> psql in the same way as you do for the Npgsql connections and see
> whether the same behaviour reproduces.
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
