BUG #10141: Server fails to send query result.

Lists: pgsql-bugs
From: molind(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #10141: Server fails to send query result.
Date: 2014-04-25 15:52:27
Message-ID: 20140425155227.2715.56833@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 10141
Logged by: Evgen Bodunov
Email address: molind(at)gmail(dot)com
PostgreSQL version: 9.3.4
Operating system: Ubuntu 13.10
Description:

I have big database with geospatial data from OpenStreetMap. I prepare
vector tiles from this data using console app which requests data from
database for each tile in given area.

While it works it opens 1 db connection per thread (8 threads) and queries
all tiles from list 25k tiles in average. 5 queries per tile. During map
generation usually 2 threads stuck in idle state.

http://i.stack.imgur.com/VUU4O.png

Look at the last row. Query started. We have a timestamp, then it finished
and changed state to idle, and we have timestamp of that moment. Everything
looks ok, except console app which waits answer from server inside

PGresult *res = PQexec(conn, query);

It waits for query results from server. And I can't find why server thinks
that job is done and app says that there was no answer from db yet.

I added PQtrace for each thread and wait for stuck thread. There is trace:

>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891
>From backend> D
>From backend (#4)> 139891

Seems problem somewhere inside PostgreSQL. It tries to send result but
fails.

PostgreSQL 9.3.4-1.pgdg70+1 installed from apt source
deb http://apt.postgresql.org/pub/repos/apt/ wheezy-pgdg main


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: molind(at)gmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #10141: Server fails to send query result.
Date: 2014-04-25 18:21:42
Message-ID: 20140425182142.GQ2556@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Greetings,

* molind(at)gmail(dot)com (molind(at)gmail(dot)com) wrote:
> While it works it opens 1 db connection per thread (8 threads) and queries
> all tiles from list 25k tiles in average. 5 queries per tile. During map
> generation usually 2 threads stuck in idle state.
>
> http://i.stack.imgur.com/VUU4O.png

This only shows what's happening server-side.

> Look at the last row. Query started. We have a timestamp, then it finished
> and changed state to idle, and we have timestamp of that moment. Everything
> looks ok, except console app which waits answer from server inside
>
> PGresult *res = PQexec(conn, query);
>
> It waits for query results from server. And I can't find why server thinks
> that job is done and app says that there was no answer from db yet.

Can you provide a small example test case which illustrates this? You
didn't provide your code, so it's really difficult to have any idea
what's going on.

What version of libpq is being used and on what platform? Are you using
SSL at all for your connection?

> I added PQtrace for each thread and wait for stuck thread. There is trace:
>
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
> >From backend> D
> >From backend (#4)> 139891
>
> Seems problem somewhere inside PostgreSQL. It tries to send result but
> fails.

I don't see anything here which shows that to be the case. Is the TCP
connection still fully ESTABLISHED on both sides? What's in the
middle..? Could you have a firewall or IDS which is dropping the
connections?

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: molind(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #10141: Server fails to send query result.
Date: 2014-04-25 19:20:15
Message-ID: 13193.1398453615@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * molind(at)gmail(dot)com (molind(at)gmail(dot)com) wrote:
>> I added PQtrace for each thread and wait for stuck thread. There is trace:

>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891
>> From backend> D
>> From backend (#4)> 139891

FWIW, this looks suspiciously like libpq is unable to consume data from
its input buffer (and keeps retrying to process the same data). Are you
using PQsetnonblocking by any chance? If so, this probably indicates
failure to follow the required call sequencing to process data.

Another likely theory, given that you mentioned multiple client threads,
is that the threads are stomping on each others' toes somehow. libpq
does not defend itself against that: it's up to you to be sure that
only one thread is touching each PGconn object.

>> Seems problem somewhere inside PostgreSQL. It tries to send result but
>> fails.

> I don't see anything here which shows that to be the case.

Indeed. I'd bet considerable money that this is a client-side issue.
It's possible that it's a libpq bug, but what seems more likely is
that you're using libpq improperly.

regards, tom lane


From: Evgen Bodunov <molind(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #10141: Server fails to send query result.
Date: 2014-04-25 20:55:53
Message-ID: 4892048A-2D47-44FD-94E1-B3AA7B6A55BF@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

There is more details:

Even when it works it sends D more than once usually

From backend> D
From backend (#4)> 33170
From backend> D
From backend (#4)> 33170
From backend> D
From backend (#4)> 33170
From backend> D
From backend (#4)> 33170
From backend> D
From backend (#4)> 33170
From backend (#2)> 2
From backend (#4)> 33132
From backend (33132)> \x01060000006c0000000103000000010000000a00000080ce26eb13eaa640003fad1864359840800c2b00bdeea6400055d
From backend (#4)> 24
From backend (24)> "landuse"=>"residential"

My query looks like
select * from poly9(3130860.6785608195, 7592337.1455099881, 3209132.1955248402, 7670608.6624740083);

while poly9 is function generated by app before data processing starts. And basically it looks like:

CREATE FUNCTION "public"."poly9"(IN minx numeric, IN miny numeric, IN maxx numeric, IN maxy numeric)
returns table (geom_out bytea, tags_out public.hstore)
AS $BODY$
declare
bbox = st_makebox2d(st_makepoint(minx, miny), st_makepoint(maxx, maxy));
begin
-- simplified version
return query
SELECT way, tags FROM planet_polygon WHERE way && bbox;
end;
$BODY$
LANGUAGE plpgsql
COST 100
ROWS 1000
CALLED ON NULL INPUT
SECURITY INVOKER
VOLATILE;

Hope it helps. This solution worked fine 2 months ago. Is there a small chance that something broken inside 9.3.4?

Best Regards,
Evgen Bodunov.

On Apr 25, 2014, at 10:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>> * molind(at)gmail(dot)com (molind(at)gmail(dot)com) wrote:
>>> I added PQtrace for each thread and wait for stuck thread. There is trace:
>
>>> From backend> D
>>> From backend (#4)> 139891
>>> From backend> D
>>> From backend (#4)> 139891
>>> From backend> D
>>> From backend (#4)> 139891
>>> From backend> D
>>> From backend (#4)> 139891
>>> From backend> D
>>> From backend (#4)> 139891
>>> From backend> D
>>> From backend (#4)> 139891
>
> FWIW, this looks suspiciously like libpq is unable to consume data from
> its input buffer (and keeps retrying to process the same data). Are you
> using PQsetnonblocking by any chance? If so, this probably indicates
> failure to follow the required call sequencing to process data.
>
> Another likely theory, given that you mentioned multiple client threads,
> is that the threads are stomping on each others' toes somehow. libpq
> does not defend itself against that: it's up to you to be sure that
> only one thread is touching each PGconn object.
>
>>> Seems problem somewhere inside PostgreSQL. It tries to send result but
>>> fails.
>
>> I don't see anything here which shows that to be the case.
>
> Indeed. I'd bet considerable money that this is a client-side issue.
> It's possible that it's a libpq bug, but what seems more likely is
> that you're using libpq improperly.
>
> regards, tom lane


From: Evgen Bodunov <molind(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #10141: Server fails to send query result.
Date: 2014-04-26 08:45:11
Message-ID: A16FD213-A3F0-414B-8EBC-E90683821D44@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hello guys,

I checked it with direct ethernet connection and now it works without issues. When i switched OpenVPN server from "proto udp” to “proto tcp" mode it seems also works ok.

Thanks for your help.

Best Regards,
Evgen Bodunov.

On Apr 25, 2014, at 11:55 PM, Evgen Bodunov <molind(at)gmail(dot)com> wrote:

> There is more details:
>
> Even when it works it sends D more than once usually
>
> From backend> D
> From backend (#4)> 33170
> From backend> D
> From backend (#4)> 33170
> From backend> D
> From backend (#4)> 33170
> From backend> D
> From backend (#4)> 33170
> From backend> D
> From backend (#4)> 33170
> From backend (#2)> 2
> From backend (#4)> 33132
> From backend (33132)> \x01060000006c0000000103000000010000000a00000080ce26eb13eaa640003fad1864359840800c2b00bdeea6400055d
> From backend (#4)> 24
> From backend (24)> "landuse"=>"residential"
>
> My query looks like
> select * from poly9(3130860.6785608195, 7592337.1455099881, 3209132.1955248402, 7670608.6624740083);
>
> while poly9 is function generated by app before data processing starts. And basically it looks like:
>
> CREATE FUNCTION "public"."poly9"(IN minx numeric, IN miny numeric, IN maxx numeric, IN maxy numeric)
> returns table (geom_out bytea, tags_out public.hstore)
> AS $BODY$
> declare
> bbox = st_makebox2d(st_makepoint(minx, miny), st_makepoint(maxx, maxy));
> begin
> -- simplified version
> return query
> SELECT way, tags FROM planet_polygon WHERE way && bbox;
> end;
> $BODY$
> LANGUAGE plpgsql
> COST 100
> ROWS 1000
> CALLED ON NULL INPUT
> SECURITY INVOKER
> VOLATILE;
>
> Hope it helps. This solution worked fine 2 months ago. Is there a small chance that something broken inside 9.3.4?
>
> Best Regards,
> Evgen Bodunov.
>
> On Apr 25, 2014, at 10:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>>> * molind(at)gmail(dot)com (molind(at)gmail(dot)com) wrote:
>>>> I added PQtrace for each thread and wait for stuck thread. There is trace:
>>
>>>> From backend> D
>>>> From backend (#4)> 139891
>>>> From backend> D
>>>> From backend (#4)> 139891
>>>> From backend> D
>>>> From backend (#4)> 139891
>>>> From backend> D
>>>> From backend (#4)> 139891
>>>> From backend> D
>>>> From backend (#4)> 139891
>>>> From backend> D
>>>> From backend (#4)> 139891
>>
>> FWIW, this looks suspiciously like libpq is unable to consume data from
>> its input buffer (and keeps retrying to process the same data). Are you
>> using PQsetnonblocking by any chance? If so, this probably indicates
>> failure to follow the required call sequencing to process data.
>>
>> Another likely theory, given that you mentioned multiple client threads,
>> is that the threads are stomping on each others' toes somehow. libpq
>> does not defend itself against that: it's up to you to be sure that
>> only one thread is touching each PGconn object.
>>
>>>> Seems problem somewhere inside PostgreSQL. It tries to send result but
>>>> fails.
>>
>>> I don't see anything here which shows that to be the case.
>>
>> Indeed. I'd bet considerable money that this is a client-side issue.
>> It's possible that it's a libpq bug, but what seems more likely is
>> that you're using libpq improperly.
>>
>> regards, tom lane
>