Re: Re: Connection hanging on INSERT apparently due to large batch size and 4 CPU cores

From: jgassner(at)gmail(dot)com
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Connection hanging on INSERT apparently due to large batch size and 4 CPU cores
Date: 2008-10-21 06:27:11
Message-ID: c44a1bc0-dcb1-4b57-8106-e50f9303b7d1@79g2000hsk.googlegroups.com
Lists: pgsql-jdbc

I am experiencing a problem while using the JDBC driver where one or
more connections hang soon after application startup. The connections
don't hang immediately; instead they hang after performing a few
operations (such as SELECTs) - we are using a connection pool. The
number of operations it takes for a connection to hang varies, but
typically we are able to reproduce the issue within an hour or two.
In all cases, however, when they do hang, it is while performing an
INSERT into one particular table of ours. I'm not certain whether it
is always the first INSERT on a particular connection that hangs, but
I have seen that to be the case in many database log traces. The
table where the hanging INSERTs occur is the one that receives, by
far, the largest number of INSERTs.

We are using a batched INSERT to perform the insert operation into
this table. When we attempt to insert batches of 600 rows, we observe
this hang. However, we eventually found that lowering the batch size
to 100 avoided the issue. While it is nice to apparently get around
the problem with a lower batch size, I would like to understand what
the actual cause is so we can be sure it won't pop up again. So my
question is: does anyone know why lowering the batch size fixed the
problem, and why it occurred in the first place? Is there a maximum
amount of data that can be inserted in one batch?
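
For reference, this is roughly the pattern we use for the batched
INSERT (a simplified sketch; the table, columns, and connection details
are placeholders, not our real schema):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "password");
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO event_log (id, payload) VALUES (?, ?)");
        int batchSize = 600;   // 600 hangs for us; 100 appears to avoid it
        for (int i = 0; i < batchSize; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row " + i);
            ps.addBatch();
        }
        ps.executeBatch();     // this is the call that hangs
        ps.close();
        conn.close();
    }
}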

Here are some observations on the problem:

From a stack trace of the Java side of the connection, I see that it is
consistently hanging in a socketWrite while binding the parameter values:

Thread-94 [RUNNABLE]
java.net.SocketOutputStream.socketWrite0(native method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
java.net.SocketOutputStream.write(SocketOutputStream.java:136)
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
java.io.FilterOutputStream.write(FilterOutputStream.java:80)
org.postgresql.core.PGStream.SendInteger2(PGStream.java:205)
org.postgresql.core.v3.QueryExecutorImpl.sendBind(QueryExecutorImpl.java:840)
org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1053)
org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:644)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:345)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2674)

Performing an strace on the PostgreSQL server process corresponding to
the hung connection shows that it is stuck inside a call to "sendto".

Using lsof I can see that the tcp connection is still in the
ESTABLISHED state.

So it appears that both sides are trying to write to each other at the
same time, and the underlying TCP connection is fine.
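
To illustrate what I mean, here is a toy example (nothing to do with
PostgreSQL or JDBC) of two peers that only write and never read: once
both socket buffers fill up, both writes block forever even though the
connection stays ESTABLISHED.

import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class WriteWriteDeadlock {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);            // any free port
        Socket client = new Socket("localhost", server.getLocalPort());
        final Socket accepted = server.accept();

        new Thread(new Runnable() {
            public void run() { pump(accepted); }
        }).start();
        pump(client);                                         // never returns
    }

    private static void pump(Socket s) {
        byte[] chunk = new byte[8192];
        try {
            OutputStream out = s.getOutputStream();
            while (true) {
                // Blocks once the peer's receive buffer and our own send
                // buffer are both full, because nobody ever reads.
                out.write(chunk);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}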

Threading: this is a multi-threaded Java server application. All
threads take connections from the same connection pool. Most threads
aren't very busy, other than the threads inserting into the particular
table where the INSERTs are hanging.

Info on my setup:
JDBC driver: postgresql-8.3-603.jdbc3.jar
PostgreSQL Server: postgresql-8.3.3.tar.bz2 (built with the options
--without-readline --disable-rpath --with-openssl on x86-64)
- Although it is built with SSL, our tests are not using SSL
JVM: Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04)
Machine: SLES 10 SP2 (x86_64); 4 CPU cores
- Doesn't seem to happen on machines with only 2 CPU cores

All ideas are appreciated.

Thanks,
John


From: Kris Jurka <books(at)ejurka(dot)com>
To: jgassner(at)gmail(dot)com
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Connection hanging on INSERT apparently due to large batch size and 4 CPU cores
Date: 2008-10-22 17:48:22
Message-ID: Pine.BSO.4.64.0810221336170.13384@leary.csoft.net
Lists: pgsql-jdbc

On Mon, 20 Oct 2008, jgassner(at)gmail(dot)com wrote:

> We are using a batched INSERT to perform the insert operation into
> this table. When we attempt to insert batches of size 600 rows, we
> observe this hang issue. However, we eventually found that lowering
> the batch size to 100 avoided the hang issue. While it is nice to
> apparently get around the issue using a lower batch size, it would be
> nice to understand what the actual problem is so we can be sure it
> won't pop up again. This is my question... Anyone know why lowering
> the batch size fixed the problem and why the problem occurred in the
> first place? Is there a maximum amount of data that can be inserted
> in one batch?

See the comments here starting at line #226.

http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/jdbc/pgjdbc/org/postgresql/core/v3/QueryExecutorImpl.java?annotate=1.42

The driver is already breaking your batch into sub-batches of 256 queries.
We don't expect to get a large response from an insert statement, but
that's apparently not true in your case. Perhaps there is something
unusual about your inserts? Do they fail, resulting in large error
messages? Do they produce warnings or notices for things like
escape_string_warning? If you can reproduce this failure, then turning on
additional logging via the loglevel=2 URL parameter will show the exact
protocol-level responses coming from the server, and then we should be able
to see what's really happening. (You'd want to look at the protocol-level
logging for both the success and failure cases, because on the failure
side, we never get around to processing the results.)
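
For example, a connection URL along these lines turns it on (the host,
database, and credentials here are just placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

public class LogLevelExample {
    public static void main(String[] args) throws Exception {
        // loglevel=2 enables DEBUG-level driver logging, including the
        // protocol messages exchanged with the server (the FE=> entries).
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb?loglevel=2",
                "user", "password");
        conn.close();
    }
}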

Kris Jurka


From: John <jgassner(at)gmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Connection hanging on INSERT apparently due to large batch size and 4 CPU cores
Date: 2008-10-25 20:24:29
Message-ID: 843188e4-0b74-4360-bcb0-815464087ec0@u28g2000hsc.googlegroups.com
Lists: pgsql-jdbc

On Oct 22, 1:48 pm, bo(dot)(dot)(dot)(at)ejurka(dot)com (Kris Jurka) wrote:
> On Mon, 20 Oct 2008, jgass(dot)(dot)(dot)(at)gmail(dot)com wrote:
> > We are using a batched INSERT to perform the insert operation into
> > this table.  When we attempt to insert batches of size 600 rows, we
> > observe this hang issue.  However, we eventually found that lowering
> > the batch size to 100 avoided the hang issue.  While it is nice to
> > apparently get around the issue using a lower batch size, it would be
> > nice to understand what the actual problem is so we can be sure it
> > won't pop up again.  This is my question... Anyone know why lowering
> > the batch size fixed the problem and why the problem occurred in the
> > first place?  Is there a maximum amount of data that can be inserted
> > in one batch?
>
> See the comments here starting at line #226.
>
> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/jdbc/pgjdbc/org/postgresq...
>
> The driver is already breaking your batch into sub-batches of 256 queries.
> We don't expect to get a large response from an insert statement, but
> that's apparently not true in your case.  Perhaps there is something
> unusual about your inserts?  Do they fail, resulting in large error
> messages?  Do they produce warnings or notices for things like
> escape_string_warning?  If you can reproduce this failure, then turning on
> additional logging via the loglevel=2 URL parameter will show the exact
> protocol level responses coming from the server and then we should be able
> to see what's really happening.  (You'd want to look at the protocol level
> logging for both the success and failure cases, because on the failure
> side, we never get around to processing the results.)
>
> Kris Jurka
>

I was able to reproduce the issue with loglevel=2. I believe I located
the entries in the log file that correspond to the hung connection.
Reviewing the log activity for this connection didn't show any errors
that were plain for me to see. I searched for "warning" and didn't get
any hits either.

I do see that this connection (3) simply stops logging at the message
pasted below, which agrees with my finding that it is hung within a
Bind. All the other connections continue logging, so it was fairly
obvious that this one was the hung connection.

Could it be related to the INSERT being done on a table that has 330
columns?

Thanks,
John

2008/10/25 15:16:30 | INFO | jvm 1 | 15:16:30.415 (3) FE=>
Execute(portal=null,limit=1)
2008/10/25 15:16:30 | INFO | jvm 1 | 15:16:30.415 (3) FE=>
Describe(statement=S_489)
2008/10/25 15:16:30 | INFO | jvm 1 | 15:16:30.415 (3) FE=>
Bind(stmt=S_489,portal=null,$1=<42BE2611-84F5-102B-A965-0022190B8F18>,
$2=<2008-10
-25 19:03:55.000000 +00:00:00>,$3=<101000>,$4=<0>,$5=<0>,$6=<0>,
$7=<101000>,$8=<7655123>,$9=<109100>,$10=<NULL>,$11=<NULL>,$12=<NULL>,
$13=<NULL>,$
14=<NULL>,$15=<443>,$16=<16001>,$17=<NULL>,$18=<NULL>,$19=<3>,
$20=<दकEventGeneratorदक>,$21=<NULL>,$22=<-1062731758>,$23=<NULL>,
$24=<NULL>,$25=<-10
62731759>,$26=<NULL>,$27=<https>,$28=<MyInitUser>,$29=<MyTargetUser>,
$30=<दकqffदक fik fik >,$31=<NULL>,$32=<NULL>,$33=<NULL>,$34=<NULL>,
$35=<NULL>
,$36=<NULL>,$37=<2008-10-25 19:16:25.650000 +00:00:00>,$38=<2008-10-25
19:16:25.650000 +00:00:00>,$39=<NULL>,$40=<NULL>,$41=<0>,$42=<NULL>,
$43=<NU
LL>,$44=<NULL>,$45=<NULL>,$46=<NULL>,$47=<NULL>,$48=<NULL>,$49=<NULL>,
$50=<NULL>,$51=<NULL>,$52=<NULL>,$53=<NULL>,$54=<NULL>,$55=<NULL>,
$56=<NULL>
,$57=<NULL>,$58=<NULL>,$59=<NULL>,$60=<NULL>,$61=<NULL>,$62=<NULL>,
$63=<NULL>,$64=<NULL>,$65=<NULL>,$66=<NULL>,$67=<NULL>,$68=<NULL>,
$69=<NULL>,$7
0=<NULL>,$71=<NULL>,$72=<NULL>,$73=<NULL>,$74=<MyInitUserDomain>,
$75=<NULL>,$76=<NULL>,$77=<NULL>,$78=<NULL>,$79=<NULL>,$80=<NULL>,
$81=<NULL>,$82=
<NULL>,$83=<MyTargetUserDomain>,$84=<NULL>,$85=<NULL>,$86=<NULL>,
$87=<NULL>,$88=<0>,$89=<101000>,$90=<NULL>,$91=<NULL>,$92=<NULL>,
$93=<NULL>,$94=<
NULL>,$95=<NULL>,$96=<NULL>,$97=<NULL>,$98=<NULL>,$99=<NULL>,
$100=<NULL>,$101=<NULL>,$102=<NULL>,$103=<NULL>,$104=<NULL>,
$105=<NULL>,$106=<NULL>,$
107=<NULL>,$108=<NULL>,$109=<NULL>,$110=<NULL>,$111=<NULL>,$112=<NULL>,
$113=<NULL>,$114=<NULL>,$115=<NULL>,$116=<NULL>,$117=<NULL>,
$118=<NULL>,$11
9=<NULL>,$120=<NULL>,$121=<NULL>,$122=<NULL>,$123=<NULL>,$124=<NULL>,
$125=<NULL>,$126=<NULL>,$127=<NULL>,$128=<NULL>,$129=<NULL>,
$130=<NULL>,$131=
<NULL>,$132=<NULL>,$133=<NULL>,$134=<NULL>,$135=<NULL>,$136=<NULL>,
$137=<NULL>,$138=<NULL>,$139=<NULL>,$140=<NULL>,$141=<NULL>,
$142=<NULL>,$143=<N
ULL>,$144=<NULL>,$145=<NULL>,$146=<NULL>,$147=<NULL>,$148=<NULL>,
$149=<NULL>,$150=<NULL>,$151=<NULL>,$152=<NULL>,$153=<NULL>,
$154=<NULL>,$155=<NUL
L>,$156=<NULL>,$157=<NULL>,$158=<NULL>,$159=<NULL>,$160=<NULL>,
$161=<NULL>,$162=<NULL>,$163=<NULL>,$164=<NULL>,$165=<NULL>,
$166=<NULL>,$167=<NULL>
,$168=<NULL>,$169=<NULL>,$170=<NULL>,$171=<NULL>,$172=<NULL>,
$173=<NULL>,$174=<NULL>,$175=<NULL>,$176=<NULL>,$177=<NULL>,
$178=<NULL>,$179=<NULL>,$
180=<NULL>,$181=<NULL>,$182=<NULL>,$183=<NULL>,$184=<NULL>,$185=<NULL>,
$186=<NULL>,$187=<NULL>,$188=<NULL>,$189=<NULL>,$190=<NULL>,
$191=<NULL>,$19
2=<NULL>,$193=<NULL>,$194=<NULL>,$195=<NULL>,$196=<NULL>,$197=<NULL>,
$198=<NULL>,$199=<NULL>,$200=<NULL>,$201=<NULL>,$202=<NULL>,
$203=<NULL>,$204=
<NULL>,$205=<NULL>,$206=<NULL>,$207=<NULL>,$208=<0>,$209=<NULL>,
$210=<fik >,$211=<NULL>,$212=<InitUserId1>,$213=<NULL>,
$214=<TargetUserId1>,$215=<
NULL>,$216=<NULL>,$217=<NULL>,$218=<NULL>,$219=<NULL>,$220=<NULL>,
$221=<NULL>,$222=<-1062731760>,$223=<-1062731761>,$224=<NULL>,
$225=<NULL>,$226=<
0>,$227=<0>,$228=<NULL>,$229=<NULL>,$230=<NULL>,$231=<NULL>,
$232=<NULL>,$233=<NULL>,$234=<NULL>,$235=<NULL>,$236=<NULL>,
$237=<NULL>,$238=<NULL>,$2
39=<NULL>,$240=<NULL>,$241=<NULL>,$242=<NULL>,$243=<NULL>,$244=<NULL>,
$245=<NULL>,$246=<NULL>,$247=<NULL>,$248=<NULL>,$249=<NULL>,
$250=<NULL>,$251
=<NULL>,$252=<NULL>,$253=<NULL>,$254=<NULL>,$255=<NULL>,$256=<NULL>,
$257=<NULL>,$258=<NULL>,$259=<NULL>,$260=<NULL>,$261=<NULL>,
$262=<NULL>,$263=<
NULL>,$264=<NULL>,$265=<NULL>,$266=<NULL>,$267=<NULL>,$268=<NULL>,
$269=<NULL>,$270=<NULL>,$271=<NULL>,$272=<NULL>,$273=<NULL>,
$274=<NULL>,$275=<NU
LL>,$276=<NULL>,$277=<NULL>,$278=<NULL>,$279=<NULL>,$280=<NULL>,
$281=<NULL>,$282=<NULL>,$283=<NULL>,$284=<NULL>,$285=<NULL>,
$286=<NULL>,$287=<NULL
>,$288=<NULL>,$289=<NULL>,$290=<NULL>,$291=<NULL>,$292=<NULL>,$293=<NULL>,$294=<NULL>,$295=<NULL>,$296=<NULL>,$297=<NULL>,$298=<NULL>,$299=<NULL>,
$300=<NULL>,$301=<NULL>,$302=<NULL>,$303=<NULL>,$304=<NULL>,
$305=<NULL>,$306=<NULL>,$307=<NULL>,$308=<NULL>,$309=<NULL>,
$310=<NULL>,$311=<NULL>,$3
12=<NULL>,$313=<NULL>,$314=<NULL>,$315=<NULL>,$316=<NULL>,$317=<NULL>,
$318=<NULL>,$319=<NULL>,$320=<NULL>,$321=<NULL>,$322=<NULL>,
$323=<NULL>,$324
=<NULL>,$325=<NULL>,$326=<NULL>,$327=<NULL>,$328=<NULL>,$329=<NULL>,
$330=<NULL>)


From: Kris Jurka <books(at)ejurka(dot)com>
To: John <jgassner(at)gmail(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Re: Connection hanging on INSERT apparently due to large batch size and 4 CPU cores
Date: 2008-10-29 17:54:23
Message-ID: Pine.BSO.4.64.0810291326250.14951@leary.csoft.net
Lists: pgsql-jdbc

On Sat, 25 Oct 2008, John wrote:

> Could it be related to the INSERT being done on a table that has 330
> columns?
>

Yes, that's it (or at least part of it). In the simple case of using the
same, known parameter types for each batch entry, the data returned from
the server has no dependence on parameter count. The issue arises when
binding a parameter with unspecified type. In the attached
SimpleDeadLockDemo, you can see the driver issuing repeated
Describe(statement=S_1) which results in a ParameterDescription message
whose size depends on the number of parameters. With a large parameter
count, the backend is going to fill its network buffer with this data and
we'll get the deadlock you're seeing.

In this case we've supplied an explicit parameter type at parse time, so
describing it isn't going to tell us anything new. Even if it were going
to provide us with new information, we would only need to issue the
Describe once, not for every batch entry. I'm not sure how complicated a
solution might be, but at least I understand what's going on now.

Also attached (BatchDeadLock) is a test case that reliably locks up as
you've described.
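
In outline, that scenario looks something like this (a rough sketch
only, not the attached file; the temp table, the column count, and the
connection details are arbitrary, and setNull with Types.OTHER is used
here just as a way of binding parameters whose type the driver doesn't
know):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.sql.Types;

public class WideBatchSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "password");

        // A table with a few hundred columns, so each ParameterDescription
        // message coming back from the server is large.
        int cols = 330;
        StringBuilder create = new StringBuilder("CREATE TEMP TABLE wide (");
        StringBuilder insert = new StringBuilder("INSERT INTO wide VALUES (");
        for (int i = 0; i < cols; i++) {
            if (i > 0) { create.append(", "); insert.append(", "); }
            create.append("c").append(i).append(" text");
            insert.append("?");
        }
        create.append(")");
        insert.append(")");

        Statement st = conn.createStatement();
        st.execute(create.toString());
        st.close();

        PreparedStatement ps = conn.prepareStatement(insert.toString());
        for (int row = 0; row < 600; row++) {
            for (int i = 1; i <= cols; i++) {
                ps.setNull(i, Types.OTHER);   // parameter type left unspecified
            }
            ps.addBatch();
        }
        // With enough batch entries and parameters, both sides end up
        // blocked writing to each other, as described above.
        ps.executeBatch();
        conn.close();
    }
}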

Kris Jurka

Attachment Content-Type Size
SimpleDeadLockDemo.java text/plain 693 bytes
simple-output.txt text/plain 4.0 KB
BatchDeadLock.java text/plain 1.2 KB