High memory usage with savepoints & encoding differences

Lists: pgsql-general
From: "Dylan Adams" <dadams(at)bybaxter(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: High memory usage with savepoints & encoding differences
Date: 2009-04-17 22:57:49
Message-ID: DF0E97345AF8CB4F839616E5FB1FFBB104759E@enterprise.bybaxter.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I'm running into a situation where postmaster is consuming significantly
more memory than I would expect. This only seems to happen when I
combine savepoints with differences between client and database
encoding. I originally discovered this while running some Java code
which uses JDBC (the postgres JDBC driver always sets client_encoding to
UNICODE) to connect to a latin1 encoded database, but it's reproducible
with psql as well.

Here's a script which generates sql that triggers the unexpected
behavior:

dadams(at)postgres:/tmp > cat pg_savepoint.bash
#!/bin/bash

echo "SET client_encoding TO $1;"
echo "BEGIN WORK;"
for ((i=0; i<$2; i++))
do
echo "SAVEPOINT x;"
echo "RELEASE SAVEPOINT x;"
done

echo "COMMIT WORK;"

Given these databases:

dadams(at)postgres:/tmp > psql -l
List of databases
Name | Owner | Encoding
-------------+----------+----------
postgres | postgres | UTF8
template0 | postgres | UTF8
template1 | postgres | UTF8
test_latin1 | dadams | LATIN1
test_utf8 | dadams | UTF8
(5 rows)

All memory usages are those reported by top as VIRT.

If I run "./pg_savepoint.bash UNICODE 100000 | psql -q test_utf8" or
"./pg_savepoint.bash latin1 100000 | psql -q test_latin1", the memory
usage of postmaster remains relatively constant at about 50mb.

But when I run "./pg_savepoint.bash UNICODE 100000 | psql -q
test_latin1" or "./pg_savepoint.bash latin1 100000 | psql -q test_utf8",
the postmaster quickly consumes hundreds of mbs, topping out at
800-900mb.

If I omit the SAVEPOINT/RELEASE SAVEPOINT statements and, instead, do
repeated INSERTs into a temporary table, the memory usage remains
reasonable (~50mb), regardless of differences between client and
database encoding.

This doesn't seem to happen in all cases of mixed encodings. Mixing
win1252 and latin1 seems to be fine. It seems that only UNICODE/UTF8 <->
single byte charsets triggers this, although I wasn't exhaustive in my
testing.

I've tested this with 8.3.7 and 8.3.5, running on CentOS 5.2 and 4
update 6, respectively. Postgres was installed from the official RPMs
downloaded from http://www.postgresql.org/ftp/binary/. I'm seeing the
same behavior on enterprisedb's 8.3.5 One-click installer Postgres for
Windows (although, using CP1252 instead of LATIN1).

postgresql.conf is how initdb created it.

Is this expected behavior? Or am I missing something? This seems really
weird.

thanks,
dylan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dylan Adams" <dadams(at)bybaxter(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: High memory usage with savepoints & encoding differences
Date: 2009-04-18 00:03:48
Message-ID: 1762.1240013028@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Dylan Adams" <dadams(at)bybaxter(dot)com> writes:
> I'm running into a situation where postmaster is consuming significantly
> more memory than I would expect. This only seems to happen when I
> combine savepoints with differences between client and database
> encoding. I originally discovered this while running some Java code
> which uses JDBC (the postgres JDBC driver always sets client_encoding to
> UNICODE) to connect to a latin1 encoded database, but it's reproducible
> with psql as well.

I think this example is pretty artificial. The fundamental reason
memory is increasing is that each subtransaction can require some state
storage. In the example the per-subtransaction CurTransactionContexts
are not getting used for anything except encoding conversion on the
SAVEPOINT/RELEASE command completion messages --- but if you were doing
any real work in the subtransactions then most likely there would be
additional stuff there, so I'm not excited about trying to suppress
this particular symptom.

The bottom line is that lots and lots of subtransactions isn't a very
good thing for performance, especially with a couple of network round
trips for each one. Consider pushing whatever work is involved here
into a server-side function.

regards, tom lane