bytea memory improvement

Lists: pgsql-jdbc
From: Luis Vilar Flores <lflores(at)evolute(dot)pt>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: bytea memory improvement
Date: 2006-06-13 12:41:47
Message-ID: 448EB28B.2020301@evolute.pt


The current postgresql driver has some memory issues when reading
bytea fields from the backend.

The problem is in the class org.postgresql.util.PGbytea, in method
public static byte[] toBytes(byte[] s).
This method (as I understood it) translates from the wire protocol to
the java byte[] for the user. The current implementation uses 2 buffers
(the receiving buffer and one temp) with the wire size plus one smaller
buffer (the final translated buffer).
For big files (bytea fields can be as big as 2GB) this is too
expensive in RAM (to download a field of 100MB I need at least 300MB
free in the client).
One workaround would be to translate in place in the receive buffer
(I think it is not used for anything else) and then copy into a final
buffer of the right size, but this would mean altering the receive
buffer, which is not very elegant.
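
The in-place workaround mentioned above could look roughly like the sketch below. It relies on the fact that decoding never produces more bytes than it consumes, so the write index cannot overtake the read index. The class and method names (InPlaceBytea, decodeInPlace) are hypothetical and not part of the driver. The sketch assumes well-formed escaped input ("\\" for a backslash, "\ooo" octal for other escaped bytes) and it destroys the contents of the receive buffer.

```java
import java.util.Arrays;

/** Hypothetical sketch: decode escaped bytea inside the receive buffer itself. */
final class InPlaceBytea {

    /**
     * Overwrites the front of {@code s} with the decoded bytes and returns a
     * right-sized copy. The caller must not reuse {@code s} afterwards.
     * Assumes well-formed input; a truncated escape would throw.
     */
    static byte[] decodeInPlace(byte[] s) {
        if (s == null) {
            return null;
        }
        int out = 0; // write position, always <= read position
        for (int in = 0; in < s.length; in++) {
            byte b = s[in];
            if (b != '\\') {
                s[out++] = b;        // plain byte, copied as-is
            } else if (s[in + 1] == '\\') {
                s[out++] = '\\';     // "\\" is an escaped backslash
                in += 1;
            } else {
                // "\ooo": three octal digits encode one byte
                int value = (s[in + 1] - '0') * 64 + (s[in + 2] - '0') * 8 + (s[in + 3] - '0');
                s[out++] = (byte) value;
                in += 3;
            }
        }
        return Arrays.copyOf(s, out); // one extra allocation, of the final size
    }
}
```

Peak memory here is the receive buffer plus the final buffer, at the price of mutating a buffer the caller may still hold.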

My solution is to have a threshold (I think 1MB is a balanced value).
Below it, execute as always, with 3 buffers. Above it, make an extra pass
through the incoming buffer to compute the final buffer size, and then
behave as before but skip the last step: the buffer already has the right
size, so the last copy is not needed.
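
The extra pass only has to recognise where each escape sequence starts. A rough sketch of that counting pass and of the threshold check follows. The names ByteaSizing, decodedLength, useCountingPass and THRESHOLD are illustrative and are not taken from the attached file; the 1MB value is the one suggested above. Well-formed escaped input is assumed.

```java
/** Hypothetical sketch of the sizing pass and threshold check. */
final class ByteaSizing {

    /** 1MB, the threshold value suggested in the message; purely illustrative. */
    static final int THRESHOLD = 1024 * 1024;

    /** Counts how many bytes the escaped buffer decodes to, without allocating. */
    static int decodedLength(byte[] s) {
        int count = 0;
        for (int i = 0; i < s.length; i++, count++) {
            if (s[i] == '\\') {
                // "\\" uses one extra wire byte, "\ooo" uses three extra wire bytes
                i += (s[i + 1] == '\\') ? 1 : 3;
            }
        }
        return count;
    }

    /** Small buffers keep the three-buffer path; large ones pay for the extra pass. */
    static boolean useCountingPass(byte[] s) {
        return s.length > THRESHOLD;
    }
}
```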

I've implemented the code (it's very simple) and tested it
(comparing the old function and the new), and it looks OK.

The overhead of one more pass over the initial buffer is about
30ms per 5MB on a Celeron M 1.6GHz.

I hope this code (or some improved version of it) can make its
way into the driver. I need to work with lots of big bytea fields and
the memory constraints are very hard to meet.

Thanks for the nice work,
--

Luis Flores

Analista de Sistemas

*Evolute* - Consultoria Informática

http://www.evolute.pt Email: lflores(at)evolute(dot)pt

Tel: (+351) 212949689

CONFIDENTIALITY NOTICE
This e-mail transmission and eventual attached files are intended only
for the use of the individual(s) or entity(ies) named above and may
contain information that is both privileged and confidential and is
exempt from disclosure under applicable law. If you are not the intended
recipient, you are hereby notified that any disclosure, copying,
distribution or use of any of the information contained in this
transmission is strictly restricted. If by any means you have received
this transmission in error, please immediately notify the sender and
delete this e-mail from your system. Thank you.

Attachment: PGbytea.java (text/plain, 4.0 KB)

From: Luis Vilar Flores <lflores(at)evolute(dot)pt>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: bytea memory improvement
Date: 2006-06-28 22:10:16
Message-ID: 6C675D36-3097-4B3A-A4E5-319C2AAD5EAC@evolute.pt


Can any JDBC developer give some feedback on this patch? Memory
usage is a real issue for a few applications ...

Please give some information about the problem: whether the patch is
unsuitable, or whether it is going to be committed to CVS ...

Thanks,

Luis Flores

Analista de Sistemas

Evolute - Consultoria Informática

Email: lflores(at)evolute(dot)pt

Tel: (+351) 212949689


From: Kris Jurka <books(at)ejurka(dot)com>
To: Luis Vilar Flores <lflores(at)evolute(dot)pt>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: bytea memory improvement
Date: 2006-06-29 23:08:27
Message-ID: Pine.BSO.4.63.0606291801150.26248@leary2.csoft.net

On Wed, 28 Jun 2006, Luis Vilar Flores wrote:

> Can any JDBC developer give some feedback about this patch ? Memory
> usage is a real issue for a few applications ...
>

It looks like a reasonable thing to do, but could you give us some more
details on the cost/benefits? Your original email said it cost an extra
30ms for 5MB of data on your machine. What percentage of the original
cost is this? Also, you could be clearer about what percentage of memory
this saves. For the worst case scenario you're going to get four bytes of
escaped data for every real byte, so the total size for the original method
would be 4 + 4 + 1 and for the new method 4 + 1, so a savings of 44%? Is
that what you've calculated?
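
The worst-case arithmetic above can be checked with a few lines of Java. This is only a sketch of the buffer accounting described in this message (receive buffer + temp buffer + final buffer versus receive buffer + final buffer); the class and method names are made up for illustration, and JVM object overhead is ignored.

```java
/** Hypothetical helper: relative peak-memory saving from dropping the temp buffer. */
final class ByteaMemoryEstimate {

    /**
     * @param wireBytesPerDecodedByte size of the escaped data relative to the
     *                                decoded data (4 in the worst case above)
     * @return fraction of peak buffer memory saved by the new method
     */
    static double savedFraction(double wireBytesPerDecodedByte) {
        double r = wireBytesPerDecodedByte;
        double oldPeak = r + r + 1; // receive buffer + temp buffer + final buffer
        double newPeak = r + 1;     // receive buffer + final buffer
        return (oldPeak - newPeak) / oldPeak;
    }
}
```

With a ratio of 4 this gives 4/9, which is the 44% quoted above.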

Finally you haven't actually submitted a patch, you've just sent a
modified copy of a whole file. Since it's a small file that changes
infrequently it's not a big deal, but we prefer context diffs if
you can.

Kris Jurka


From: Luis Vilar Flores <lflores(at)evolute(dot)pt>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: bytea memory improvement
Date: 2006-06-30 23:24:51
Message-ID: 44A5B2C3.5040506@evolute.pt

Kris Jurka wrote:
>
>
> On Wed, 28 Jun 2006, Luis Vilar Flores wrote:
>
>> Can any JDBC developer give some feedback about this patch ?
>> Memory usage is a real issue for a few applications ...
>>
>
> It looks like a reasonable thing to do, but could you give us some
> more details on the cost/benefits?
My suggestion is to trade some CPU time for memory on large files.
The benefits are memory savings (at least one byte[] of the final size)
and less garbage collection, since these buffers are created on every
bytea read. We also save the CPU time of one System.arraycopy, but that
is a native method and should be very fast, so it's not a big deal.
The only cost is one more pass through the original buffer, just to
count the final array size.

In simple terms, the original method did:
incoming buffer size N
allocate temp buffer size N
loop through incoming buffer, translate data from incoming to temp
allocate final buffer, size M (M is between N and N/4)
copy temp buffer to final buffer
return final buffer
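
As a concrete reading of the steps above, the three-buffer flow might look like the sketch below. It is reconstructed from the listed steps, not copied from the driver source. The class name is hypothetical, and the escape format ("\\" for a backslash, "\ooo" octal for other escaped bytes) and well-formed input are assumptions.

```java
/** Sketch of the three-buffer flow: receive buffer, temp buffer, final buffer. */
final class ThreeBufferBytea {

    static byte[] toBytes(byte[] s) {
        if (s == null) {
            return null;
        }
        byte[] temp = new byte[s.length];          // temp buffer, size N
        int pos = 0;
        for (int i = 0; i < s.length; i++) {
            byte b = s[i];
            if (b != '\\') {
                temp[pos++] = b;                   // unescaped byte
            } else if (s[i + 1] == '\\') {
                temp[pos++] = '\\';                // "\\" decodes to one backslash
                i += 1;
            } else {
                // "\ooo": three octal digits decode to one byte
                temp[pos++] = (byte) ((s[i + 1] - '0') * 64 + (s[i + 2] - '0') * 8 + (s[i + 3] - '0'));
                i += 3;
            }
        }
        byte[] result = new byte[pos];             // final buffer, size M
        System.arraycopy(temp, 0, result, 0, pos); // the copy the proposal avoids
        return result;
    }
}
```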

The new method does:
incoming buffer size N
loop through incoming buffer, calculate final buffer size
allocate final buffer, size M (M is between N and N/4)
loop through incoming buffer, translate data from incoming to final
return final buffer
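
The two-pass flow listed above could be sketched as follows. This is an illustration of the steps, not the attached PGbytea.java; the class name is hypothetical and well-formed escaped input is assumed.

```java
/** Sketch of the two-pass flow: count first, then decode straight into the final buffer. */
final class TwoPassBytea {

    static byte[] toBytes(byte[] s) {
        if (s == null) {
            return null;
        }
        // Pass 1: compute the decoded size M without allocating anything.
        int size = 0;
        for (int i = 0; i < s.length; i++, size++) {
            if (s[i] == '\\') {
                i += (s[i + 1] == '\\') ? 1 : 3;   // skip the rest of the escape
            }
        }
        byte[] result = new byte[size];            // final buffer, exactly size M

        // Pass 2: translate directly into the final buffer, no temp buffer.
        int pos = 0;
        for (int i = 0; i < s.length; i++) {
            byte b = s[i];
            if (b != '\\') {
                result[pos++] = b;
            } else if (s[i + 1] == '\\') {
                result[pos++] = '\\';
                i += 1;
            } else {
                result[pos++] = (byte) ((s[i + 1] - '0') * 64 + (s[i + 2] - '0') * 8 + (s[i + 3] - '0'));
                i += 3;
            }
        }
        return result;
    }
}
```

Unlike an in-place translation, this leaves the incoming buffer unmodified; the trade is one additional read-only scan of it.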

> Your original email said it cost an extra 30ms for 5MB of data on
> your machine. What percentage of the original cost is this?
This is the cost of reading and counting the escaped bytes of a 5MB byte
array in a for loop on a Celeron M 1.6GHz, but keep in mind that we save
the last System.arraycopy (the last if in the method).
Originally this extra pass didn't exist.
I will do more detailed timing of the full method body (old and new)
and send the results tomorrow.
> Also, you could be clearer about what percentage of memory this saves.
> For the worst case scenario you're going to get four bytes of escaped
> data for every real byte, so the total size for the original method
> would be 4 + 4 + 1 and for the new method 4 + 1, so a savings of 44%?
> Is that what you've calculated?
Yes, that's for files larger than 1MB in the incoming buffer (the
threshold could be defined by some property). With the extra pass over
the incoming array I calculate the final size, so I can skip the temp
buffer and we save the 44% (in the worst case). The minimum savings is
16.7%, for no escaped data.
>
> Finally you haven't actually submitted a patch, you've just sent a
> modified copy of a whole file. Since it's a small file that changes
> infrequently it's not a big deal, but we prefer context diffs if you can.
>
> Kris Jurka
>
I will send the diff too.

Thanks for the comments ...

--

Luis Flores

Analista de Sistemas

*Evolute* - Consultoria Informática

http://www.evolute.pt Email: lflores(at)evolute(dot)pt

Tel: (+351) 212949689
