Bytea network traffic: binary vs text result format

From: "Miha D(dot) Puc" <miha(dot)puc(at)eba(dot)si>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Bytea network traffic: binary vs text result format
Date: 2007-06-03 10:36:23
Message-ID: 466299A7.20403@eba.si
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-interfaces

Hi!

There was some debate recently about using text or binary format.
There's people who would like to use it but have trouble converting
binary encoded results into native types and there's people that say
there's not much performance difference.

I'd like to stress that performance is very different over slow
network. The biggest difference is for byte where the text format
performance is about 3.5 times worse at inserts and updates and about
2.9 times worse at selects . Here's the reasoning:

In text format bytea are escaped using PQescapeBytea. In an average
binary stream about 2/3 would be escaped. Each escaped byte becomes of
form \\ooo at upload and of form \ooo for download, so the size of the
escaped stream is 1/3 + 2/3 * 5 = 11/3 = 3.6 and 1/3 + 2/3 * 4 = 3
respectively.

Here are the results of my test. I inserted and selected an OpenOffice
document of size 2Mb over a 2M/512K cable.
text format:
insert: 120.1s
select: 24.9s
binary format:
insert: 33.5s
select: 8.6s
factor:
insert: 3.6
select: 2.9

The difference between the test and the above calculation comes from the
estimate that 2/3 of bytes are escaped where in fact 95 out of 256 are
escaped (63%).

So there is a need (people asking) and reason (performance) to use
binary format. But there's a huge drawback - the conversions. It's easy
for varchar, not too bad for basic types (int, float, bool), effort is
needed for timestamp, date, time and numeric is a pain.

So with all the above there should be a utility for conversion between
binary format and native types and/or string format in libpq.

Regards,
Miha Puc

Responses

Browse pgsql-interfaces by date

  From Date Subject
Next Message Markus Schiltknecht 2007-06-04 12:54:39 Re: Bytea network traffic: binary vs text result format
Previous Message Francisco Figueiredo Jr. 2007-05-30 15:23:42 Re: Question on NpgsqlParameter object