Bytea network traffic: binary vs text result format

Lists: pgsql-interfaces
From: "Miha D(dot) Puc" <miha(dot)puc(at)eba(dot)si>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Bytea network traffic: binary vs text result format
Date: 2007-06-03 10:36:23
Message-ID: 466299A7.20403@eba.si
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces

Hi!

There was some debate recently about using text or binary format.
There's people who would like to use it but have trouble converting
binary encoded results into native types and there's people that say
there's not much performance difference.

I'd like to stress that performance is very different over slow
network. The biggest difference is for byte where the text format
performance is about 3.5 times worse at inserts and updates and about
2.9 times worse at selects . Here's the reasoning:

In text format bytea are escaped using PQescapeBytea. In an average
binary stream about 2/3 would be escaped. Each escaped byte becomes of
form \\ooo at upload and of form \ooo for download, so the size of the
escaped stream is 1/3 + 2/3 * 5 = 11/3 = 3.6 and 1/3 + 2/3 * 4 = 3
respectively.

Here are the results of my test. I inserted and selected an OpenOffice
document of size 2Mb over a 2M/512K cable.
text format:
insert: 120.1s
select: 24.9s
binary format:
insert: 33.5s
select: 8.6s
factor:
insert: 3.6
select: 2.9

The difference between the test and the above calculation comes from the
estimate that 2/3 of bytes are escaped where in fact 95 out of 256 are
escaped (63%).

So there is a need (people asking) and reason (performance) to use
binary format. But there's a huge drawback - the conversions. It's easy
for varchar, not too bad for basic types (int, float, bool), effort is
needed for timestamp, date, time and numeric is a pain.

So with all the above there should be a utility for conversion between
binary format and native types and/or string format in libpq.

Regards,
Miha Puc


From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: "Miha D(dot) Puc" <miha(dot)puc(at)eba(dot)si>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Bytea network traffic: binary vs text result format
Date: 2007-06-04 12:54:39
Message-ID: 46640B8F.8040503@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces

Hi,

Miha D. Puc wrote:
> So there is a need (people asking) and reason (performance) to use
> binary format.

You are aware that PostgreSQL itself *can* transfer values in binary
format? Check the Documentation: "43.1.3. Formats and Format Codes":

http://www.postgresql.org/docs/8.1/interactive/protocol.html

This works since protocol version 3, AFAICT. The client needs to support
that, though. But the php pgsql binding, just as an example, doesn't use
that feature.

> So with all the above there should be a utility for conversion between
> binary format and native types and/or string format in libpq.

There already are. Not in libpq, though. Most (if not all) internal
types have those functions. See for example:

# SELECT typinput, typoutput, typreceive, typsend
# FROM pg_type WHERE typname='int4';

typinput | typoutput | typreceive | typsend
----------+-----------+------------+----------
int4in | int4out | int4recv | int4send
(1 row)

The input and output functions deal with the textual representation,
while send and receive convert to a binary representation in network
byte order.

Hope that helps.

Regards

Markus


From: "Wilhansen Li" <willi(dot)t1(at)gmail(dot)com>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Bytea network traffic: binary vs text result format
Date: 2007-06-04 14:51:13
Message-ID: bc9549a50706040751k313ef481t88b8966d42941ef9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces

On 6/4/07, Markus Schiltknecht <markus(at)bluegap(dot)ch> wrote:
>
>
> There already are. Not in libpq, though. Most (if not all) internal
> types have those functions.

Well, I don't think it would be puzzling that someone will ask for them even
if those functions already exist in PostgreSQL because, as you stated, it's
not in libpq. People will actually have to dig in the source to pinpoint
where in the source tree those functions exist and how to include them in
the program, which, I assume, is less ideal than when it's already included
in libpq in the first place. There was a post before on a user who got
disappointed because of the "crappy" support of libpq (not PostgreSQL) for
binary formats.

In line with this, I'm aware of the issues that this may pose:
http://archives.postgresql.org/pgsql-hackers/1999-08/msg00374.php this is
already very old (8 years
ago!). And they discuss about the issue that the representation might
change from version to version so it's not done. There have been plans
to incorporate CORBA,
which IMHO is an overkill,
to solve this problem but I don't think it's done yet because it's
probably too complex (?). I'd rather recommend them to use
ASN.1 (if that's feasible..).

--
Life is too short for dial-up.


From: Andrew McNamara <andrewm(at)object-craft(dot)com(dot)au>
To: "Miha D(dot) Puc" <miha(dot)puc(at)eba(dot)si>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Bytea network traffic: binary vs text result format
Date: 2007-06-05 00:57:31
Message-ID: 20070605005731.F0D606007A7@longblack.object-craft.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces

>There was some debate recently about using text or binary format.
>There's people who would like to use it but have trouble converting
>binary encoded results into native types and there's people that say
>there's not much performance difference.
>
>I'd like to stress that performance is very different over slow
>network. The biggest difference is for byte where the text format
>performance is about 3.5 times worse at inserts and updates and about
>2.9 times worse at selects . Here's the reasoning:

You're referring to the worst case of the text format - handling pure
binary data - and yes, in this case, up to 4 times as many bytes can
flow over the network. If the network is your constraining factor, then
this will be significant. But in many other cases (fast local network),
the disks are the limiting factor, or other data types dominate, and in
those cases, the text format can actually be smaller.

>So there is a need (people asking) and reason (performance) to use
>binary format. But there's a huge drawback - the conversions. It's easy
>for varchar, not too bad for basic types (int, float, bool), effort is
>needed for timestamp, date, time and numeric is a pain.
>
>So with all the above there should be a utility for conversion between
>binary format and native types and/or string format in libpq.

And there's the rub... the "basic" types you mention all have standard C
representations, but there is no standard C type for timestamp, date, time
and numeric, so what would libpq convert to? Any chosen format will only
suite a subset of users, and will result in double conversions for others
who are constrained by the existing types used within their application.

I'm not sure what the answer is - certainly documenting the wire format
would be a good first step.

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/