Re: libpq and Binary Data Formats

Lists: pgsql-hackers
From: "Wilhansen Li" <willi(dot)t1(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: libpq and Binary Data Formats
Date: 2007-06-04 15:52:24
Message-ID: bc9549a50706040852u27633f41ib1e6b09f8339d845@mail.gmail.com

First of all, apologies if this was not meant to be a feedback/wishlist
mailing list.

Binary formats in libpq have (probably) been a long-standing
issue (refer to the listings below) and I want to express my hope that the
next revision of PostgreSQL would have better support for binary data types
in libpq. I am in no doubt that those binary vs. text debates sprouted
because of PostgreSQL's (or rather libpq's) ambiguity when it comes to
binary data support. One instance is the documentation itself: it didn't
really say (correct me if I'm wrong) that binary data is poorly/not
supported and that textual data is preferred. Moreover, those ambiguities
are only cleared up in mailing lists/IRC/forums, which makes it seem that the
arguments for text data are just an excuse to not have proper support for
binary data (e.g. C: "Elephant doesn't support Hammer!" P: "You don't really
need Hammer (we don't support it yet), you can do it with Screwdriver.").
This is not meant to be a binary vs. text post, so I'll reserve my comments
on that. Nevertheless, each has its own advantages and
disadvantages, especially when it comes to strongly typed languages, and
neither should be ignored.

I am well aware of the problems associated with binary formats and
backward/forward compatibility:
http://archives.postgresql.org/pgsql-hackers/1999-08/msg00374.php but
nevertheless, that shouldn't stop PostgreSQL/libpq's hardworking developers
from coming up with a solution. The earlier link showed interest in
using CORBA to handle PostgreSQL objects, but I believe that it's overkill
and would like to propose using ASN.1 instead. However, what's important is
not really the binary/text representation. If we look again at the list
below, not everyone needs binary formats just for speed and efficiency;
rather, they need them to be able to easily manipulate data. In other words,
the interfaces to extract data are also important.

Best wishes,
Wil

NOTES/History of Posts:

1. "Query regarding PostgreSQL date/time binary format for libpq" <
http://archives.postgresql.org/pgsql-interfaces/2007-01/msg00040.php> One of
the many (clueless) individuals who want to get the binary format of the
date/time struct (I know that there's a way to do this by converting the
time to epoch using extract(epoch from time) to convert it to something akin
to time_t).
2. "Bytea network traffic: binary vs text result format" <
http://archives.postgresql.org/pgsql-interfaces/2007-06/msg00000.php> One of
the many Binary vs. Text debates.
3. "How do you convert PostgreSQL internal binary field to C datatypes" <
http://archives.postgresql.org/pgsql-interfaces/2007-05/msg00046.php> An
individual disgruntled because of the "half baked C API" of PostgreSQL.
Although he may be wrong in some or many aspects, he has a point with
regard to the binary format support. Moreover, he is probably one of the
many individuals who are disappointed with PostgreSQL because of this.
4. "Array handling in libpq" <
http://archives.postgresql.org/pgsql-interfaces/2007-01/msg00027.php> One of
the common scenarios for the "need" of a binary format (or rather, a better
interface): arrays. Also, the reply to this is one of the many/redundant
assurances that the overhead of text is minimal.
5. "libpq PQexecParams and arrays" <
http://archives.postgresql.org/pgsql-interfaces/2006-06/msg00008.php>
Another one of those array issues. This time, the poster(s) complained
that the binary formats are "poorly documented :-(".
6. "PQgetvalue failed to return column value for non-text data in binary
format" <
http://archives.postgresql.org/pgsql-interfaces/2007-05/msg00045.php>
Another issue about binary formats paired with the assurance (again) that
the overhead of using text is minimal.
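
A minimal sketch of the extract(epoch from ...) workaround mentioned in
item 1, assuming the query result is fetched in text format and handed to
the application as a NUL-terminated string. The helper name
epoch_text_to_time_t is hypothetical and is not part of libpq; fractional
seconds are simply truncated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper (not a libpq function): parse the text result of
 *   SELECT extract(epoch from some_timestamp)
 * into a time_t. Returns 0 on success, -1 if the text is not a number.
 * Fractional seconds are truncated toward zero.
 */
static int epoch_text_to_time_t(const char *text, time_t *out)
{
    char *end = NULL;
    double seconds;

    assert(text != NULL && out != NULL);

    errno = 0;
    seconds = strtod(text, &end);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;              /* empty, trailing garbage, or out of range */

    *out = (time_t) seconds;
    return 0;
}
```

The string passed in would typically be whatever the client library returns
for that column in text mode; no binary result format is involved.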
--
(<_<)(>_>)(>_<)(<.<)(>.>)(>.<)
Life is too short for dial-up.


From: Richard Huxton <dev(at)archonet(dot)com>
To: Wilhansen Li <willi(dot)t1(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libpq and Binary Data Formats
Date: 2007-06-04 16:23:14
Message-ID: 46643C72.9090107@archonet.com

Wilhansen Li wrote:
> First of all, apologies if this was not meant to be a feedback/wishlist
> mailing list.
>
> Binary formats in libpq have (probably) been a long-standing
> issue (refer to the listings below) and I want to express my hope that the
> next revision of PostgreSQL would have better support for binary data types
> in libpq.

Um - speaking as a user, not a developer, I don't actually see a
description of what problem(s) you are suggesting be solved. Are you
saying there should be better documentation, or a new format?

--
Richard Huxton
Archonet Ltd


From: "Wilhansen Li" <willi(dot)t1(at)gmail(dot)com>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libpq and Binary Data Formats
Date: 2007-06-04 17:19:42
Message-ID: bc9549a50706041019t71f8d02ew2c1cc6b8bb28d43e@mail.gmail.com

Basically, better support for binary formats, which includes, but is not
limited to:
1) functions for converting to and from various datatypes
2) reducing the need to convert to and from network byte order
3) better documentation

My suggestion of using ASN.1 was merely a naive suggestion on how it can be
implemented properly without breaking (future) compatibility, because that
seems to be the main problem which prevents the use of binary formats.
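
To illustrate points 1 and 2, here is a rough sketch of the conversion an
application has to do by hand today for an int4 column or parameter in
binary format, which travels as 4 bytes in network byte order. The function
names wire_to_int32 and int32_to_wire are hypothetical, not existing libpq
functions; the byte buffer stands in for what a binary-format result or
parameter would contain.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a 4-byte big-endian (network order) int4. */
static int32_t wire_to_int32(const unsigned char *buf)
{
    uint32_t u;
    int32_t value;

    assert(buf != NULL);

    u = ((uint32_t) buf[0] << 24) |
        ((uint32_t) buf[1] << 16) |
        ((uint32_t) buf[2] << 8)  |
         (uint32_t) buf[3];

    /* Copy the bit pattern instead of casting, to avoid relying on
     * implementation-defined unsigned-to-signed conversion. */
    memcpy(&value, &u, sizeof value);
    return value;
}

/* Hypothetical helper: encode an int32 as 4 big-endian bytes. */
static void int32_to_wire(int32_t value, unsigned char *buf)
{
    uint32_t u;

    assert(buf != NULL);

    memcpy(&u, &value, sizeof u);
    buf[0] = (unsigned char) (u >> 24);
    buf[1] = (unsigned char) (u >> 16);
    buf[2] = (unsigned char) (u >> 8);
    buf[3] = (unsigned char) u;
}
```

Shifting bytes explicitly gives the same result on little-endian and
big-endian hosts, so the application code does not need to know its own
byte order.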

On 6/5/07, Richard Huxton <dev(at)archonet(dot)com> wrote:
>
> Wilhansen Li wrote:
> > First of all, apologies if this was not meant to be a feedback/wishlist
> > mailing list.
> >
> > Binary formats in libpq have (probably) been a long-standing
> > issue (refer to the listings below) and I want to express my hope that
> > the next revision of PostgreSQL would have better support for binary
> > data types in libpq.
>
> Um - speaking as a user, not a developer, I don't actually see a
> description of what problem(s) you are suggesting be solved. Are you
> saying there should be better documentation, or a new format?
>
> --
> Richard Huxton
> Archonet Ltd
>

--
(<_<)(>_>)(>_<)(<.<)(>.>)(>.<)
Life is too short for dial-up.


From: Richard Huxton <dev(at)archonet(dot)com>
To: Wilhansen Li <willi(dot)t1(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libpq and Binary Data Formats
Date: 2007-06-05 09:45:54
Message-ID: 466530D2.80009@archonet.com

Wilhansen Li wrote:
> Basically, better support for binary formats, which includes, but is not
> limited to:
> 1) functions for converting to and from various datatypes
> 2) reducing the need to convert to and from network byte order
> 3) better documentation
>
> My suggestion of using ASN.1 was merely a naive suggestion on how it can be
> implemented properly without breaking (future) compatibility, because that
> seems to be the main problem which prevents the use of binary formats.

Well, it sounds to me like this is two separate items: (1+2), (3).

For (3) there is the pgsql-docs mailing list. If you have
additions/changes, that's the place you want. Submissions in text are
fine, you don't need to worry about SGML formatting, but do discuss them
first. The documentation relies on people saying "I don't think this bit
is clear", so help is always welcome.

For (1+2) it sounds like what you actually want is a "native binary for
my application" protocol rather than "internal binary format" which is
sort of what's available now. Clearly "application binary" is an
addition rather than a replacement (unless everyone using binary
transfers thinks it's so much better they're happy to switch immediately).

A few obvious questions leap out at me:
1. What languages are you seeking to target: just "C"?
2. What platforms are you seeking to target: Intel 32 bit? 64 bit?
PowerPC? ARM?
3. How much do I gain (and lose) over text transfer, and under what
circumstances?
4. What will happen with custom/user-defined types? Will they need their
own "adaptor" written to support this?

Crucially, I think you want to demonstrate #3 - that there's a clear
gain for all the work that's involved in defining a separate transfer
encoding. If you can demonstrate the gains are felt by all the
Perl/PHP/Java applications too that'd obviously help.

Bear in mind I'm just another user of PostgreSQL, not a developer, so
you could do everything I've said and still not interest core in making
changes. However, I've seen a lot of changes come and go and I think
you'll need to make progress on those 4 points to get anywhere.

--
Richard Huxton
Archonet Ltd


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Wilhansen Li" <willi(dot)t1(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libpq and Binary Data Formats
Date: 2007-06-05 14:45:35
Message-ID: b42b73150706050745q5d2e04a0gf0b0a01bbadbf1e3@mail.gmail.com

On 6/4/07, Wilhansen Li <willi(dot)t1(at)gmail(dot)com> wrote:
> First of all, apologies if this was not meant to be a feedback/wishlist
> mailing list.
>
> Binary formats in libpq have (probably) been a long-standing issue (refer to
> the listings below) and I want to express my hope that the next
> revision of PostgreSQL would have better support for binary data types in
> libpq. I am in no doubt that those binary vs. text debates sprouted because
> of PostgreSQL's (or rather libpq's) ambiguity when it comes to binary data
> support. One instance is the documentation itself: it didn't really say
> (correct me if I'm wrong) that binary data is poorly/not supported and that
> textual data is preferred. Moreover, those ambiguities are only cleared up
> in mailing lists/IRC/forums, which makes it seem that the arguments for text
> data are just an excuse to not have proper support for binary data (e.g.
> C: "Elephant doesn't support Hammer!" P: "You don't really need Hammer (we
> don't support it yet), you can do it with Screwdriver."). This is not meant
> to be a binary vs. text post, so I'll reserve my comments on that.
> Nevertheless, each has its own advantages and disadvantages,
> especially when it comes to strongly typed languages, and neither should
> be ignored.
>
> I am well aware of the problems associated with binary formats and
> backward/forward compatibility:
> http://archives.postgresql.org/pgsql-hackers/1999-08/msg00374.php
> but nevertheless, that shouldn't stop PostgreSQL/libpq's
> hardworking developers from coming up with a solution. The
> earlier link showed interest in using CORBA to handle PostgreSQL objects,
> but I believe that it's overkill and would like to propose using ASN.1
> instead. However, what's important is not really the binary/text
> representation. If we look again at the list below, not everyone needs
> binary formats just for speed and efficiency; rather, they need them to be
> able to easily manipulate data. In other words, the interfaces to extract
> data are also important.

Personally, I wouldn't mind seeing the libpq API extended to support
arrays and record structures. PostgreSQL 8.3 is bringing arrays of
composite types and the lack of client side support of these
structures is becoming increasingly glaring. If set up with
text/binary switch, this would deal with at least part of your
objections.

I think most people here would agree that certain aspects of the
documentation of binary formats are a bit weak and could use
improvement (although, it's possible that certain formats were
deliberately not documented because they may change). A classy move
would be to make specific suggestions in -docs and produce a patch.

ISTM that many if not most people who are looking at binary
interfaces to the database are doing it for the wrong reasons and you
should consider that when reviewing historical discussions :-). Also,
dealing with large bytea types in the database, which is probably the
most common use case, is pretty well covered in the libpq documentation
IMO.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Wilhansen Li <willi(dot)t1(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: libpq and Binary Data Formats
Date: 2007-06-05 15:30:37
Message-ID: 17583.1181057437@sss.pgh.pa.us

Richard Huxton <dev(at)archonet(dot)com> writes:
> Wilhansen Li wrote:
>> Basically, better support for binary formats, which includes, but is not
>> limited to:
>> 1) functions for converting to and from various datatypes
>> 2) reducing the need to convert to and from network byte order
>> 3) better documentation

> Well, it sounds to me like this is two separate items: (1+2), (3).

I could see adding more support in libpq for converting native int and
float types to and from the existing on-the-wire binary formats, rather
than making applications do it for themselves as is the case now. But I
think you've got 0 chance of persuading anyone that we should try to
support platform-dependent on-the-wire formats --- the potential
performance advantages are minimal and the added complexity large.
IOW, 1, 3 yes, 2 no.

regards, tom lane
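
As a sketch of the kind of helper described above for float types: a float8
in the existing binary format is sent as the 8 bytes of its IEEE 754
representation in network byte order. The code below assumes the host's
double is also IEEE 754 binary64; the names wire_to_float8 and
float8_to_wire are hypothetical and are not libpq functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode an 8-byte big-endian float8 into a double.
 * Assumes the host double is IEEE 754 binary64. */
static double wire_to_float8(const unsigned char *buf)
{
    uint64_t bits = 0;
    double value;
    int i;

    assert(buf != NULL);
    assert(sizeof(double) == sizeof(uint64_t));

    /* Assemble the 64-bit pattern, most significant byte first. */
    for (i = 0; i < 8; i++)
        bits = (bits << 8) | buf[i];

    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical helper: encode a double as 8 big-endian bytes. */
static void float8_to_wire(double value, unsigned char *buf)
{
    uint64_t bits;
    int i;

    assert(buf != NULL);
    assert(sizeof(double) == sizeof(uint64_t));

    memcpy(&bits, &value, sizeof bits);

    /* Emit the least significant byte last. */
    for (i = 7; i >= 0; i--) {
        buf[i] = (unsigned char) (bits & 0xFF);
        bits >>= 8;
    }
}
```

The wire format stays platform-independent; only the client-side conversion
is wrapped up, so the application never touches byte order itself.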