Re: Bytea/Base64 encoders for libpq - interested?

Lists: pgsql-hackerspgsql-patches
From: Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Bytea/Base64 encoders for libpq - interested?
Date: 2001-08-28 09:07:32
Message-ID: 5.1.0.14.0.20010828110111.00b07bd0@192.168.0.1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi!

Please find attached some very simple encoders/decoders for bytea and base64.
Bytea encoder is very picky about what it leaves unescaped - basically the
base64
char set ;-)

Since this seems to be a very poorly documented but much asked-for thing, I
thought
you would maybe like to add this code to libpq (so that everyone benefits).

I'm aware that function renames might be necessary, though.
If you like, I could make the code fit into libpq, and send diffs.

Any comments/interests?

Greetings,
Joerg

Attachment Content-Type Size
simple-enc.tar.gz application/octet-stream 3.7 KB

From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-08-28 09:55:28
Message-ID: 20010828115528.A12403@zf.jcu.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, Aug 28, 2001 at 11:07:32AM +0200, Joerg Hessdoerfer wrote:
> Hi!
>
> Please find attached some very simple encoders/decoders for bytea and base64.
> Bytea encoder is very picky about what it leaves unescaped - basically the
> base64
> char set ;-)
>
> Since this seems to be a very poorly documented but much asked-for thing, I
> thought
> you would maybe like to add this code to libpq (so that everyone benefits).
>
> I'm aware that function renames might be necessary, though.
> If you like, I could make the code fit into libpq, and send diffs.
>
> Any comments/interests?

What implement base64 PostgreSQL datetype that use externaly base64 and
internaly same things as bytea. It prevent FE and parser problems with
"bad" chars and internaly for data storage save less space than text
with base64. Of course it doesn't solve a problem with encoding/decoding
data in your application to/from base64. May be implement for this
datetype cast to/from bytea too.

SELECT my_bytea::base64 FROM foo;

INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea);

And you can still fetch all data directly in batea by binary cursor.

Comments?

Karel
--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz


From: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-08-29 01:58:12
Message-ID: 3.0.5.32.20010829095812.008453f0@192.228.128.13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

At 11:55 AM 28-08-2001 +0200, Karel Zak wrote:
>
> What implement base64 PostgreSQL datetype that use externaly base64 and
>internaly same things as bytea. It prevent FE and parser problems with
>"bad" chars and internaly for data storage save less space than text
>with base64. Of course it doesn't solve a problem with encoding/decoding
>data in your application to/from base64. May be implement for this
>datetype cast to/from bytea too.
>
> SELECT my_bytea::base64 FROM foo;
>
> INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea);
>
> And you can still fetch all data directly in batea by binary cursor.
>
> Comments?

Sounds good to me. Even better if the base64 parser is bulletproof and
tolerant of junk. That way base64 email attachments may not even need to be
processed much - just filter a bit and shove it in :).

But shouldn't there be a ::base64 somewhere in the insert statement?

Cheerio,
Link.


From: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-08-29 02:22:22
Message-ID: 3.0.5.32.20010829102222.00853760@192.228.128.13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

At 11:55 AM 28-08-2001 +0200, Karel Zak wrote:
> What implement base64 PostgreSQL datetype that use externaly base64 and
>internaly same things as bytea. It prevent FE and parser problems with

Another point:

I have no problems with base64[1]. However I was thinking that it might be
far easier for the C/C++/Java (and other low level languages) bunch to do
hexadecimal. e.g. zero zero for null, zero A for line feed.

It expands things in the input/output stream, but it might be worth some
consideration. Simplicity, cpu usage etc.

Cheerio,
Link.

[1] OK, I can't convert base64 to ASCII mentally yet. But I don't think
that should really a factor.


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
Cc: Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 00:16:52
Message-ID: 200109040016.f840Gqa04205@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Where did we leave this?

> On Tue, Aug 28, 2001 at 11:07:32AM +0200, Joerg Hessdoerfer wrote:
> > Hi!
> >
> > Please find attached some very simple encoders/decoders for bytea and base64.
> > Bytea encoder is very picky about what it leaves unescaped - basically the
> > base64
> > char set ;-)
> >
> > Since this seems to be a very poorly documented but much asked-for thing, I
> > thought
> > you would maybe like to add this code to libpq (so that everyone benefits).
> >
> > I'm aware that function renames might be necessary, though.
> > If you like, I could make the code fit into libpq, and send diffs.
> >
> > Any comments/interests?
>
> What implement base64 PostgreSQL datetype that use externaly base64 and
> internaly same things as bytea. It prevent FE and parser problems with
> "bad" chars and internaly for data storage save less space than text
> with base64. Of course it doesn't solve a problem with encoding/decoding
> data in your application to/from base64. May be implement for this
> datetype cast to/from bytea too.
>
> SELECT my_bytea::base64 FROM foo;
>
> INSERT INTO foo (my_bytea) VALUES ('some_base64_string'::bytea);
>
> And you can still fetch all data directly in batea by binary cursor.
>
> Comments?
>
> Karel
> --
> Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
> http://home.zf.jcu.cz/~zakkr/
>
> C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 00:48:22
Message-ID: 15728.999564502@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Where did we leave this?

I don't think adding a datatype just to provide base64 encoding is
a wise approach. The overhead of a new datatype (in the sense of
providing operators/functions for it) will be much more than the
benefit. I think providing encode/decode functions is sufficient...
and we have those already, don't we?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 02:14:50
Message-ID: 200109040214.f842EoT11100@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Where did we leave this?
>
> I don't think adding a datatype just to provide base64 encoding is
> a wise approach. The overhead of a new datatype (in the sense of
> providing operators/functions for it) will be much more than the
> benefit. I think providing encode/decode functions is sufficient...
> and we have those already, don't we?

Agreed.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Karel Zak" <zakkr(at)zf(dot)jcu(dot)cz>, "Joerg Hessdoerfer" <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 03:25:29
Message-ID: 009701c134f1$3fe799b0$0705a8c0@jecw2k1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> I don't think adding a datatype just to provide base64 encoding is
> a wise approach. The overhead of a new datatype (in the sense of
> providing operators/functions for it) will be much more than the
> benefit. I think providing encode/decode functions is sufficient...
> and we have those already, don't we?
>

It might be nice to have a PQbyteaEscape or some such function available in
the libpq client library so that arbitrary binary could be escaped on the
client side and used in a sql statement. I actually wrote this already as an
addition to the PHP PostgreSQL extension, but it would make more sense, now
that I think about it, for it to be in libpq and called from PHP (or
whatever). Comments?

On a related note, are there any other bytea functions we should have in the
backend before freezing for 7.2? I was thinking it would be nice to have a
way to cast bytea into text and vice-versa, so that the normal text
functions could be used for things like LIKE and concatenation. Any interest
in this? If so, any guidance WRT how it should be implemented?

-- Joe


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Joe Conway <joseph(dot)conway(at)home(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 04:12:35
Message-ID: 200109040412.f844CZF19157@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> > I don't think adding a datatype just to provide base64 encoding is
> > a wise approach. The overhead of a new datatype (in the sense of
> > providing operators/functions for it) will be much more than the
> > benefit. I think providing encode/decode functions is sufficient...
> > and we have those already, don't we?
> >
>
> It might be nice to have a PQbyteaEscape or some such function available in
> the libpq client library so that arbitrary binary could be escaped on the
> client side and used in a sql statement. I actually wrote this already as an
> addition to the PHP PostgreSQL extension, but it would make more sense, now
> that I think about it, for it to be in libpq and called from PHP (or
> whatever). Comments?

Good idea. I will commit the non-bytea escape in a day and you can base
a bytea one on that. You will have to pass in the length of the field
because of course it is not null terminated.

> On a related note, are there any other bytea functions we should have in the
> backend before freezing for 7.2? I was thinking it would be nice to have a
> way to cast bytea into text and vice-versa, so that the normal text
> functions could be used for things like LIKE and concatenation. Any interest
> in this? If so, any guidance WRT how it should be implemented?

I can't see why you can't do that. The only problem is passing a \0
(null byte) back to the client.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Karel Zak" <zakkr(at)zf(dot)jcu(dot)cz>, "Joerg Hessdoerfer" <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 04:45:05
Message-ID: 24827.999578705@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Joe Conway" <joseph(dot)conway(at)home(dot)com> writes:
> I was thinking it would be nice to have a
> way to cast bytea into text and vice-versa,

How will you handle a null byte in bytea data? Transforming it directly
into an embedded null in a text object is NOT an acceptable answer,
because too many of the text functions will misbehave on such data.

regards, tom lane


From: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Karel Zak" <zakkr(at)zf(dot)jcu(dot)cz>, "Joerg Hessdoerfer" <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 04:55:23
Message-ID: 016101c134fd$ce757920$0705a8c0@jecw2k1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> > It might be nice to have a PQbyteaEscape or some such function available
in
> > the libpq client library so that arbitrary binary could be escaped on
the
> > client side and used in a sql statement. I actually wrote this already
as an
> > addition to the PHP PostgreSQL extension, but it would make more sense,
now
> > that I think about it, for it to be in libpq and called from PHP (or
> > whatever). Comments?
>
> Good idea. I will commit the non-bytea escape in a day and you can base
> a bytea one on that. You will have to pass in the length of the field
> because of course it is not null terminated.

OK.

>
> > On a related note, are there any other bytea functions we should have in
the
> > backend before freezing for 7.2? I was thinking it would be nice to have
a
> > way to cast bytea into text and vice-versa, so that the normal text
> > functions could be used for things like LIKE and concatenation. Any
interest
> > in this? If so, any guidance WRT how it should be implemented?
>
> I can't see why you can't do that. The only problem is passing a \0
> (null byte) back to the client.

Well, ISTM the simplest (if not the most efficient) way to do bytea-to-text
would be a function that takes the escaped string value from byteaout, and
creates a text value directly from it. The only danger I can think of is
that very long strings might need to be truncated in length, since the
escaped string could be significantly longer than the binary.

Text-to-bytea should be a straight copy, since nothing that can be
represented as text cannot be represented as bytea.

Any comments or concerns?

-- Joe


From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 08:11:45
Message-ID: 20010904101145.A18929@zf.jcu.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, Sep 03, 2001 at 08:48:22PM -0400, Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Where did we leave this?
>
> I don't think adding a datatype just to provide base64 encoding is
> a wise approach. The overhead of a new datatype (in the sense of
> providing operators/functions for it) will be much more than the
> benefit. I think providing encode/decode functions is sufficient...
> and we have those already, don't we?

Agree too. But 1000 "bad" chars encoded by base64 vs. encoded by
escape, what is longer and more expensive for transfer between FE
and BE?

A base64 problem is that encode all chars in string, but in the
real usage some data contains "bad" chars occasional only.

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Joe Conway <joseph(dot)conway(at)home(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 10:25:48
Message-ID: Pine.LNX.4.30.0109041222481.828-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Joe Conway writes:

> On a related note, are there any other bytea functions we should have in the
> backend before freezing for 7.2?

The SQL standards has a lot of functions for BLOB...

> I was thinking it would be nice to have a
> way to cast bytea into text and vice-versa, so that the normal text
> functions could be used for things like LIKE and concatenation.

Better write a native LIKE function for bytea, now that some parts are
threatening to make the text-LIKE function use the locale collating
sequence. (Multibyte aware text could also have interesting effects.)

--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Karel Zak" <zakkr(at)zf(dot)jcu(dot)cz>, "Joerg Hessdoerfer" <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 13:50:49
Message-ID: 26073.999611449@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Joe Conway" <joseph(dot)conway(at)home(dot)com> writes:
> Well, ISTM the simplest (if not the most efficient) way to do bytea-to-text
> would be a function that takes the escaped string value from byteaout, and
> creates a text value directly from it. The only danger I can think of is
> that very long strings might need to be truncated in length, since the
> escaped string could be significantly longer than the binary.

> Text-to-bytea should be a straight copy, since nothing that can be
> represented as text cannot be represented as bytea.

Ugh ... if the conversion functions are not inverses then I think they
lose much of their value. I could see doing either of these:

1. Conversion functions based on byteaout/byteain.

2. Bytea to text escapes *only* null bytes, text to bytea treats only
"\0" as an escape sequence.

Or maybe both, with two pairs of conversion functions.

In any case, we have to decide whether these coercion functions should
be named after the types --- ie, should they be made invokable as
implicit coercions? I'm dubious that that's a good idea; if we do it
then all sorts of textual operations will suddenly be allowed for bytea
without any explicit conversion, which is likely to do more harm than
good. The reason for having a separate bytea type is exactly so that
you *can't* apply text ops to it without thinking.

regards, tom lane


From: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Karel Zak" <zakkr(at)zf(dot)jcu(dot)cz>, "Joerg Hessdoerfer" <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 15:37:35
Message-ID: 00d301c13557$85b0d4f0$0705a8c0@jecw2k1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> > On a related note, are there any other bytea functions we should have in
the
> > backend before freezing for 7.2?
>
> The SQL standards has a lot of functions for BLOB...
>

OK - thanks. I'll take a look.

> > I was thinking it would be nice to have a
> > way to cast bytea into text and vice-versa, so that the normal text
> > functions could be used for things like LIKE and concatenation.
>
> Better write a native LIKE function for bytea, now that some parts are
> threatening to make the text-LIKE function use the locale collating
> sequence. (Multibyte aware text could also have interesting effects.)
>

Sounds like good advice. I'll try to get both the cast functions and a
native bytea LIKE function done.

-- Joe


From: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Karel Zak" <zakkr(at)zf(dot)jcu(dot)cz>, "Joerg Hessdoerfer" <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 15:46:44
Message-ID: 00dc01c13558$cd6ad790$0705a8c0@jecw2k1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Ugh ... if the conversion functions are not inverses then I think they
> lose much of their value. I could see doing either of these:
>
> 1. Conversion functions based on byteaout/byteain.
>
> 2. Bytea to text escapes *only* null bytes, text to bytea treats only
> "\0" as an escape sequence.
>
> Or maybe both, with two pairs of conversion functions.
>
> In any case, we have to decide whether these coercion functions should
> be named after the types --- ie, should they be made invokable as
> implicit coercions? I'm dubious that that's a good idea; if we do it
> then all sorts of textual operations will suddenly be allowed for bytea
> without any explicit conversion, which is likely to do more harm than
> good. The reason for having a separate bytea type is exactly so that
> you *can't* apply text ops to it without thinking.
>
> regards, tom lane

You're right, as usual (I was tired when I wrote this last night ;). But I
think we have to escape/unescape both null and '\', don't we?

I agree that it would be better to *not* allow implicit coercions. Given
that, any preferences on function names? Are text_to_bytea() and
bytea_to_text() too ugly?

-- Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Karel Zak" <zakkr(at)zf(dot)jcu(dot)cz>, "Joerg Hessdoerfer" <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 16:01:11
Message-ID: 26723.999619271@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Joe Conway" <joseph(dot)conway(at)home(dot)com> writes:
> You're right, as usual (I was tired when I wrote this last night ;). But I
> think we have to escape/unescape both null and '\', don't we?

Yeah, you're right. My turn to have not thought hard enough.

> I agree that it would be better to *not* allow implicit coercions. Given
> that, any preferences on function names? Are text_to_bytea() and
> bytea_to_text() too ugly?

They're pretty ugly, but more importantly they're only suitable if we
have exactly one conversion function each way. If we have two, what
will we call the second one?

I think it's okay to let the argument type be implicit in the function
argument list. Something like text_escaped(bytea) and text_direct(bytea)
(with inverses bytea_escaped(text) and bytea_direct(text)) might do.
I'm not totally happy with "direct" to suggest minimum escaping, though.
Better ideas anyone?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Joe Conway <joseph(dot)conway(at)home(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 16:44:49
Message-ID: 200109041644.f84Ginw16685@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> You're right, as usual (I was tired when I wrote this last night ;). But I
> think we have to escape/unescape both null and '\', don't we?

Yes, I think backslashes need special escapes too.

Let me ask a bigger question. We have the length of the text string in
the varlena header. Are we concerned about backend code not handling
NULL in text fields, or frontend code returning strings with embedded
nulls?

I see problems in the text() functions for nulls, but is such a
limitation required for text types?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <joseph(dot)conway(at)home(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 17:13:00
Message-ID: 20010904121300.A24138@lerami.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [010904 12:01]:
> They're pretty ugly, but more importantly they're only suitable if we
> have exactly one conversion function each way. If we have two, what
> will we call the second one?
>
> I think it's okay to let the argument type be implicit in the function
> argument list. Something like text_escaped(bytea) and text_direct(bytea)
> (with inverses bytea_escaped(text) and bytea_direct(text)) might do.
> I'm not totally happy with "direct" to suggest minimum escaping, though.
> Better ideas anyone?
Cooked vs raw?

LER

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Joe Conway <joseph(dot)conway(at)home(dot)com>, Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 17:33:26
Message-ID: 27436.999624806@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Let me ask a bigger question. We have the length of the text string in
> the varlena header. Are we concerned about backend code not handling
> NULL in text fields, or frontend code returning strings with embedded
> nulls?

The former.

> I see problems in the text() functions for nulls, but is such a
> limitation required for text types?

Unless you want to re-implement strcoll() and friends from scratch.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <joseph(dot)conway(at)home(dot)com>, Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 17:34:37
Message-ID: 200109041734.f84HYb819619@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Let me ask a bigger question. We have the length of the text string in
> > the varlena header. Are we concerned about backend code not handling
> > NULL in text fields, or frontend code returning strings with embedded
> > nulls?
>
> The former.
>
> > I see problems in the text() functions for nulls, but is such a
> > limitation required for text types?
>
> Unless you want to re-implement strcoll() and friends from scratch.

Yes, I saw strcoll().

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Joe Conway <joseph(dot)conway(at)home(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 22:56:06
Message-ID: 7506.999644166@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Why not just stick these things into encode() and name them
> "my-cool-encoding" or whatever.

Sounds good to me ...

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joe Conway <joseph(dot)conway(at)home(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>, Joerg Hessdoerfer <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 22:58:21
Message-ID: Pine.LNX.4.30.0109050054260.828-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane writes:

> > I agree that it would be better to *not* allow implicit coercions. Given
> > that, any preferences on function names? Are text_to_bytea() and
> > bytea_to_text() too ugly?
>
> They're pretty ugly, but more importantly they're only suitable if we
> have exactly one conversion function each way. If we have two, what
> will we call the second one?

Why not just stick these things into encode() and name them
"my-cool-encoding" or whatever. There is no truly natural conversion
between text and bytea, so encode/decode seem like the proper place.

--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter


From: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Karel Zak" <zakkr(at)zf(dot)jcu(dot)cz>, "Joerg Hessdoerfer" <Joerg(dot)Hessdoerfer(at)sea-gmbh(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bytea/Base64 encoders for libpq - interested?
Date: 2001-09-04 23:09:17
Message-ID: 01d101c13596$9f404380$0705a8c0@jecw2k1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > Why not just stick these things into encode() and name them
> > "my-cool-encoding" or whatever.
>
> Sounds good to me ...
>
> regards, tom lane
>

Sounds good to me too. Patch forthcoming . . .

-- Joe


From: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-patches(at)postgresql(dot)org>
Subject: Bytea string operator support
Date: 2001-09-05 20:34:06
Message-ID: 017801c1364a$1bf6bf40$0705a8c0@jecw2k1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> > > I agree that it would be better to *not* allow implicit coercions.
Given
> > > that, any preferences on function names? Are text_to_bytea() and
> > > bytea_to_text() too ugly?
> >
> > They're pretty ugly, but more importantly they're only suitable if we
> > have exactly one conversion function each way. If we have two, what
> > will we call the second one?
>
> Why not just stick these things into encode() and name them
> "my-cool-encoding" or whatever. There is no truly natural conversion
> between text and bytea, so encode/decode seem like the proper place.
>
(I'm sending directly to Peter, Tom, and Bruce because you were all involved
in this thread, and the list seems to be down)

Here's a patch for bytea string functions. As discussed:

text encode(bytea, 'escape')
bytea decode(text, 'escape')

to allow conversion bytea-text/text-bytea conversion. Also implemented
(SQL99 defines Binary Strings with all of these operators):

byteacat and "||" operator
substring
trim (only did trim(bytea, bytea) since there is no default trim character
for bunary per SQL99)
length (just aliased octet_length, which is correct for bytea, I think)
position
like and "~~" operator
not like and "!~~" operator

I think that's it.

Passes all regression tests. Based on the discussion, I did not create
functions to allow casting text-to-bytea or bytea-to-text -- it sounded like
we just want people to use encode/decode. I'm still planning to write
PQescapeBytea, but that will come later as a seperate patch. One operator
defined by SQL99, but not implemented here (or for text datatype, that I
could see) is the "overlay" function (modifies string argument by replacing
a substring given start and length with a replacement string). It sounds
useful -- any interest?

Review and comments much appreciated!

-- Joe

Attachment Content-Type Size
bytea_string_funcs_r00.diff application/octet-stream 20.6 KB

From: Marko Kreen <marko(at)l-t(dot)ee>
To: Joe Conway <joseph(dot)conway(at)home(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Bytea string operator support
Date: 2001-09-06 17:39:49
Message-ID: 20010906193949.A9357@l-t.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, Sep 05, 2001 at 01:34:06PM -0700, Joe Conway wrote:
> > Why not just stick these things into encode() and name them
> > "my-cool-encoding" or whatever. There is no truly natural conversion
> > between text and bytea, so encode/decode seem like the proper place.
>
> Here's a patch for bytea string functions. As discussed:
>
> text encode(bytea, 'escape')
> bytea decode(text, 'escape')

Why are you using \xxx encoding there? As the 'escape' encoding
is supposed to be 'minimalistic' as it escapes only 2
problematic values, then IMHO it would be better to use
\0 and \\ as escapes - takes less room.

--
marko


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Marko Kreen <marko(at)l-t(dot)ee>
Cc: Joe Conway <joseph(dot)conway(at)home(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Bytea string operator support
Date: 2001-09-06 17:43:53
Message-ID: 200109061743.f86Hhrh05987@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> On Wed, Sep 05, 2001 at 01:34:06PM -0700, Joe Conway wrote:
> > > Why not just stick these things into encode() and name them
> > > "my-cool-encoding" or whatever. There is no truly natural conversion
> > > between text and bytea, so encode/decode seem like the proper place.
> >
> > Here's a patch for bytea string functions. As discussed:
> >
> > text encode(bytea, 'escape')
> > bytea decode(text, 'escape')
>
> Why are you using \xxx encoding there? As the 'escape' encoding
> is supposed to be 'minimalistic' as it escapes only 2
> problematic values, then IMHO it would be better to use
> \0 and \\ as escapes - takes less room.

Agreed, and I have documented this in the SGML pages. Knowing this,
bytea becomes a much easier format to use.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Marko Kreen" <marko(at)l-t(dot)ee>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Bytea string operator support
Date: 2001-09-06 18:08:26
Message-ID: 012701c136fe$ece22040$0705a8c0@jecw2k1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> > > Here's a patch for bytea string functions. As discussed:
> > >
> > > text encode(bytea, 'escape')
> > > bytea decode(text, 'escape')
> >
> > Why are you using \xxx encoding there? As the 'escape' encoding
> > is supposed to be 'minimalistic' as it escapes only 2
> > problematic values, then IMHO it would be better to use
> > \0 and \\ as escapes - takes less room.
>
> Agreed, and I have documented this in the SGML pages. Knowing this,
> bytea becomes a much easier format to use.

No problem -- I kind of like the octal style better, but I can see your
point. I'll wait for awhile for more comments, and then send in a new patch.

-- Joe


From: "Joe Conway" <joseph(dot)conway(at)home(dot)com>
To: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Bytea string operator support
Date: 2001-09-07 06:47:21
Message-ID: 031c01c13768$f1adf300$0705a8c0@jecw2k1
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> > > > Here's a patch for bytea string functions. As discussed:
> > > >
> > > > text encode(bytea, 'escape')
> > > > bytea decode(text, 'escape')
> > >
> > > Why are you using \xxx encoding there? As the 'escape' encoding
> > > is supposed to be 'minimalistic' as it escapes only 2
> > > problematic values, then IMHO it would be better to use
> > > \0 and \\ as escapes - takes less room.
> >
> > Agreed, and I have documented this in the SGML pages. Knowing this,
> > bytea becomes a much easier format to use.
>
> No problem -- I kind of like the octal style better, but I can see your
> point. I'll wait for awhile for more comments, and then send in a new
patch.

Here's a revised patch. Changes:

1. Now outputs '\\' instead of '\134' when using encode(bytea, 'escape')
Note that I ended up leaving \0 as \000 so that there are no ambiguities
when decoding something like, for example, \0123.

2. Fixed bug in byteain which allowed input values which were not valid
octals (e.g. \789), to be parsed as if they were octals.

Joe

Attachment Content-Type Size
bytea_ops_r01.diff application/octet-stream 13.8 KB