Gripe: bytea_output default => data corruption

Lists: pgsql-docspgsql-general
From: ljb <ljb9832(at)pobox(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Gripe: bytea_output default => data corruption
Date: 2010-10-13 00:37:33
Message-ID: i92v0d$qfe$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

Defaulting bytea output from the backend to use hex mode encoding, which is
incompatible with pre-9.0 interfaces, wasn't a friendly thing to do. The
default should have been escape mode. Or else you needed a big warning in
HISTORY that we must either change bytea_output, or upgrade all clients
before servers. Because using a 9.0 server with a 8.x libpq-based client
results in undetected data corruption when selecting BYTEA objects.

By default, the 9.0 server encodes a bytea using hex mode, but an 8.x
libpq-based client will decode that using escape mode, with no error detected
on either end. For example, start with "A", encode to "\x40" decode to "x40".

There are good reasons to break backward compatibility, like security or
standards compliance, but not performance. Please think twice next time you
consider breaking stuff just because you think the new way should be faster.


From: Raymond O'Donnell <rod(at)iol(dot)ie>
To: ljb <ljb9832(at)pobox(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-13 07:45:58
Message-ID: 4CB563B6.8020803@iol.ie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

On 13/10/2010 01:37, ljb wrote:
> Defaulting bytea output from the backend to use hex mode encoding, which is
> incompatible with pre-9.0 interfaces, wasn't a friendly thing to do. The
> default should have been escape mode. Or else you needed a big warning in
> HISTORY that we must either change bytea_output, or upgrade all clients
> before servers. Because using a 9.0 server with a 8.x libpq-based client
> results in undetected data corruption when selecting BYTEA objects.
>
> By default, the 9.0 server encodes a bytea using hex mode, but an 8.x
> libpq-based client will decode that using escape mode, with no error detected
> on either end. For example, start with "A", encode to "\x40" decode to "x40".
>
> There are good reasons to break backward compatibility, like security or
> standards compliance, but not performance. Please think twice next time you
> consider breaking stuff just because you think the new way should be faster.

In fairness, it *is* flagged in the release note - it's the first item
under "data types" in the list of incompatibilities.

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod(at)iol(dot)ie


From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: rod(at)iol(dot)ie
Cc: ljb <ljb9832(at)pobox(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-13 12:48:41
Message-ID: AANLkTinWoQ1QG7TSoDj52UTZ8sj4cwGewFDA_4gnz9LJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

Hey all,

And it is 9.0, i.e another major version. IMO the higher major versions not
necessarily must be 100% backward compatible. So, think twice next time
you update the major version.

2010/10/13 Raymond O'Donnell <rod(at)iol(dot)ie>

> On 13/10/2010 01:37, ljb wrote:
>
>> Defaulting bytea output from the backend to use hex mode encoding, which
>> is
>> incompatible with pre-9.0 interfaces, wasn't a friendly thing to do. The
>> default should have been escape mode. Or else you needed a big warning in
>> HISTORY that we must either change bytea_output, or upgrade all clients
>> before servers. Because using a 9.0 server with a 8.x libpq-based client
>> results in undetected data corruption when selecting BYTEA objects.
>>
>> By default, the 9.0 server encodes a bytea using hex mode, but an 8.x
>> libpq-based client will decode that using escape mode, with no error
>> detected
>> on either end. For example, start with "A", encode to "\x40" decode to
>> "x40".
>>
>> There are good reasons to break backward compatibility, like security or
>> standards compliance, but not performance. Please think twice next time
>> you
>> consider breaking stuff just because you think the new way should be
>> faster.
>>
>
> In fairness, it *is* flagged in the release note - it's the first item
> under "data types" in the list of incompatibilities.
>
> Ray.
>
> --
> Raymond O'Donnell :: Galway :: Ireland
> rod(at)iol(dot)ie
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
// Dmitriy.


From: ljb <ljb9832(at)pobox(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-13 21:03:43
Message-ID: i956rf$228u$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

rod(at)iol(dot)ie wrote:
>...
> In fairness, it *is* flagged in the release note - it's the first item
> under "data types" in the list of incompatibilities.

Quote:
"bytea output now appears in hex format by default (Peter Eisentraut)
The server parameter bytea_output can be used to select the
traditional output format if needed for compatibility."

This is inadequate, because it fails to warn that pre-9.0 clients will
decode the data incorrectly without reporting an error.


From: Raymond O'Donnell <rod(at)iol(dot)ie>
To: ljb <ljb9832(at)pobox(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-14 09:15:46
Message-ID: 4CB6CA42.2020305@iol.ie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

On 13/10/2010 22:03, ljb wrote:
> rod(at)iol(dot)ie wrote:
>> ...
>> In fairness, it *is* flagged in the release note - it's the first item
>> under "data types" in the list of incompatibilities.
>
> Quote:
> "bytea output now appears in hex format by default (Peter Eisentraut)
> The server parameter bytea_output can be used to select the
> traditional output format if needed for compatibility."
>
> This is inadequate, because it fails to warn that pre-9.0 clients will
> decode the data incorrectly without reporting an error.

I thought myself that the word "incompatibility" was fairly
self-explanatory... :-)

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod(at)iol(dot)ie


From: Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com>
To: ljb <ljb9832(at)pobox(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-14 09:19:13
Message-ID: AANLkTi=SH6s3FcsCf0quZm09-XgncTnQPR18-Z6GVqsU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

Dear Griper!,

fortunately someone showed a easy 'fix'

ALTER DATABASE foo SET bytea_output='escape' ;

Regds
Rajesh Kumar Mallah.

On Wed, Oct 13, 2010 at 5:03 PM, ljb <ljb9832(at)pobox(dot)com> wrote:
> rod(at)iol(dot)ie wrote:
>>...
>> In fairness, it *is* flagged in the release note - it's the first item
>> under "data types" in the list of incompatibilities.
>
> Quote:
>    "bytea output now appears in hex format by default (Peter Eisentraut)
>     The server parameter bytea_output can be used to select the
>     traditional output format if needed for compatibility."
>
> This is inadequate, because it fails to warn that pre-9.0 clients will
> decode the data incorrectly without reporting an error.
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Rajesh Kumar Mallah <mallah(dot)rajesh(at)gmail(dot)com>
Cc: ljb <ljb9832(at)pobox(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-21 21:29:11
Message-ID: 201010212129.o9LLTBl21612@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

Rajesh Kumar Mallah wrote:
> Dear Griper!,
>
> fortunately someone showed a easy 'fix'
>
> ALTER DATABASE foo SET bytea_output='escape' ;

Yes, we mentioned that setting in the release notes too:

E.2.3.5. Data Types

*

Allow bytea values to be written in hex notation (Peter
Eisentraut)

The server parameter bytea_output controls whether hex or
traditional format is used for bytea output. Libpq's PQescapeByteaConn()
function automatically uses the hex format when connected to PostgreSQL
9.0 or newer servers.

The new hex format will be directly compatible with more
applications that use binary data, allowing them to store and retrieve
it without extra conversion. It is also significantly faster to read and
write than the traditional format.

While the "Incompatibilities" section mentions only the first paragraph,
this remention lower down has even more details. Not sure what else you
wanted us to do.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: ljb <ljb9832(at)pobox(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-21 23:12:40
Message-ID: i9qhd8$1v7u$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

bruce(at)momjian(dot)us wrote:
>...
> Yes, we mentioned that setting in the release notes too:
>...
> While the "Incompatibilities" section mentions only the first paragraph,
> this remention lower down has even more details. Not sure what else you
> wanted us to do.

Here's how I would have written that first paragraph. My addition starts
at "Libpq's PQunescapeBytea() function..."

| * Allow bytea values to be written in hex notation (Peter Eisentraut)
|
| The server parameter bytea_output controls whether hex or
| traditional format is used for bytea output. Libpq's PQescapeByteaConn()
| function automatically uses the hex format when connected to PostgreSQL
| 9.0 or newer servers. Libpq's PQunescapeBytea() function from version 9.0
| and newer will properly decode both hex and traditional format. However,
| in versions of Libpq older than 9.0, the PQunescapeByte() function can only
| decode traditional format, and will corrupt bytea data received in hex
| format without reporting an error. To avoid loss of data, you must either
| upgrade all clients to 9.0.x, or set the server's bytea_output parameter
| to 'escape'.

Again: My complaint is that pre-9.0 libpq-based clients mis-decode the new
default hex format bytea data without reporting an error, and this danger is
insufficiently documented in the release notes.

Speaking of documentation, go read the 9.0.x reference manual sections for
Libpq's PQescapeByteaConn() and PQunescapeBytea(). These descriptions of
escaping and unescaping are incorrect for 9.0, which can add to any confusion.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: ljb <ljb9832(at)pobox(dot)com>
Cc: PostgreSQL-documentation <pgsql-docs(at)postgresql(dot)org>
Subject: Re: [GENERAL] Gripe: bytea_output default => data corruption
Date: 2010-10-21 23:42:14
Message-ID: 201010212342.o9LNgEu17871@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

ljb wrote:
> bruce(at)momjian(dot)us wrote:
> >...
> > Yes, we mentioned that setting in the release notes too:
> >...
> > While the "Incompatibilities" section mentions only the first paragraph,
> > this remention lower down has even more details. Not sure what else you
> > wanted us to do.
>
> Here's how I would have written that first paragraph. My addition starts
> at "Libpq's PQunescapeBytea() function..."
>
> | * Allow bytea values to be written in hex notation (Peter Eisentraut)
> |
> | The server parameter bytea_output controls whether hex or
> | traditional format is used for bytea output. Libpq's PQescapeByteaConn()
> | function automatically uses the hex format when connected to PostgreSQL
> | 9.0 or newer servers. Libpq's PQunescapeBytea() function from version 9.0
> | and newer will properly decode both hex and traditional format. However,
> | in versions of Libpq older than 9.0, the PQunescapeByte() function can only
> | decode traditional format, and will corrupt bytea data received in hex
> | format without reporting an error. To avoid loss of data, you must either
> | upgrade all clients to 9.0.x, or set the server's bytea_output parameter
> | to 'escape'.
>
> Again: My complaint is that pre-9.0 libpq-based clients mis-decode the new
> default hex format bytea data without reporting an error, and this danger is
> insufficiently documented in the release notes.

[ Thread moved to docs.]

Yes, I don't believe we were aware of the silent error behavior; I
certainly was not, and if I was, I would have mentioned it in the
release notes.

> Speaking of documentation, go read the 9.0.x reference manual sections for
> Libpq's PQescapeByteaConn() and PQunescapeBytea(). These descriptions of
> escaping and unescaping are incorrect for 9.0, which can add to any confusion.

Yikes, you are right! I see:

http://www.postgresql.org/docs/9.0/static/libpq-exec.html#LIBPQ-EXEC-ESCAPE-STRING

PQescapeByteaConn
...
Certain byte values must be escaped (but all byte values can be escaped)
when used as part of a bytea literal in an SQL statement. In general, to
--> escape a byte, it is converted into the three digit octal number equal
to the octet value, and preceded by usually two backslashes. The single
quote (') and backslash (\) characters have special alternative escape
sequences. See Section 8.4 for more information. PQescapeByteaConn
performs this operation, escaping only the minimally required bytes.

Can someone suggest some updated wording? Thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Vick Khera <vivek(at)khera(dot)org>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-22 16:19:33
Message-ID: AANLkTikV2u1+YwO+hzcoLyiWeH=hR0okY8Jz8YjwY0KA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

On Thu, Oct 21, 2010 at 7:12 PM, ljb <ljb9832(at)pobox(dot)com> wrote:

> Again: My complaint is that pre-9.0 libpq-based clients mis-decode the new
> default hex format bytea data without reporting an error, and this danger
> is
> insufficiently documented in the release notes.
>

I had some hoops thru which I had to jump to make our app compatible with
both 8.x and 9.x so we could safely migrate our servers without having to
coordinate a code push. It wasn't that bad, but part of the problem is that
the DBD::Pg driver does not understand the new format... but it does
unescape the "\x" marker for me to a simple "x", since that follows the
traditional un-escaping rules for values returned from Pg. I don't like
overriding defaults in the DB settings unless I *really* have to.

So my code now looks something like this:

if (substr($value,2,100) =~ m/^[0-9a-f]+$/) {
# hex coding of bytea from Postgres 9.0+
$self->log_debug('bytea hex decode');
# remove the leading \x. DBD::Pg descapes \x to x, so just remove x
$value =~ s/^[^0-9a-f]+//;
$value = pack('H*',$value); # convert hex to bytes
} else {
$self->log_debug('bytea escape decoded');
# Postgres < 9.0 encode auto handled by DBD::Pg
}

Unfortunately, Greg is saying that he has no time right now to release an
updated DBD::Pg despite the fact that the code is written and merged into
the development source tree... so it may be a while before perl people are
happy. Luckily my above code will work even when DBD::Pg learns to do
auto-escaping of bytea new format.

At least the current $dbh->quote() method still seems to create acceptable
escaping for postgres 9.0 when you tell it you have a bytea field type.


From: ljb <ljb9832(at)pobox(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Gripe: bytea_output default => data corruption
Date: 2010-10-22 22:06:07
Message-ID: i9t1sf$2vvn$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

vivek(at)khera(dot)org wrote:
>...
> I had some hoops thru which I had to jump to make our app compatible with
> both 8.x and 9.x so we could safely migrate our servers without having to
> coordinate a code push. It wasn't that bad, but part of the problem is that
> the DBD::Pg driver does not understand the new format... but it does
> unescape the "\x" marker for me to a simple "x", since that follows the
> traditional un-escaping rules for values returned from Pg. I don't like
> overriding defaults in the DB settings unless I *really* have to.

Although DBD::Pg uses libpq, linking it with the PostgreSQL-9.0 libpq
doesn't fix the escape/unescape problem. (I'm sure you already know this.)
That's because DBD::Pg implements its own string and bytea escape and
unescape functions, for some reason, and does not use PQescapeStringConn
etc. (I wonder if that makes it vulnerable to the security problems fixed
8.1.4?)

I probably would have just gone ahead and deployed 9.0 servers with
"bytea_output='escape'" in their postgresql.conf files. Permanent work-around.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: ljb <ljb9832(at)pobox(dot)com>, PostgreSQL-documentation <pgsql-docs(at)postgresql(dot)org>
Subject: Re: [GENERAL] Gripe: bytea_output default => data corruption
Date: 2010-10-25 23:18:49
Message-ID: 8604.1288048729@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Yikes, you are right! I see:

> http://www.postgresql.org/docs/9.0/static/libpq-exec.html#LIBPQ-EXEC-ESCAPE-STRING

> PQescapeByteaConn
> ...
> Certain byte values must be escaped (but all byte values can be escaped)
> when used as part of a bytea literal in an SQL statement. In general, to
> --> escape a byte, it is converted into the three digit octal number equal
> to the octet value, and preceded by usually two backslashes. The single
> quote (') and backslash (\) characters have special alternative escape
> sequences. See Section 8.4 for more information. PQescapeByteaConn
> performs this operation, escaping only the minimally required bytes.

> Can someone suggest some updated wording? Thanks.

I think we should simply remove the description of *how* the escaping is
performed, and state only that the function produces a suitably escaped
literal string. Anything else is not future-proof, and could someday
break the way this wording did.

regards, tom lane


From: ljb <ljb9832(at)pobox(dot)com>
To: pgsql-docs(at)postgresql(dot)org
Subject: Re: [GENERAL] Gripe: bytea_output default => data corruption
Date: 2010-10-27 01:11:42
Message-ID: ia7u8e$bl9$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-docs pgsql-general

tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
> I think we should simply remove the description of *how* the escaping is
> performed, and state only that the function produces a suitably escaped
> literal string. Anything else is not future-proof, and could someday
> break the way this wording did.

Perhaps it would be best to remove escaping details here. But the
description of PQescapeBytea() might need to be rewritten, too. Without
describing exactly what PQescapeByteaConn() does, it is hard to understand
what PQescapeBytea() does not do, and why it therefore "might give the
wrong results".

I think the actual function behavior should be documented somewhere. Even
though it might change again.