Re: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter

Lists: pgsql-committerspgsql-hackers
From: adunstan(at)postgresql(dot)org (Andrew Dunstan)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-09-18 17:41:17
Message-ID: 20070918174117.CA5F7753E4C@cvs.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Log Message:
-----------
Close previously open holes for invalidly encoded data to enter the
database via builtin functions, as recently discussed on -hackers.

chr() now returns a character in the database encoding. For UTF8 encoded databases
the argument is treated as a Unicode code point. For other multi-byte encodings
the argument must designate a strict ascii character, or an error is raised,
as is also the case if the argument is 0.

ascii() is adjusted so that it remains the inverse of chr().

The two argument form of convert() is gone, and the three argument form now
takes a bytea first argument and returns a bytea. To cover this loss three new
functions are introduced:
. convert_from(bytea, name) returns text - converts the first argument from the
named encoding to the database encoding
. convert_to(text, name) returns bytea - converts the first argument from the
database encoding to the named encoding
. length(bytea, name) returns int - gives the length of the first argument in
characters in the named encoding

Modified Files:
--------------
pgsql/doc/src/sgml:
func.sgml (r1.395 -> r1.396)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/func.sgml?r1=1.395&r2=1.396)
pgsql/src/backend/catalog:
pg_conversion.c (r1.36 -> r1.37)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/pg_conversion.c?r1=1.36&r2=1.37)
pgsql/src/backend/utils/adt:
oracle_compat.c (r1.70 -> r1.71)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/oracle_compat.c?r1=1.70&r2=1.71)
pgsql/src/backend/utils/mb:
mbutils.c (r1.63 -> r1.64)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/mbutils.c?r1=1.63&r2=1.64)
wchar.c (r1.63 -> r1.64)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/wchar.c?r1=1.63&r2=1.64)
pgsql/src/include/catalog:
catversion.h (r1.424 -> r1.425)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/catversion.h?r1=1.424&r2=1.425)
pg_proc.h (r1.469 -> r1.470)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_proc.h?r1=1.469&r2=1.470)
pgsql/src/include/mb:
pg_wchar.h (r1.72 -> r1.73)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/mb/pg_wchar.h?r1=1.72&r2=1.73)
pgsql/src/include/utils:
builtins.h (r1.302 -> r1.303)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/builtins.h?r1=1.302&r2=1.303)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-09-18 22:18:34
Message-ID: 11125.1190153914@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

adunstan(at)postgresql(dot)org (Andrew Dunstan) writes:
> Log Message:
> The two argument form of convert() is gone,

Um ... so that means CONVERT(c USING y) now fails entirely? That might
be going a bit far. If we do want to get rid of that syntax then I'm
noting a lack of parser changes in this patch.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-09-18 22:32:33
Message-ID: 46F05201.6050305@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane wrote:
> adunstan(at)postgresql(dot)org (Andrew Dunstan) writes:
>
>> Log Message:
>> The two argument form of convert() is gone,
>>
>
> Um ... so that means CONVERT(c USING y) now fails entirely? That might
> be going a bit far. If we do want to get rid of that syntax then I'm
> noting a lack of parser changes in this patch.
>
>
>

No, that's still there. The only change there is that it returns a
bytea. I forgot to put that in the commit notes. :-( And the fact that
it's still there is confirmed by the fact that we don't have regression
failures ;-)

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-09-18 23:02:53
Message-ID: 11672.1190156573@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Um ... so that means CONVERT(c USING y) now fails entirely? That might
>> be going a bit far. If we do want to get rid of that syntax then I'm
>> noting a lack of parser changes in this patch.

> No, that's still there. The only change there is that it returns a
> bytea. I forgot to put that in the commit notes. :-(

The SQL99 spec is pretty definite that it returns text. I think we have
a problem here.

<form-of-use conversion> ::=
CONVERT <left paren> <character value expression>
USING <form-of-use conversion name> <right paren>

a) <form-of-use conversion> shall be simply contained in a
<value expression> that is immediately contained in a
<derived column> that is immediately contained in a <select
sublist> or shall immediately contain either a <simply value
specification> that is a <host parameter name> or a <value
specification> that is a <host parameter specification>.

b) A <form-of-use conversion name> shall identify a form-of-use
conversion.

c) The declared type of the result is variable-length character
string with implementation-defined maximum length. The
character set of the result is the same as the character
repertoire of the <character value expression> and form-of-
use determined by the form-of-use conversion identified by
the <form-of-use conversion name>. Let CR be that character
repertoire. The result has the Implicit coercibility
characteristic and its collating sequence is the default
collating sequence of CR.

OTOH we may be talking at cross-purposes --- on looking into gram.y
I see that this syntax is transformed to a call of convert_using(),
which may mean it has nothing to do with your changes.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-09-18 23:08:55
Message-ID: 46F05A87.1050306@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane wrote:
> OTOH we may be talking at cross-purposes --- on looking into gram.y
> I see that this syntax is transformed to a call of convert_using(),
> which may mean it has nothing to do with your changes.
>
>
>

No, I changed convert_using -
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/pg_conversion.c?r1=1.36&r2=1.37

We can revert that if necessary. It will open up a hole, though. Take
your pick - spec compliance or validly coded data.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-09-18 23:25:36
Message-ID: 11947.1190157936@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> We can revert that if necessary. It will open up a hole, though. Take
> your pick - spec compliance or validly coded data.

I would rather take CONVERT USING out altogether, than have an
implementation that so clearly disregards the spec as to not even return
a compatible datatype.

Other than the fact that it's supposed to return varchar, the spec's
description of what it converts to what seems about as clear as mud.
I suspect however that it can't really be implemented properly without
support for per-value (or at least per-column) encoding, which is
something we're nowhere near having. Maybe we *should* take it out
instead of using spec-defined syntax for a behavior that we made up
out of whole cloth.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-09-19 00:09:00
Message-ID: 46F0689C.5030500@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> We can revert that if necessary. It will open up a hole, though. Take
>> your pick - spec compliance or validly coded data.
>>
>
> I would rather take CONVERT USING out altogether, than have an
> implementation that so clearly disregards the spec as to not even return
> a compatible datatype.
>
> Other than the fact that it's supposed to return varchar, the spec's
> description of what it converts to what seems about as clear as mud.
> I suspect however that it can't really be implemented properly without
> support for per-value (or at least per-column) encoding, which is
> something we're nowhere near having. Maybe we *should* take it out
> instead of using spec-defined syntax for a behavior that we made up
> out of whole cloth.
>
>
>

Works for me. If there's no objection I'll start on that in a few days.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-11-01 22:41:41
Message-ID: 200711012241.lA1Mff112092@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
> > OTOH we may be talking at cross-purposes --- on looking into gram.y
> > I see that this syntax is transformed to a call of convert_using(),
> > which may mean it has nothing to do with your changes.
> >
> >
> >
>
> No, I changed convert_using -
> http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/pg_conversion.c?r1=1.36&r2=1.37
>
> We can revert that if necessary. It will open up a hole, though. Take
> your pick - spec compliance or validly coded data.

Is this a TODO?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-11-01 22:52:16
Message-ID: 472A58A0.5020900@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Bruce Momjian wrote:
> Andrew Dunstan wrote:
>
>> Tom Lane wrote:
>>
>>> OTOH we may be talking at cross-purposes --- on looking into gram.y
>>> I see that this syntax is transformed to a call of convert_using(),
>>> which may mean it has nothing to do with your changes.
>>>
>>>
>>>
>>>
>> No, I changed convert_using -
>> http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/pg_conversion.c?r1=1.36&r2=1.37
>>
>> We can revert that if necessary. It will open up a hole, though. Take
>> your pick - spec compliance or validly coded data.
>>
>
> Is this a TODO?
>

No, we're long past this point. We've dropped 'convert ... using' entirely.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-11-01 23:23:14
Message-ID: 200711012323.lA1NNEq12594@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andrew Dunstan wrote:
>
>
> Bruce Momjian wrote:
> > Andrew Dunstan wrote:
> >
> >> Tom Lane wrote:
> >>
> >>> OTOH we may be talking at cross-purposes --- on looking into gram.y
> >>> I see that this syntax is transformed to a call of convert_using(),
> >>> which may mean it has nothing to do with your changes.
> >>>
> >>>
> >>>
> >>>
> >> No, I changed convert_using -
> >> http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/pg_conversion.c?r1=1.36&r2=1.37
> >>
> >> We can revert that if necessary. It will open up a hole, though. Take
> >> your pick - spec compliance or validly coded data.
> >>
> >
> > Is this a TODO?
> >
>
> No, we're long past this point. We've dropped 'convert ... using' entirely.

The question is whether re-adding it should be a TODO.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-11-01 23:26:41
Message-ID: 17313.1193959601@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Andrew Dunstan wrote:
>> No, we're long past this point. We've dropped 'convert ... using' entirely.

> The question is whether re-adding it should be a TODO.

Not unless someone wants it and can explain the spec convincingly.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [COMMITTERS] pgsql: Close previously open holes for invalidly encoded data to enter
Date: 2007-11-01 23:28:09
Message-ID: 472A6109.2070601@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Bruce Momjian wrote:
>>>
>>> Is this a TODO?
>>>
>>>
>> No, we're long past this point. We've dropped 'convert ... using' entirely.
>>
>
> The question is whether re-adding it should be a TODO.
>

One of the reasons we dropped it was that the spec didn't seem to make
sense. So if there's a proposal, first it would have to wait until we
can get sane multi-locale support and then it would have to explain how
it actually does what the spec says.

I don't think there's a definite TODO in that.

We don't seem to have had a single squawk since we dropped it, so I
suspect it will be little mourned.

cheers

andrew