Re: [HACKERS] patches for items from TODO list

Lists: pgsql-hackerspgsql-patches
From: Sergey Ten <sergey(at)sourcelabs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: jason(at)sourcelabs(dot)com, Sergey Ten <sergey(at)sourcelabs(dot)com>
Subject: patches for items from TODO list
Date: 2005-05-12 02:01:50
Message-ID: 4282B90E.8040707@sourcelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hello all,

We would like to contribute to the Postgresql community by implementing
the following items from the TODO list
(http://developer.postgresql.org/todo.php):
. Allow COPY to understand \x as a hex byte . Allow COPY to optionally
include column headings in the first line . Add XML output to COPY

The changes are straightforward and include implementation of the
features as well as modification of the regression tests and documentation.

Before sending a diff file with the changes, we would like to know if
these features have been already implemented.

Best regards,
Jason Lucas and Sergey Ten
SourceLabs

Dependable Open Source Systems


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Sergey Ten <sergey(at)sourcelabs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-12 02:10:53
Message-ID: 200505120210.j4C2ArB29895@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Sergey Ten wrote:
> Hello all,
>
> We would like to contribute to the Postgresql community by implementing
> the following items from the TODO list
> (http://developer.postgresql.org/todo.php):
> . Allow COPY to understand \x as a hex byte . Allow COPY to optionally
> include column headings in the first line . Add XML output to COPY
>
> The changes are straightforward and include implementation of the
> features as well as modification of the regression tests and documentation.
>
> Before sending a diff file with the changes, we would like to know if
> these features have been already implemented.

Please check the web site version. Someone has already implemented
"Allow COPY to optionally include column headings in the first line".

As far as XML, there has been discussion on where that should be done?
In the backend, libpq, or psql. It will need discussion on hackers. I
assume you have read the developer's FAQ too.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-12 02:35:31
Message-ID: 4282C0F3.5060508@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Please check the web site version. Someone has already implemented
> "Allow COPY to optionally include column headings in the first line".
>
> As far as XML, there has been discussion on where that should be done?
> In the backend, libpq, or psql. It will need discussion on hackers. I
> assume you have read the developer's FAQ too.

The other issue is 'what XML format'? Find me a standard data dump XML
DTD and I'll change phpPgAdmin to use it as well.

Otherwise, phpPgAdmin's is quite simple.

Chris


From: "Sergey Ten" <sergey(at)sourcelabs(dot)com>
To: "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <jason(at)sourcelabs(dot)com>
Subject: Re: patches for items from TODO list
Date: 2005-05-12 18:41:54
Message-ID: 200505121837.j4CIbIXW026780@sourcelabs.sourcelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Thank you to all who replied for your time.

We have checked http://momjian.postgresql.org/cgi-bin/pgpatches and
http://momjian.postgresql.org/cgi-bin/pgpatches2, and could not find patches
with words "copy" or "xml" in the subject. Could you please clarify what the
website version is and where it can be found? Is it sources available at
ftp://ftp.postgresql.org?

Best regards,
Jason, Sergey

> -----Original Message-----
> From: Christopher Kings-Lynne [mailto:chriskl(at)familyhealth(dot)com(dot)au]
> Sent: Wednesday, May 11, 2005 7:36 PM
> To: Bruce Momjian
> Cc: Sergey Ten; pgsql-hackers(at)postgresql(dot)org; jason(at)sourcelabs(dot)com
> Subject: Re: [HACKERS] patches for items from TODO list
>
> > Please check the web site version. Someone has already implemented
> > "Allow COPY to optionally include column headings in the first line".
> >
> > As far as XML, there has been discussion on where that should be done?
> > In the backend, libpq, or psql. It will need discussion on hackers. I
> > assume you have read the developer's FAQ too.
>
> The other issue is 'what XML format'? Find me a standard data dump XML
> DTD and I'll change phpPgAdmin to use it as well.
>
> Otherwise, phpPgAdmin's is quite simple.
>
> Chris


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: "Sergey Ten" <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <jason(at)sourcelabs(dot)com>
Subject: Re: patches for items from TODO list
Date: 2005-05-12 19:06:28
Message-ID: 007701c55725$b8052ac0$0f01a8c0@zaphod
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Sergey Ten wrote:
> Thank you to all who replied for your time.
>
> We have checked http://momjian.postgresql.org/cgi-bin/pgpatches and
> http://momjian.postgresql.org/cgi-bin/pgpatches2, and could not find
> patches
> with words "copy" or "xml" in the subject. Could you please clarify what
> the
> website version is and where it can be found? Is it sources available at
> ftp://ftp.postgresql.org?

I think the website version of the TODO is that one on www.postgresql.org
(see http://www.postgresql.org/docs/faqs.TODO.html, from
Developers->Roadmap).
"-Allow COPY to optionally include column headings in the first line" is
already completed (-). That is what Bruce referenced (below).

The patch has already been commited and is therefore not on the patches list
anymore:
http://archives.postgresql.org/pgsql-committers/2005-05/msg00075.php

(I searched for "copy" in the mailing list archives of pgsql-committers.)

Bruce Momjian wrote:
>> > Please check the web site version. Someone has already implemented
>> > "Allow COPY to optionally include column headings in the first line".

Hope that clarifies the situation a little bit.

Best Regards,
Michael Paesold


From: "Sergey Ten" <sergey(at)sourcelabs(dot)com>
To: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <jason(at)sourcelabs(dot)com>
Subject: Re: patches for items from TODO list
Date: 2005-05-13 23:01:48
Message-ID: 200505132256.j4DMumXW007233@sourcelabs.sourcelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hello all,

Thank you to all who replied for suggestions and help. Enclosed please find
code changes for the following items:
- Allow COPY to understand \x as a hex byte, and
- Add XML output to COPY
The changes include implementation of the features as well as modification
of the copy regression test.

After a careful consideration we decided to
- put XML implementation in the backend and
- use XML format described below, with justification of our decision.

The XML schema used by the COPY TO command was designed for ease of use and
to avoid the problem of column names appearing in XML element names.
XML doesn't allow spaces and punctuation in element names but Postgres does
allow these characters in column names; therefore, a direct mapping would be
problematic.

The solution selected places the column names into attribute fields where
any special characters they contain can be properly escaped using XML
entities. An additional attribute is used to distinguish null fields from
empty ones.

The example below is taken from the test suite. It demonstrates some basic
XML escaping in row 2. Row 3 demonstrates the difference between an empty
string (in col2) and a null string (in col3). If a field is null it will
always be empty but a field which is empty may or may not be null.
Always check the value of the 'null' attribute to be sure when a field is
truly null.

<?xml version='1.0'?>
<table>
<row>
<col name='col1' null='n'>Jackson, Sam</col>
<col name='col2' null='n'>\h</col>
</row>
<row>
<col name='col1' null='n'>It is &quot;perfect&quot;.</col>
<col name='col2' null='n'>&#09;</col>
</row>
<row>
<col name='col1' null='n'></col>
<col name='col2' null='y'></col>
</row>
</table>

Please let us know if about any concerns, objections the proposed change may
cause.

Best regards,
Jason Lucas, Sergey Ten
SourceLabs

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman(at)candle(dot)pha(dot)pa(dot)us]
> Sent: Wednesday, May 11, 2005 7:11 PM
> To: Sergey Ten
> Cc: pgsql-hackers(at)postgresql(dot)org; jason(at)sourcelabs(dot)com
> Subject: Re: [HACKERS] patches for items from TODO list
>
> Sergey Ten wrote:
> > Hello all,
> >
> > We would like to contribute to the Postgresql community by implementing
> > the following items from the TODO list
> > (http://developer.postgresql.org/todo.php):
> > . Allow COPY to understand \x as a hex byte . Allow COPY to optionally
> > include column headings in the first line . Add XML output to COPY
> >
> > The changes are straightforward and include implementation of the
> > features as well as modification of the regression tests and
> documentation.
> >
> > Before sending a diff file with the changes, we would like to know if
> > these features have been already implemented.
>
> Please check the web site version. Someone has already implemented
> "Allow COPY to optionally include column headings in the first line".
>
> As far as XML, there has been discussion on where that should be done?
> In the backend, libpq, or psql. It will need discussion on hackers. I
> assume you have read the developer's FAQ too.
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania
> 19073

Attachment Content-Type Size
diff.txt text/plain 20.8 KB

From: Markus Bertheau <twanger(at)bluetwanger(dot)de>
To: Sergey Ten <sergey(at)sourcelabs(dot)com>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-18 00:59:38
Message-ID: 1116377979.7656.5.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Dnia 13-05-2005, pią o godzinie 16:01 -0700, Sergey Ten napisał(a):

> <?xml version='1.0'?>
> <table>
> <row>
> <col name='col1' null='n'>Jackson, Sam</col>
> <col name='col2' null='n'>\h</col>
> </row>
> <row>
> <col name='col1' null='n'>It is &quot;perfect&quot;.</col>
> <col name='col2' null='n'>&#09;</col>
> </row>
> <row>
> <col name='col1' null='n'></col>
> <col name='col2' null='y'></col>
> </row>
> </table>

Why didn't you do something to the effect of

<?xml version='1.0'?>
<table>
<cols>
<col name='col1'/>
<col name='col2'/>
</cols>
<row>
<col null='n'>Jackson, Sam</col>
<col null='n'>\h</col>
</row>
<row>
<col null='n'>It is &quot;perfect&quot;.</col>
<col null='n'>&#09;</col>
</row>
<row>
<col null='n'></col>
<col null='y'></col>
</row>
</table>

This avoids repeating the column names in every row, which don't change
over the rows anyway. By reducing redundant information it also makes
structurally invalid XML less likely (whether that is relevant depends
on what people do with the XML data).

Also you could encode the XML output as UTF-8, which would make the
files more readable for humans if the text data is not ASCII.

Markus

--
Markus Bertheau <twanger(at)bluetwanger(dot)de>


From: Neil Conway <neilc(at)samurai(dot)com>
To: Sergey Ten <sergey(at)sourcelabs(dot)com>
Cc: 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, 'Christopher Kings-Lynne' <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-18 02:11:17
Message-ID: 428AA445.6010007@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Sergey Ten wrote:
> After a careful consideration we decided to
> - put XML implementation in the backend

What advantage does putting the XML output mode in the backend provide?

-Neil


From: "Sergey Ten" <sergey(at)sourcelabs(dot)com>
To: "'Markus Bertheau'" <twanger(at)bluetwanger(dot)de>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, <pgsql-hackers(at)postgresql(dot)org>, <jason(at)sourcelabs(dot)com>
Subject: Re: patches for items from TODO list
Date: 2005-05-18 20:27:55
Message-ID: 200505182021.j4IKLJXW026519@sourcelabs.sourcelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Markus,

Thank you for your reply.
We considered embedding of an XML schema first followed by data. We decided
to stick to our current data format to make sure stateless XML parsers can
process it as well. Would it be better to add an option to the COPY command,
to allow embedding an XML schema, so more advanced XML parsers can take
advantage of it?

Thanks,
Jason, Sergey

> -----Original Message-----
> From: Markus Bertheau [mailto:twanger(at)bluetwanger(dot)de]
> Sent: Tuesday, May 17, 2005 6:00 PM
> To: Sergey Ten
> Cc: 'Bruce Momjian'; 'Christopher Kings-Lynne'; pgsql-
> hackers(at)postgresql(dot)org; jason(at)sourcelabs(dot)com
> Subject: Re: [HACKERS] patches for items from TODO list
>
> Dnia 13-05-2005, pią o godzinie 16:01 -0700, Sergey Ten napisał(a):
>
> > <?xml version='1.0'?>
> > <table>
> > <row>
> > <col name='col1' null='n'>Jackson, Sam</col>
> > <col name='col2' null='n'>\h</col>
> > </row>
> > <row>
> > <col name='col1' null='n'>It is &quot;perfect&quot;.</col>
> > <col name='col2' null='n'>&#09;</col>
> > </row>
> > <row>
> > <col name='col1' null='n'></col>
> > <col name='col2' null='y'></col>
> > </row>
> > </table>
>
> Why didn't you do something to the effect of
>
> <?xml version='1.0'?>
> <table>
> <cols>
> <col name='col1'/>
> <col name='col2'/>
> </cols>
> <row>
> <col null='n'>Jackson, Sam</col>
> <col null='n'>\h</col>
> </row>
> <row>
> <col null='n'>It is &quot;perfect&quot;.</col>
> <col null='n'>&#09;</col>
> </row>
> <row>
> <col null='n'></col>
> <col null='y'></col>
> </row>
> </table>
>
> This avoids repeating the column names in every row, which don't change
> over the rows anyway. By reducing redundant information it also makes
> structurally invalid XML less likely (whether that is relevant depends
> on what people do with the XML data).
>
> Also you could encode the XML output as UTF-8, which would make the
> files more readable for humans if the text data is not ASCII.
>
> Markus
>
> --
> Markus Bertheau <twanger(at)bluetwanger(dot)de>


From: "Sergey Ten" <sergey(at)sourcelabs(dot)com>
To: "'Neil Conway'" <neilc(at)samurai(dot)com>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, <pgsql-hackers(at)postgresql(dot)org>, <jason(at)sourcelabs(dot)com>
Subject: Re: patches for items from TODO list
Date: 2005-05-18 20:42:57
Message-ID: 200505182036.j4IKaOXW028632@sourcelabs.sourcelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Neil,

We think that putting it in the backend will make access from other
components easier.

Thank you,
Sergey

> -----Original Message-----
> From: Neil Conway [mailto:neilc(at)samurai(dot)com]
> Sent: Tuesday, May 17, 2005 7:11 PM
> To: Sergey Ten
> Cc: 'Bruce Momjian'; 'Christopher Kings-Lynne'; pgsql-
> hackers(at)postgresql(dot)org; jason(at)sourcelabs(dot)com
> Subject: Re: [HACKERS] patches for items from TODO list
>
> Sergey Ten wrote:
> > After a careful consideration we decided to
> > - put XML implementation in the backend
>
> What advantage does putting the XML output mode in the backend provide?
>
> -Neil


From: Neil Conway <neilc(at)samurai(dot)com>
To: Sergey Ten <sergey(at)sourcelabs(dot)com>
Cc: 'Bruce Momjian' <pgman(at)candle(dot)pha(dot)pa(dot)us>, 'Christopher Kings-Lynne' <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-19 01:40:32
Message-ID: 428BEE90.9000705@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Sergey Ten wrote:
> We think that putting it in the backend will make access from other
> components easier.

In what way?

It seems to me that this can be done just as easily in a client
application / library, without cluttering the backend with yet another
COPY output format. It would also avoid the need to mandate a single XML
schema -- different clients will likely have different requirements.
Since I am skeptical of the value of this feature in the first place, I
think it would do less damage if implemented outside the backend.

-Neil


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Sergey Ten <sergey(at)sourcelabs(dot)com>
Cc: "'Markus Bertheau'" <twanger(at)bluetwanger(dot)de>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-20 14:17:45
Message-ID: 200505201417.j4KEHjb10919@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Sergey Ten wrote:
> Markus,
>
> Thank you for your reply.
> We considered embedding of an XML schema first followed by data. We decided
> to stick to our current data format to make sure stateless XML parsers can
> process it as well. Would it be better to add an option to the COPY command,
> to allow embedding an XML schema, so more advanced XML parsers can take
> advantage of it?

Current CVS has a WITH CSV HEADER option. I wonder if we should add a
HEADER option to XML to output the schema --- seems to make sense.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-20 14:19:02
Message-ID: 200505201419.j4KEJ2m11133@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Neil Conway wrote:
> Sergey Ten wrote:
> > We think that putting it in the backend will make access from other
> > components easier.
>
> In what way?
>
> It seems to me that this can be done just as easily in a client
> application / library, without cluttering the backend with yet another
> COPY output format. It would also avoid the need to mandate a single XML
> schema -- different clients will likely have different requirements.
> Since I am skeptical of the value of this feature in the first place, I
> think it would do less damage if implemented outside the backend.

We considered putting XML in psql or libpq in the past, but the problem
is that interfaces like jdbc couldn't take advantage of it. I do think
it needs to be in the backend, and I think that is the agreement we had
in the past.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Sergey Ten <sergey(at)sourcelabs(dot)com>
Cc: "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-20 21:33:46
Message-ID: 428E57BA.8000803@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I've been reviewing this patch and some of the following discussion.

First, postgresql patches are usually sent as context diffs. I don't
object to unidiffs myself, but you should do what everybody else does.

Second, it's best not to combine features in one patch. The \x escape
piece should be broken out.

I'm also a rather worried about COPY producing output which it can't
itself parse. We can read our own binary, text and CSV formats, and I
think that's a useful validation tool. I know the TODO item only
mentions output, but I believe we should rethink that. In any case, if
it's valid for us to hand XML to other programs why shouldn't we accept
it too. This is all about playing nicely in the playground.

One advantage of XML is that, being hierarchical, it can easily express
nested composites (records, arrays) in a way that our present text and
CSV formats really can't. But unless I missed something this patch
doesn't in fact do anything to break out nested composites.

Finally, I don't know if there is a standard on this, or even a
convention. What do other DBs do? I'm not keen on us just inventing our
own XML dialect for something that should after all be most useful in
data exchange.

Bottom line, much as I would like to see XML input/output, I think this
needs lots more thought and discussion.

cheers

andrew

Sergey Ten wrote:

>Hello all,
>
>Thank you to all who replied for suggestions and help. Enclosed please find
>code changes for the following items:
>- Allow COPY to understand \x as a hex byte, and
>- Add XML output to COPY
>The changes include implementation of the features as well as modification
>of the copy regression test.
>
>After a careful consideration we decided to
>- put XML implementation in the backend and
>- use XML format described below, with justification of our decision.
>
>The XML schema used by the COPY TO command was designed for ease of use and
>to avoid the problem of column names appearing in XML element names.
>XML doesn't allow spaces and punctuation in element names but Postgres does
>allow these characters in column names; therefore, a direct mapping would be
>problematic.
>
>The solution selected places the column names into attribute fields where
>any special characters they contain can be properly escaped using XML
>entities. An additional attribute is used to distinguish null fields from
>empty ones.
>
>The example below is taken from the test suite. It demonstrates some basic
>XML escaping in row 2. Row 3 demonstrates the difference between an empty
>string (in col2) and a null string (in col3). If a field is null it will
>always be empty but a field which is empty may or may not be null.
>Always check the value of the 'null' attribute to be sure when a field is
>truly null.
>
><?xml version='1.0'?>
><table>
> <row>
> <col name='col1' null='n'>Jackson, Sam</col>
> <col name='col2' null='n'>\h</col>
> </row>
> <row>
> <col name='col1' null='n'>It is &quot;perfect&quot;.</col>
> <col name='col2' null='n'>&#09;</col>
> </row>
> <row>
> <col name='col1' null='n'></col>
> <col name='col2' null='y'></col>
> </row>
></table>
>
>Please let us know if about any concerns, objections the proposed change may
>cause.
>
>Best regards,
>Jason Lucas, Sergey Ten
>SourceLabs
>
>
>
>>-----Original Message-----
>>From: Bruce Momjian [mailto:pgman(at)candle(dot)pha(dot)pa(dot)us]
>>Sent: Wednesday, May 11, 2005 7:11 PM
>>To: Sergey Ten
>>Cc: pgsql-hackers(at)postgresql(dot)org; jason(at)sourcelabs(dot)com
>>Subject: Re: [HACKERS] patches for items from TODO list
>>
>>Sergey Ten wrote:
>>
>>
>>>Hello all,
>>>
>>>We would like to contribute to the Postgresql community by implementing
>>>the following items from the TODO list
>>>(http://developer.postgresql.org/todo.php):
>>>. Allow COPY to understand \x as a hex byte . Allow COPY to optionally
>>>include column headings in the first line . Add XML output to COPY
>>>
>>>The changes are straightforward and include implementation of the
>>>features as well as modification of the regression tests and
>>>
>>>
>>documentation.
>>
>>
>>>Before sending a diff file with the changes, we would like to know if
>>>these features have been already implemented.
>>>
>>>
>>Please check the web site version. Someone has already implemented
>>"Allow COPY to optionally include column headings in the first line".
>>
>>As far as XML, there has been discussion on where that should be done?
>>In the backend, libpq, or psql. It will need discussion on hackers. I
>>assume you have read the developer's FAQ too.
>>
>>--
>> Bruce Momjian | http://candle.pha.pa.us
>> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
>> + If your life is a hard drive, | 13 Roberts Road
>> + Christ can be your backup. | Newtown Square, Pennsylvania
>>19073
>>
>>
>>------------------------------------------------------------------------
>>
>>Index: src/backend/commands/copy.c
>>===================================================================
>>RCS file: /projects/cvsroot/pgsql/src/backend/commands/copy.c,v
>>retrieving revision 1.244
>>diff -u -r1.244 copy.c
>>--- src/backend/commands/copy.c 7 May 2005 02:22:46 -0000 1.244
>>+++ src/backend/commands/copy.c 13 May 2005 22:21:00 -0000
>>@@ -84,6 +84,16 @@
>> EOL_CRNL
>> } EolType;
>>
>>+/*
>>+ * Represents the format of the file to be read or written
>>+ */
>>+typedef enum CopyFmt
>>+{
>>+ FMT_TXT,
>>+ FMT_BIN,
>>+ FMT_CSV,
>>+ FMT_XML
>>+} CopyFmt;
>>
>> static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
>>
>>@@ -129,14 +139,14 @@
>> static bool line_buf_converted;
>>
>> /* non-export function prototypes */
>>-static void DoCopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
>>- char *delim, char *null_print, bool csv_mode, char *quote,
>>+static void DoCopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
>>+ char *delim, char *null_print, char *quote,
>> char *escape, List *force_quote_atts, bool header_line, bool fe_copy);
>>-static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
>>- char *delim, char *null_print, bool csv_mode, char *quote, char *escape,
>>+static void CopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
>>+ char *delim, char *null_print, char *quote, char *escape,
>> List *force_quote_atts, bool header_line);
>>-static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
>>- char *delim, char *null_print, bool csv_mode, char *quote, char *escape,
>>+static void CopyFrom(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
>>+ char *delim, char *null_print, char *quote, char *escape,
>> List *force_notnull_atts, bool header_line);
>> static bool CopyReadLine(char * quote, char * escape);
>> static char *CopyReadAttribute(const char *delim, const char *null_print,
>>@@ -171,6 +181,11 @@
>> static void CopySendInt16(int16 val);
>> static int16 CopyGetInt16(void);
>>
>>+static int GetDecimalFromHex(char hex);
>>+
>>+static void CopyAttributeOutXML (char *colname, char *string);
>>+static void CopySendStringXML(char *string);
>>+static char *CopyGetXMLEntity(char c, char *buf);
>>
>> /*
>> * Send copy start/stop messages for frontend copies. These have changed
>>@@ -692,10 +707,9 @@
>> List *attnamelist = stmt->attlist;
>> List *attnumlist;
>> bool fe_copy = false;
>>- bool binary = false;
>> bool oids = false;
>>- bool csv_mode = false;
>>- bool header_line = false;
>>+ bool header_line = false;
>>+ CopyFmt fmt = FMT_TXT;
>> char *delim = NULL;
>> char *quote = NULL;
>> char *escape = NULL;
>>@@ -715,11 +729,11 @@
>>
>> if (strcmp(defel->defname, "binary") == 0)
>> {
>>- if (binary)
>>+ if (fmt != FMT_TXT)
>> ereport(ERROR,
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> errmsg("conflicting or redundant options")));
>>- binary = intVal(defel->arg);
>>+ fmt = FMT_BIN;
>> }
>> else if (strcmp(defel->defname, "oids") == 0)
>> {
>>@@ -747,11 +761,19 @@
>> }
>> else if (strcmp(defel->defname, "csv") == 0)
>> {
>>- if (csv_mode)
>>+ if (fmt != FMT_TXT)
>> ereport(ERROR,
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> errmsg("conflicting or redundant options")));
>>- csv_mode = intVal(defel->arg);
>>+ fmt = FMT_CSV;
>>+ }
>>+ else if (strcmp(defel->defname, "xml") == 0)
>>+ {
>>+ if (fmt != FMT_TXT)
>>+ ereport(ERROR,
>>+ (errcode(ERRCODE_SYNTAX_ERROR),
>>+ errmsg("conflicting or redundant options")));
>>+ fmt = FMT_XML;
>> }
>> else if (strcmp(defel->defname, "header") == 0)
>> {
>>@@ -798,29 +820,39 @@
>> defel->defname);
>> }
>>
>>- if (binary && delim)
>>+ if (fmt == FMT_BIN && delim)
>> ereport(ERROR,
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> errmsg("cannot specify DELIMITER in BINARY mode")));
>>
>>- if (binary && csv_mode)
>>+ if (fmt == FMT_BIN && null_print)
>> ereport(ERROR,
>> (errcode(ERRCODE_SYNTAX_ERROR),
>>- errmsg("cannot specify CSV in BINARY mode")));
>>+ errmsg("cannot specify NULL in BINARY mode")));
>>
>>- if (binary && null_print)
>>+ if (fmt == FMT_XML && is_from)
>> ereport(ERROR,
>> (errcode(ERRCODE_SYNTAX_ERROR),
>>- errmsg("cannot specify NULL in BINARY mode")));
>>+ errmsg("XML mode is not available in COPY FROM")));
>>+
>>+ if (fmt == FMT_XML && delim)
>>+ ereport(ERROR,
>>+ (errcode(ERRCODE_SYNTAX_ERROR),
>>+ errmsg("cannot specify DELIMITER in XML mode")));
>>+
>>+ if (fmt == FMT_XML && null_print)
>>+ ereport(ERROR,
>>+ (errcode(ERRCODE_SYNTAX_ERROR),
>>+ errmsg("cannot specify NULL in XML mode")));
>>
>> /* Set defaults */
>> if (!delim)
>>- delim = csv_mode ? "," : "\t";
>>+ delim = (fmt == FMT_CSV) ? "," : "\t";
>>
>> if (!null_print)
>>- null_print = csv_mode ? "" : "\\N";
>>+ null_print = (fmt == FMT_CSV) ? "" : "\\N";
>>
>>- if (csv_mode)
>>+ if (fmt == FMT_CSV)
>> {
>> if (!quote)
>> quote = "\"";
>>@@ -835,35 +867,35 @@
>> errmsg("COPY delimiter must be a single character")));
>>
>> /* Check header */
>>- if (!csv_mode && header_line)
>>+ if (fmt != FMT_CSV && header_line)
>> ereport(ERROR,
>> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> errmsg("COPY HEADER available only in CSV mode")));
>>
>> /* Check quote */
>>- if (!csv_mode && quote != NULL)
>>+ if (fmt != FMT_CSV && quote != NULL)
>> ereport(ERROR,
>> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> errmsg("COPY quote available only in CSV mode")));
>>
>>- if (csv_mode && strlen(quote) != 1)
>>+ if (fmt == FMT_CSV && strlen(quote) != 1)
>> ereport(ERROR,
>> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> errmsg("COPY quote must be a single character")));
>>
>> /* Check escape */
>>- if (!csv_mode && escape != NULL)
>>+ if (fmt != FMT_CSV && escape != NULL)
>> ereport(ERROR,
>> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> errmsg("COPY escape available only in CSV mode")));
>>
>>- if (csv_mode && strlen(escape) != 1)
>>+ if (fmt == FMT_CSV && strlen(escape) != 1)
>> ereport(ERROR,
>> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> errmsg("COPY escape must be a single character")));
>>
>> /* Check force_quote */
>>- if (!csv_mode && force_quote != NIL)
>>+ if (fmt != FMT_CSV && force_quote != NIL)
>> ereport(ERROR,
>> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> errmsg("COPY force quote available only in CSV mode")));
>>@@ -873,7 +905,7 @@
>> errmsg("COPY force quote only available using COPY TO")));
>>
>> /* Check force_notnull */
>>- if (!csv_mode && force_notnull != NIL)
>>+ if (fmt != FMT_CSV && force_notnull != NIL)
>> ereport(ERROR,
>> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> errmsg("COPY force not null available only in CSV mode")));
>>@@ -889,7 +921,7 @@
>> errmsg("COPY delimiter must not appear in the NULL specification")));
>>
>> /* Don't allow the csv quote char to appear in the null string. */
>>- if (csv_mode && strchr(null_print, quote[0]) != NULL)
>>+ if (fmt == FMT_CSV && strchr(null_print, quote[0]) != NULL)
>> ereport(ERROR,
>> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> errmsg("CSV quote character must not appear in the NULL specification")));
>>@@ -1004,7 +1036,7 @@
>> if (pipe)
>> {
>> if (whereToSendOutput == Remote)
>>- ReceiveCopyBegin(binary, list_length(attnumlist));
>>+ ReceiveCopyBegin(fmt == FMT_BIN, list_length(attnumlist));
>> else
>> copy_file = stdin;
>> }
>>@@ -1029,7 +1061,7 @@
>> errmsg("\"%s\" is a directory", filename)));
>> }
>> }
>>- CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
>>+ CopyFrom(rel, attnumlist, fmt, oids, delim, null_print,
>> quote, escape, force_notnull_atts, header_line);
>> }
>> else
>>@@ -1093,7 +1125,7 @@
>> }
>> }
>>
>>- DoCopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
>>+ DoCopyTo(rel, attnumlist, fmt, oids, delim, null_print,
>> quote, escape, force_quote_atts, header_line, fe_copy);
>> }
>>
>>@@ -1124,20 +1156,20 @@
>> * so we don't need to plaster a lot of variables with "volatile".
>> */
>> static void
>>-DoCopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
>>- char *delim, char *null_print, bool csv_mode, char *quote,
>>+DoCopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
>>+ char *delim, char *null_print, char *quote,
>> char *escape, List *force_quote_atts, bool header_line, bool fe_copy)
>> {
>> PG_TRY();
>> {
>> if (fe_copy)
>>- SendCopyBegin(binary, list_length(attnumlist));
>>+ SendCopyBegin(fmt == FMT_BIN, list_length(attnumlist));
>>
>>- CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
>>+ CopyTo(rel, attnumlist, fmt, oids, delim, null_print,
>> quote, escape, force_quote_atts, header_line);
>>
>> if (fe_copy)
>>- SendCopyEnd(binary);
>>+ SendCopyEnd(fmt == FMT_BIN);
>> }
>> PG_CATCH();
>> {
>>@@ -1156,8 +1188,8 @@
>> * Copy from relation TO file.
>> */
>> static void
>>-CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
>>- char *delim, char *null_print, bool csv_mode, char *quote,
>>+CopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
>>+ char *delim, char *null_print, char *quote,
>> char *escape, List *force_quote_atts, bool header_line)
>> {
>> HeapTuple tuple;
>>@@ -1187,7 +1219,7 @@
>> Oid out_func_oid;
>> bool isvarlena;
>>
>>- if (binary)
>>+ if (fmt == FMT_BIN)
>> getTypeBinaryOutputInfo(attr[attnum - 1]->atttypid,
>> &out_func_oid,
>> &isvarlena);
>>@@ -1215,7 +1247,7 @@
>> ALLOCSET_DEFAULT_INITSIZE,
>> ALLOCSET_DEFAULT_MAXSIZE);
>>
>>- if (binary)
>>+ if (fmt == FMT_BIN)
>> {
>> /* Generate header for a binary copy */
>> int32 tmp;
>>@@ -1233,6 +1265,14 @@
>> }
>> else
>> {
>>+ if (fmt == FMT_XML)
>>+ {
>>+ CopySendString("<?xml version='1.0'?>");
>>+ CopySendEndOfRow(false);
>>+ CopySendString("<table>");
>>+ CopySendEndOfRow(false);
>>+ }
>>+
>> /*
>> * For non-binary copy, we need to convert null_print to client
>> * encoding, because it will be sent directly with CopySendString.
>>@@ -1262,7 +1302,7 @@
>> strcmp(colname, null_print) == 0);
>> }
>>
>>- CopySendEndOfRow(binary);
>>+ CopySendEndOfRow(fmt == FMT_BIN);
>>
>> }
>> }
>>@@ -1278,7 +1318,7 @@
>> MemoryContextReset(mycontext);
>> oldcontext = MemoryContextSwitchTo(mycontext);
>>
>>- if (binary)
>>+ if (fmt == FMT_BIN)
>> {
>> /* Binary per-tuple header */
>> CopySendInt16(attr_count);
>>@@ -1294,25 +1334,34 @@
>> }
>> else
>> {
>>+ if (fmt == FMT_XML)
>>+ CopySendString("<row>");
>>+
>> /* Text format has no per-tuple header, but send OID if wanted */
>> if (oids)
>> {
>> string = DatumGetCString(DirectFunctionCall1(oidout,
>> ObjectIdGetDatum(HeapTupleGetOid(tuple))));
>>- CopySendString(string);
>>+
>>+ if (fmt == FMT_XML)
>>+ CopyAttributeOutXML("oid", string);
>>+ else
>>+ CopySendString(string);
>>+
>> need_delim = true;
>> }
>> }
>>
>> foreach(cur, attnumlist)
>> {
>>- int attnum = lfirst_int(cur);
>>+ int attnum = lfirst_int(cur);
>> Datum value;
>> bool isnull;
>>+ char *colname = NameStr(attr[attnum - 1]->attname);
>>
>> value = heap_getattr(tuple, attnum, tupDesc, &isnull);
>>
>>- if (!binary)
>>+ if (fmt == FMT_TXT || fmt == FMT_CSV)
>> {
>> if (need_delim)
>> CopySendChar(delim[0]);
>>@@ -1321,53 +1370,71 @@
>>
>> if (isnull)
>> {
>>- if (!binary)
>>- CopySendString(null_print); /* null indicator */
>>- else
>>- CopySendInt32(-1); /* null marker */
>>+ switch (fmt)
>>+ {
>>+ case FMT_BIN:
>>+ CopySendInt32(-1); /* null marker */
>>+ break;
>>+ case FMT_XML:
>>+ CopyAttributeOutXML(colname, NULL); /* null entity */
>>+ break;
>>+ default:
>>+ CopySendString(null_print); /* null indicator */
>>+ break;
>>+ }
>>+
>> }
>> else
>> {
>>- if (!binary)
>>+ if (fmt == FMT_BIN)
>> {
>>- string = DatumGetCString(FunctionCall1(&out_functions[attnum - 1],
>>- value));
>>- if (csv_mode)
>>- {
>>- CopyAttributeOutCSV(string, delim, quote, escape,
>>- (strcmp(string, null_print) == 0 ||
>>- force_quote[attnum - 1]));
>>- }
>>- else
>>- CopyAttributeOut(string, delim);
>>-
>>+ bytea *outputbytes =
>>+ DatumGetByteaP(FunctionCall1(&out_functions[attnum - 1], value));
>>+ /* We assume the result will not have been toasted */
>>+ CopySendInt32(VARSIZE(outputbytes) - VARHDRSZ);
>>+ CopySendData(VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ);
>> }
>> else
>> {
>>- bytea *outputbytes;
>>-
>>- outputbytes = DatumGetByteaP(FunctionCall1(&out_functions[attnum - 1],
>>- value));
>>- /* We assume the result will not have been toasted */
>>- CopySendInt32(VARSIZE(outputbytes) - VARHDRSZ);
>>- CopySendData(VARDATA(outputbytes),
>>- VARSIZE(outputbytes) - VARHDRSZ);
>>+ string = DatumGetCString(FunctionCall1(&out_functions[attnum - 1], value));
>>+ switch (fmt)
>>+ {
>>+ case FMT_CSV:
>>+ CopyAttributeOutCSV(string, delim, quote, escape,
>>+ (strcmp(string, null_print) == 0
>>+ || force_quote[attnum - 1]));
>>+ break;
>>+ case FMT_XML:
>>+ CopyAttributeOutXML(colname, string);
>>+ break;
>>+ default:
>>+ CopyAttributeOut(string, delim);
>>+ break;
>>+ }
>> }
>> }
>> }
>>
>>- CopySendEndOfRow(binary);
>>+ if (fmt == FMT_XML)
>>+ CopySendString("</row>");
>>+
>>+ CopySendEndOfRow(fmt == FMT_BIN);
>>
>> MemoryContextSwitchTo(oldcontext);
>> }
>>
>> heap_endscan(scandesc);
>>
>>- if (binary)
>>+ if (fmt == FMT_BIN)
>> {
>> /* Generate trailer for a binary copy */
>> CopySendInt16(-1);
>> }
>>+ else if (fmt == FMT_XML)
>>+ {
>>+ CopySendString("</table>");
>>+ CopySendEndOfRow(false);
>>+ }
>>
>> MemoryContextDelete(mycontext);
>>
>>@@ -1464,8 +1531,8 @@
>> * Copy FROM file to relation.
>> */
>> static void
>>-CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
>>- char *delim, char *null_print, bool csv_mode, char *quote,
>>+CopyFrom(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
>>+ char *delim, char *null_print, char *quote,
>> char *escape, List *force_notnull_atts, bool header_line)
>> {
>> HeapTuple tuple;
>>@@ -1549,7 +1616,7 @@
>> continue;
>>
>> /* Fetch the input function and typioparam info */
>>- if (binary)
>>+ if (fmt == FMT_BIN)
>> getTypeBinaryInputInfo(attr[attnum - 1]->atttypid,
>> &in_func_oid, &typioparams[attnum - 1]);
>> else
>>@@ -1620,7 +1687,7 @@
>> */
>> ExecBSInsertTriggers(estate, resultRelInfo);
>>
>>- if (!binary)
>>+ if (fmt != FMT_BIN)
>> file_has_oids = oids; /* must rely on user to tell us this... */
>> else
>> {
>>@@ -1663,7 +1730,7 @@
>> }
>> }
>>
>>- if (file_has_oids && binary)
>>+ if (file_has_oids && fmt == FMT_BIN)
>> {
>> getTypeBinaryInputInfo(OIDOID,
>> &in_func_oid, &oid_typioparam);
>>@@ -1681,7 +1748,7 @@
>> /* Initialize static variables */
>> fe_eof = false;
>> eol_type = EOL_UNKNOWN;
>>- copy_binary = binary;
>>+ copy_binary = (fmt == FMT_BIN);
>> copy_relname = RelationGetRelationName(rel);
>> copy_lineno = 0;
>> copy_attname = NULL;
>>@@ -1718,14 +1785,14 @@
>> MemSet(values, 0, num_phys_attrs * sizeof(Datum));
>> MemSet(nulls, 'n', num_phys_attrs * sizeof(char));
>>
>>- if (!binary)
>>+ if (fmt != FMT_BIN)
>> {
>> CopyReadResult result = NORMAL_ATTR;
>> char *string;
>> ListCell *cur;
>>
>> /* Actually read the line into memory here */
>>- done = csv_mode ?
>>+ done = (fmt == FMT_CSV) ?
>> CopyReadLine(quote, escape) : CopyReadLine(NULL, NULL);
>>
>> /*
>>@@ -1776,7 +1843,7 @@
>> errmsg("missing data for column \"%s\"",
>> NameStr(attr[m]->attname))));
>>
>>- if (csv_mode)
>>+ if (fmt == FMT_CSV)
>> {
>> string = CopyReadAttributeCSV(delim, null_print, quote,
>> escape, &result, &isnull);
>>@@ -1789,7 +1856,7 @@
>> string = CopyReadAttribute(delim, null_print,
>> &result, &isnull);
>>
>>- if (csv_mode && isnull && force_notnull[m])
>>+ if (fmt == FMT_CSV && isnull && force_notnull[m])
>> {
>> string = null_print; /* set to NULL string */
>> isnull = false;
>>@@ -2275,6 +2342,27 @@
>> }
>>
>> /*----------
>>+ * Returns decimal value for a hexadecimal digit.
>>+*----------
>>+ */
>>+static int GetDecimalFromHex(char hex)
>>+{
>>+ if (isdigit(hex))
>>+ {
>>+ // If it is a digit
>>+ return hex - '0';
>>+ }
>>+ if (hex < 'a')
>>+ {
>>+ return hex - 'A' + 10;
>>+ }
>>+ else
>>+ {
>>+ return hex - 'a' + 10;
>>+ }
>>+}
>>+
>>+/*----------
>> * Read the value of a single attribute, performing de-escaping as needed.
>> *
>> * delim is the column delimiter string (must be just one byte for now).
>>@@ -2378,6 +2466,29 @@
>> case 'v':
>> c = '\v';
>> break;
>>+ case 'x':
>>+ case 'X':
>>+ if (line_buf.cursor < line_buf.len)
>>+ {
>>+ char hexchar = line_buf.data[line_buf.cursor];
>>+ if (isxdigit(hexchar))
>>+ {
>>+ int val = GetDecimalFromHex(hexchar);
>>+ line_buf.cursor++;
>>+ if (line_buf.cursor < line_buf.len)
>>+ {
>>+ hexchar = line_buf.data[line_buf.cursor];
>>+ if (isxdigit(hexchar))
>>+ {
>>+ line_buf.cursor++;
>>+ val = (val << 4) + GetDecimalFromHex(hexchar);
>>+ }
>>+ }
>>+
>>+ c = val & 0xff;
>>+ }
>>+ }
>>+ break;
>>
>> /*
>> * in all other cases, take the char after '\'
>>@@ -2760,3 +2871,84 @@
>>
>> return attnums;
>> }
>>+
>>+/*
>>+ * Send XML representation of one attribute, with element tagging, null
>>+ * marking, and entity escaping.
>>+ */
>>+
>>+static void
>>+CopyAttributeOutXML (char *colname, char *string)
>>+{
>>+ CopySendString("<col name='");
>>+ CopySendStringXML(colname);
>>+ CopySendString("' null='");
>>+ CopySendChar((string == NULL) ? 'y' : 'n');
>>+ CopySendString("'>");
>>+
>>+ if (string != NULL)
>>+ CopySendStringXML(string);
>>+
>>+ CopySendString("</col>");
>>+}
>>+
>>+/*
>>+ * Sends a string with entity escaping.
>>+ */
>>+
>>+static void
>>+CopySendStringXML (char *string)
>>+{
>>+ char *csr;
>>+ for (csr = string; *csr; ++csr)
>>+ {
>>+ char buf[10];
>>+ char *entity = CopyGetXMLEntity(*csr, buf);
>>+ if (entity)
>>+ CopySendString(entity);
>>+ else
>>+ CopySendChar(*csr);
>>+ }
>>+}
>>+
>>+/*
>>+ * Locates or creates an XML entity for the given character.
>>+ * If that character doesn't require an entity, then the
>>+ * function returns NULL.
>>+ */
>>+
>>+static char *
>>+CopyGetXMLEntity (char c, char *buf)
>>+{
>>+ char *entity;
>>+
>>+ switch (c)
>>+ {
>>+ case '<':
>>+ entity = "&lt;";
>>+ break;
>>+ case '>':
>>+ entity = "&gt;";
>>+ break;
>>+ case '&':
>>+ entity = "&amp;";
>>+ break;
>>+ case '\'':
>>+ entity = "&apos;";
>>+ break;
>>+ case '"':
>>+ entity = "&quot;";
>>+ break;
>>+ default:
>>+ if (!isgraph(c) && c != ' ')
>>+ {
>>+ sprintf(buf, "&#%02x;", (unsigned char)c);
>>+ entity = buf;
>>+ }
>>+ else
>>+ entity = NULL;
>>+ break;
>>+ }
>>+
>>+ return entity;
>>+}
>>Index: src/backend/parser/gram.y
>>===================================================================
>>RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v
>>retrieving revision 2.491
>>diff -u -r2.491 gram.y
>>--- src/backend/parser/gram.y 7 May 2005 02:22:46 -0000 2.491
>>+++ src/backend/parser/gram.y 13 May 2005 22:21:01 -0000
>>@@ -413,6 +413,8 @@
>>
>> WHEN WHERE WITH WITHOUT WORK WRITE
>>
>>+ XML
>>+
>> YEAR_P
>>
>> ZONE
>>@@ -1448,6 +1450,10 @@
>> {
>> $$ = makeDefElem("header", (Node *)makeInteger(TRUE));
>> }
>>+ | XML
>>+ {
>>+ $$ = makeDefElem("xml", (Node *)makeInteger(TRUE));
>>+ }
>> | QUOTE opt_as Sconst
>> {
>> $$ = makeDefElem("quote", (Node *)makeString($3));
>>Index: src/backend/parser/keywords.c
>>===================================================================
>>RCS file: /projects/cvsroot/pgsql/src/backend/parser/keywords.c,v
>>retrieving revision 1.155
>>diff -u -r1.155 keywords.c
>>--- src/backend/parser/keywords.c 7 May 2005 02:22:47 -0000 1.155
>>+++ src/backend/parser/keywords.c 13 May 2005 22:21:01 -0000
>>@@ -342,6 +342,7 @@
>> {"without", WITHOUT},
>> {"work", WORK},
>> {"write", WRITE},
>>+ {"xml", XML},
>> {"year", YEAR_P},
>> {"zone", ZONE},
>> };
>>Index: src/test/regress/expected/copy2.out
>>===================================================================
>>RCS file: /projects/cvsroot/pgsql/src/test/regress/expected/copy2.out,v
>>retrieving revision 1.21
>>diff -u -r1.21 copy2.out
>>--- src/test/regress/expected/copy2.out 13 May 2005 06:33:40 -0000 1.21
>>+++ src/test/regress/expected/copy2.out 13 May 2005 22:21:01 -0000
>>@@ -194,6 +194,28 @@
>> --test that we read consecutive LFs properly
>> CREATE TEMP TABLE testnl (a int, b text, c int);
>> COPY testnl FROM stdin CSV;
>>-DROP TABLE x, y;
>>+CREATE TABLE z (
>>+ col1 text,
>>+ col2 text
>>+);
>>+COPY z from stdin;
>>+COPY z TO stdout;
>>+Jackson, Sam \\h
>>+ABC \\\\\t
>>+It is "perfect". \t
>>+ NULL
>>+COPY z TO stdout WITH CSV;
>>+"Jackson, Sam",\h
>>+ABC,\\
>>+"It is ""perfect"".",
>>+"",NULL
>>+COPY y TO stdout WITH XML;
>>+<?xml version='1.0'?>
>>+<table>
>>+<row><col name='col1' null='n'>Jackson, Sam</col><col name='col2' null='n'>\h</col></row>
>>+<row><col name='col1' null='n'>It is &quot;perfect&quot;.</col><col name='col2' null='n'>&#09;</col></row>
>>+<row><col name='col1' null='n'></col><col name='col2' null='y'></col></row>
>>+</table>
>>+DROP TABLE x, y, z;
>> DROP FUNCTION fn_x_before();
>> DROP FUNCTION fn_x_after();
>>Index: src/test/regress/sql/copy2.sql
>>===================================================================
>>RCS file: /projects/cvsroot/pgsql/src/test/regress/sql/copy2.sql,v
>>retrieving revision 1.12
>>diff -u -r1.12 copy2.sql
>>--- src/test/regress/sql/copy2.sql 13 May 2005 06:33:40 -0000 1.12
>>+++ src/test/regress/sql/copy2.sql 13 May 2005 22:21:01 -0000
>>@@ -139,7 +139,22 @@
>> inside",2
>> \.
>>
>>+CREATE TABLE z (
>>+ col1 text,
>>+ col2 text
>>+);
>>
>>-DROP TABLE x, y;
>>+COPY z from stdin;
>>+Jackson, Sam \\h
>>+\x41\x42\x43\xa0\x1 \x5c\x5c\x9
>>+It is "perfect". \t
>>+ NULL
>>+\.
>>+
>>+COPY z TO stdout;
>>+COPY z TO stdout WITH CSV;
>>+COPY y TO stdout WITH XML;
>>+
>>+DROP TABLE x, y, z;
>> DROP FUNCTION fn_x_before();
>> DROP FUNCTION fn_x_after();
>>
>>
>>------------------------------------------------------------------------
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 6: Have you searched our list archives?
>>
>> http://archives.postgresql.org
>>
>>


From: Neil Conway <neilc(at)samurai(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, 'Christopher Kings-Lynne' <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-21 01:01:53
Message-ID: 428E8881.4060505@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> We considered putting XML in psql or libpq in the past, but the problem
> is that interfaces like jdbc couldn't take advantage of it.

Well, you could implement it as a C UDF and use SPI. Or write it as a C
client library, and use JNI. Or just provide a Java implementation --
it's not like the COPY -> XML transformation is very complex.

To restate the case:

- I don't see how this feature is useful. Perhaps I'm mistaken, but I
don't think there's a lot of user demand for it (feel free to
demonstrate the contrary)

- The COPY -> XML transformation is trivial -- it would be easy for
clients to roll their own. At the same time, there is no standard or
canonical XML representation for COPY output, and I can easily imagine
different clients needing different representations. So there is limited
value in providing a single, inflexible backend implementation.

- There's no need for it to be in the backend, anyway. Perhaps if there
were overwhelming demand for this functionality, we would need to
provide it for all client libraries and the easiest solution would be to
put it in the backend, but I don't think that's the case.

If people really think XML COPY output mode is useful, why not implement
it client-side first and host it on pgfoundry? If lots of people are
using the client-side code, we can consider moving it into the core
distribution or the backend itself at that point.

-Neil


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, jason(at)sourcelabs(dot)com
Subject: Re: patches for items from TODO list
Date: 2005-05-21 02:30:51
Message-ID: 200505210230.j4L2Upj28087@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Neil Conway wrote:
> Bruce Momjian wrote:
> > We considered putting XML in psql or libpq in the past, but the problem
> > is that interfaces like jdbc couldn't take advantage of it.
>
> Well, you could implement it as a C UDF and use SPI. Or write it as a C
> client library, and use JNI. Or just provide a Java implementation --
> it's not like the COPY -> XML transformation is very complex.
>
> To restate the case:
>
> - I don't see how this feature is useful. Perhaps I'm mistaken, but I
> don't think there's a lot of user demand for it (feel free to
> demonstrate the contrary)
>
> - The COPY -> XML transformation is trivial -- it would be easy for
> clients to roll their own. At the same time, there is no standard or
> canonical XML representation for COPY output, and I can easily imagine
> different clients needing different representations. So there is limited
> value in providing a single, inflexible backend implementation.
>
> - There's no need for it to be in the backend, anyway. Perhaps if there
> were overwhelming demand for this functionality, we would need to
> provide it for all client libraries and the easiest solution would be to
> put it in the backend, but I don't think that's the case.
>
> If people really think XML COPY output mode is useful, why not implement
> it client-side first and host it on pgfoundry? If lots of people are
> using the client-side code, we can consider moving it into the core
> distribution or the backend itself at that point.

All I can say is that we rejected an XML in the client patch a long time
ago and the discussion was that it belongs in the backend so everyone
can use it. I don't use XML myself so I have no opinion. We need
people who need XML output to comment in this thread.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Undisclosed(dot)Recipients: ;
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: patches for items from TODO list
Date: 2005-05-21 02:41:44
Message-ID: 200505201941.44205.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Folks,

> - The COPY -> XML transformation is trivial -- it would be easy for
> clients to roll their own. At the same time, there is no standard or
> canonical XML representation for COPY output, and I can easily imagine
> different clients needing different representations. So there is limited
> value in providing a single, inflexible backend implementation.

I'm going to second Neil here. This feature becomes useful *only* when there
is a certified or de-facto universal standard XML representation for database
data. Then I could see a case for it. But there isn't.

Feel free to throw it on pgFoundry, though.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: patches for items from TODO list
Date: 2005-05-21 03:23:06
Message-ID: 5689.1116645786@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> I'm going to second Neil here.

I think the same --- given the points about lack of standardization,
it seems premature to put this into the backend. I'd be for it if
there were a clear standard, but ...

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: patches for items from TODO list
Date: 2005-05-21 04:39:59
Message-ID: 428EBB9F.1040807@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> I'm going to second Neil here. This feature becomes useful *only* when there
> is a certified or de-facto universal standard XML representation for database
> data. Then I could see a case for it. But there isn't.

We've done it in phpPgAdmin (we made up our own standard), and a couple
of people use it. I also do not think that it should be in the backend
until there is a standard. Here is what phpPgAdmin produces (note NULL
handling):

<?xml version="1.0" encoding="UTF-8" ?>
<data>
<header>
<column name="feature_id" type="varchar" />
<column name="feature_name" type="varchar" />
<column name="is_supported" type="varchar" />
<column name="is_verified_by" type="varchar" />
<column name="comments" type="varchar" />
</header>
<records>
<row>
<column name="feature_id">PKG000</column>
<column name="feature_name">Core</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG001</column>
<column name="feature_name">Enhanced datetime facilities</column>
<column name="is_supported">YES</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG002</column>
<column name="feature_name">Enhanced integrity management</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG003</column>
<column name="feature_name">OLAP facilities</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG004</column>
<column name="feature_name">PSM</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">PL/pgSQL is similar.</column>
</row>
<row>
<column name="feature_id">PKG005</column>
<column name="feature_name">CLI</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">ODBC is similar.</column>
</row>
<row>
<column name="feature_id">PKG006</column>
<column name="feature_name">Basic object support</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG007</column>
<column name="feature_name">Enhanced object support</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG008</column>
<column name="feature_name">Active database</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG010</column>
<column name="feature_name">OLAP</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">NO</column>
</row>
</records>
</data>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: patches for items from TODO list
Date: 2005-05-21 13:15:23
Message-ID: 428F346B.9080108@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


minor nit: the null attribute should take XMLSchema boolean type values:
{true, false, 1, 0}

More importantly, how do you handle array or record type fields? If they
are just opaque text I don't think that's what I at least would want
from XML output routines.

cheers

andrew

Christopher Kings-Lynne wrote:

>> I'm going to second Neil here. This feature becomes useful *only*
>> when there is a certified or de-facto universal standard XML
>> representation for database data. Then I could see a case for it.
>> But there isn't.
>
>
> We've done it in phpPgAdmin (we made up our own standard), and a
> couple of people use it. I also do not think that it should be in the
> backend until there is a standard. Here is what phpPgAdmin produces
> (note NULL handling):
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <data>
> <header>
> <column name="feature_id" type="varchar" />
> <column name="feature_name" type="varchar" />
> <column name="is_supported" type="varchar" />
> <column name="is_verified_by" type="varchar" />
> <column name="comments" type="varchar" />
> </header>
> <records>
> <row>
> <column name="feature_id">PKG000</column>
> <column name="feature_name">Core</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments" null="null"></column>
> </row>
> <row>
> <column name="feature_id">PKG001</column>
> <column name="feature_name">Enhanced datetime
> facilities</column>
> <column name="is_supported">YES</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments" null="null"></column>
> </row>
> <row>
> <column name="feature_id">PKG002</column>
> <column name="feature_name">Enhanced integrity
> management</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments" null="null"></column>
> </row>
> <row>
> <column name="feature_id">PKG003</column>
> <column name="feature_name">OLAP facilities</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments" null="null"></column>
> </row>
> <row>
> <column name="feature_id">PKG004</column>
> <column name="feature_name">PSM</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments">PL/pgSQL is similar.</column>
> </row>
> <row>
> <column name="feature_id">PKG005</column>
> <column name="feature_name">CLI</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments">ODBC is similar.</column>
> </row>
> <row>
> <column name="feature_id">PKG006</column>
> <column name="feature_name">Basic object support</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments" null="null"></column>
> </row>
> <row>
> <column name="feature_id">PKG007</column>
> <column name="feature_name">Enhanced object support</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments" null="null"></column>
> </row>
> <row>
> <column name="feature_id">PKG008</column>
> <column name="feature_name">Active database</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments" null="null"></column>
> </row>
> <row>
> <column name="feature_id">PKG010</column>
> <column name="feature_name">OLAP</column>
> <column name="is_supported">NO</column>
> <column name="is_verified_by" null="null"></column>
> <column name="comments">NO</column>
> </row>
> </records>
> </data>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Sergey Ten <sergey(at)sourcelabs(dot)com>
Cc: "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] patches for items from TODO list
Date: 2005-05-28 04:08:49
Message-ID: 200505280408.j4S48nM02458@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I have extracted the COPY \x part of your patch, and have attached it
for later application to CVS for 8.1.

---------------------------------------------------------------------------

Sergey Ten wrote:
> Hello all,
>
> Thank you to all who replied for suggestions and help. Enclosed please find
> code changes for the following items:
> - Allow COPY to understand \x as a hex byte, and
> - Add XML output to COPY
> The changes include implementation of the features as well as modification
> of the copy regression test.
>
> After a careful consideration we decided to
> - put XML implementation in the backend and
> - use XML format described below, with justification of our decision.
>
> The XML schema used by the COPY TO command was designed for ease of use and
> to avoid the problem of column names appearing in XML element names.
> XML doesn't allow spaces and punctuation in element names but Postgres does
> allow these characters in column names; therefore, a direct mapping would be
> problematic.
>
> The solution selected places the column names into attribute fields where
> any special characters they contain can be properly escaped using XML
> entities. An additional attribute is used to distinguish null fields from
> empty ones.
>
> The example below is taken from the test suite. It demonstrates some basic
> XML escaping in row 2. Row 3 demonstrates the difference between an empty
> string (in col2) and a null string (in col3). If a field is null it will
> always be empty but a field which is empty may or may not be null.
> Always check the value of the 'null' attribute to be sure when a field is
> truly null.
>
> <?xml version='1.0'?>
> <table>
> <row>
> <col name='col1' null='n'>Jackson, Sam</col>
> <col name='col2' null='n'>\h</col>
> </row>
> <row>
> <col name='col1' null='n'>It is &quot;perfect&quot;.</col>
> <col name='col2' null='n'>&#09;</col>
> </row>
> <row>
> <col name='col1' null='n'></col>
> <col name='col2' null='y'></col>
> </row>
> </table>
>
> Please let us know if about any concerns, objections the proposed change may
> cause.
>
> Best regards,
> Jason Lucas, Sergey Ten
> SourceLabs
>
> > -----Original Message-----
> > From: Bruce Momjian [mailto:pgman(at)candle(dot)pha(dot)pa(dot)us]
> > Sent: Wednesday, May 11, 2005 7:11 PM
> > To: Sergey Ten
> > Cc: pgsql-hackers(at)postgresql(dot)org; jason(at)sourcelabs(dot)com
> > Subject: Re: [HACKERS] patches for items from TODO list
> >
> > Sergey Ten wrote:
> > > Hello all,
> > >
> > > We would like to contribute to the Postgresql community by implementing
> > > the following items from the TODO list
> > > (http://developer.postgresql.org/todo.php):
> > > . Allow COPY to understand \x as a hex byte . Allow COPY to optionally
> > > include column headings in the first line . Add XML output to COPY
> > >
> > > The changes are straightforward and include implementation of the
> > > features as well as modification of the regression tests and
> > documentation.
> > >
> > > Before sending a diff file with the changes, we would like to know if
> > > these features have been already implemented.
> >
> > Please check the web site version. Someone has already implemented
> > "Allow COPY to optionally include column headings in the first line".
> >
> > As far as XML, there has been discussion on where that should be done?
> > In the backend, libpq, or psql. It will need discussion on hackers. I
> > assume you have read the developer's FAQ too.
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> > + If your life is a hard drive, | 13 Roberts Road
> > + Christ can be your backup. | Newtown Square, Pennsylvania
> > 19073

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 3.0 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: patches for items from TODO list
Date: 2005-05-28 04:12:44
Message-ID: 200505280412.j4S4CiT03045@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I have removed the XML TODO item:

* Add XML output to pg_dump and COPY

We already allow XML to be stored in the database, and XPath queries
can be used on that data using /contrib/xml2. It also supports XSLT
transformations.

---------------------------------------------------------------------------

Josh Berkus wrote:
> Folks,
>
> > - The COPY -> XML transformation is trivial -- it would be easy for
> > clients to roll their own. At the same time, there is no standard or
> > canonical XML representation for COPY output, and I can easily imagine
> > different clients needing different representations. So there is limited
> > value in providing a single, inflexible backend implementation.
>
> I'm going to second Neil here. This feature becomes useful *only* when there
> is a certified or de-facto universal standard XML representation for database
> data. Then I could see a case for it. But there isn't.
>
> Feel free to throw it on pgFoundry, though.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Sergey Ten <sergey(at)sourcelabs(dot)com>
Cc: "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] patches for items from TODO list
Date: 2005-05-28 17:03:52
Message-ID: 200505281703.j4SH3qk14807@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Here is an updated version of the COPY \x patch. It is the first patch
attached.

Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend. This is the second patch.

Third, I found out that psql has some unusual handling of escaped
numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
is octal, and \0xddd is decimal. It is basically following the strtol()
rules for an escaped value. This seems confusing and contradicts how
the rest of our system works. I looked at 'bash' and found that it
supports the \000 and \x00 just like C, so I am confused why we have
the current behavior. This patch makes psql consistent with the rest of
our system for back slashes. It does break backward compatibility. It
wouldn't be a big deal to fix, except we document this in the psql
manual page, and that adds confusion.

FYI, here is the current psql behavior:

test=> \set x '\42'
test=> \echo :x
*
test=> \set x '\042'
test=> \echo :x
"
test=> \set x '\0x42'
test=> \echo :x
B

The new behavior is:

test=> \set x '\42'
test=> \echo :x
"
test=> \set x '\042'
test=> \echo :x
"
test=> \set x '\x42'
test=> \echo :x
B

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 3.0 KB
unknown_filename text/plain 5.6 KB
unknown_filename text/plain 2.8 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] patches for items from TODO list
Date: 2005-05-28 19:04:46
Message-ID: 28122.1117307086@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Here is an updated version of the COPY \x patch. It is the first patch
> attached.
> Also, I realized that if we support \x in COPY, we should also support
> \x in strings to the backend. This is the second patch.

Do we really want to do any of these things? We've been getting beaten
up recently about the fact that we have non-SQL-spec string escapes
(ie, all the backslash stuff) so I'm a bit dubious about adding more,
especially when there's little if any demand for it.

I don't object too much to the COPY addition, since that's outside any
spec anyway, but I do think we ought to think twice about adding this
to SQL literal handling.

> Third, I found out that psql has some unusual handling of escaped
> numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
> is octal, and \0xddd is decimal. It is basically following the strtol()
> rules for an escaped value. This seems confusing and contradicts how
> the rest of our system works.

I agree, that's just going to confuse people.

> ! xqescape [\\][^0-7x]

If you are going to insist on this, at least make it case-insensitive.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Escape handling in COPY, strings, psql
Date: 2005-05-29 03:58:01
Message-ID: 200505290358.j4T3w1n25524@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Here is an updated version of the COPY \x patch. It is the first patch
> > attached.
> > Also, I realized that if we support \x in COPY, we should also support
> > \x in strings to the backend. This is the second patch.
>
> Do we really want to do any of these things? We've been getting beaten
> up recently about the fact that we have non-SQL-spec string escapes
> (ie, all the backslash stuff) so I'm a bit dubious about adding more,
> especially when there's little if any demand for it.

I thought about that, but adding additional escape letters isn't our
problem --- it is the escape mechanism itself that is the issue.

I have wanted to post on this issue so now is a good time. I think we
have been validly beaten up in that we pride ourselves on standards
compliance but have escape requirement on all strings. Our string
escapes are a major problem --- not the number of them but the
requirement to double backslashes on input, like 'C:\\tmp'. I am
thinking the only clean solution is to add a special keyword like ESCAPE
before strings that contain escape information. I think a GUC is too
general. You know if the string is a constant if it contains escapes
just by looking at it, and if it is a variable, hopefully you know if it
has escapes.

Basically, I think we have to deal with this somehow. I think it could
be implemented by looking for the ESCAPE keyword in parser/scan.l and
handling it all in there by ignoring backslash escapes if ESCAPE
preceeds the string. By the time you are in gram.y, it is too late.

> I don't object too much to the COPY addition, since that's outside any
> spec anyway, but I do think we ought to think twice about adding this
> to SQL literal handling.
>
> > Third, I found out that psql has some unusual handling of escaped
> > numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
> > is octal, and \0xddd is decimal. It is basically following the strtol()
> > rules for an escaped value. This seems confusing and contradicts how
> > the rest of our system works.
>
> I agree, that's just going to confuse people.
>
> > ! xqescape [\\][^0-7x]
>
> If you are going to insist on this, at least make it case-insensitive.

The submitted COPY patch also was case-insensitive, \x and \X, but I
changed that because we are case-sensitive for all backslashes in COPY,
and C is the same (\n and \N are different too, so we actually use the
case-sensitivity). Should we allow \X just so it is case-insensitive
like the SQL specification X'4f'? That is the only logic I can think of
for it to be case-insensitive, but we have to then do that at all
levels, and I am not sure it makes sense.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Michael Paesold <mpaesold(at)gmx(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, 'Christopher Kings-Lynne' <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] patches for items from TODO list
Date: 2005-05-29 07:37:54
Message-ID: 42997152.3090703@gmx.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
>>Here is an updated version of the COPY \x patch. It is the first patch
>>attached.
>>Also, I realized that if we support \x in COPY, we should also support
>>\x in strings to the backend. This is the second patch.
>
>
> Do we really want to do any of these things? We've been getting beaten
> up recently about the fact that we have non-SQL-spec string escapes
> (ie, all the backslash stuff) so I'm a bit dubious about adding more,
> especially when there's little if any demand for it.
>
> I don't object too much to the COPY addition, since that's outside any
> spec anyway, but I do think we ought to think twice about adding this
> to SQL literal handling.

+1 from me on this for Tom -- please don't break spec compliance!

Best Regards,
Michael Paesold


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-29 11:36:54
Message-ID: 200505291336.55571.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> I thought about that, but adding additional escape letters isn't our
> problem --- it is the escape mechanism itself that is the issue.

In a random-encoding environment, the option to specify byte values
directly -- at any level -- is of limited value anyway and is a
potential source of errors. So let's stay away from that.

I did not find the original posts that your quotations came from, but it
has to be considered that COPY is also subject to encoding processing.
Overall, I find this proposal to be a dangerous option.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-29 13:30:06
Message-ID: 200505291330.j4TDU6C10539@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > Here is an updated version of the COPY \x patch. It is the first patch
> > > attached.
> > > Also, I realized that if we support \x in COPY, we should also support
> > > \x in strings to the backend. This is the second patch.
> >
> > Do we really want to do any of these things? We've been getting beaten
> > up recently about the fact that we have non-SQL-spec string escapes
> > (ie, all the backslash stuff) so I'm a bit dubious about adding more,
> > especially when there's little if any demand for it.
>
> I thought about that, but adding additional escape letters isn't our
> problem --- it is the escape mechanism itself that is the issue.
>
> I have wanted to post on this issue so now is a good time. I think we
> have been validly beaten up in that we pride ourselves on standards
> compliance but have escape requirement on all strings. Our string
> escapes are a major problem --- not the number of them but the
> requirement to double backslashes on input, like 'C:\\tmp'. I am
> thinking the only clean solution is to add a special keyword like ESCAPE
> before strings that contain escape information. I think a GUC is too
> general. You know if the string is a constant if it contains escapes
> just by looking at it, and if it is a variable, hopefully you know if it
> has escapes.
>
> Basically, I think we have to deal with this somehow. I think it could
> be implemented by looking for the ESCAPE keyword in parser/scan.l and
> handling it all in there by ignoring backslash escapes if ESCAPE
> preceeds the string. By the time you are in gram.y, it is too late.

One other idea would be to remove escape processing for single-quoted
strings but keep it for our $$ strings, becuase they are not ANSI
standard.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-29 15:15:49
Message-ID: 3945.1117379749@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> One other idea would be to remove escape processing for single-quoted
> strings but keep it for our $$ strings, becuase they are not ANSI
> standard.

There is *no* escape processing within $$, and never will be, because
that would destroy the entire point. You'd be right back having to
double backslashes.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-29 18:00:19
Message-ID: 200505292000.20723.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> > I am thinking the only clean solution is to add a special keyword
> > like ESCAPE before strings that contain escape information. I
> > think a GUC is too general. You know if the string is a constant
> > if it contains escapes just by looking at it, and if it is a
> > variable, hopefully you know if it has escapes.

I do support gradually phasing out backslash escapes in standard string
literals in the interest of portability. Most of the current escape
sequences are of limited value anyway. Let's think about ways to get
there:

Enabling escape sequences in string literals controls the formatting of
input (and output?) data, so it is akin to, say, the client encoding
and the date style, so a GUC variable isn't out of the question in my
mind. It makes most sense, though, if we want to eventually make users
switch it off all the time, that is, as a transition aid. But before
that can happen, we need to come up with an alternative mechanism to
enter weird characters.

One such way may be to provide functions (say, chr(), tab(), etc.) to
give access to unprintable characters, but that will result in terrible
performance for long strings and it also won't help with COPY or places
where only literals are allowed.

Another way would be to allow escape sequences only in specially marked
strings. The proposal above doing 'foo' ESCAPE 'x' seems fairly
elegant for SQL linguists but would be pretty weird to implement in the
lexer. It won't help with COPY either, but that is really the case for
all solutions.

A more compact represenation may be using a prefix letter, like E'foo'.
This fits the SQL syntax, is familiar with Python programmers (although
in the other direction), and can be implemented efficiently in the
lexer. I like that the best, personally.

For COPY, we would probably have to use a flag in the COPY command
itself either way (like already done for NULL AS).

Comments? Other ideas? Keep the escapes?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-29 19:10:32
Message-ID: 12092.1117393832@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> I do support gradually phasing out backslash escapes in standard string
> literals in the interest of portability. Most of the current escape
> sequences are of limited value anyway. Let's think about ways to get
> there:

I really don't think there is any way to get there without creating
gaping security holes in all kinds of client code :-(. If we change
the escaping rules, then a client that is expecting some other rule
than happens to be in force will be subject to trivial SQL-injection
attacks. This will make the autocommit fiasco pale by comparison ...

> For COPY, we would probably have to use a flag in the COPY command
> itself either way (like already done for NULL AS).

The spec-compatibility argument for removing escapes does not apply to
COPY at all, so I see no need to fool with the COPY definition in any
case.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 03:50:53
Message-ID: 200505300350.j4U3orF01875@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > > I am thinking the only clean solution is to add a special keyword
> > > like ESCAPE before strings that contain escape information. I
> > > think a GUC is too general. You know if the string is a constant
> > > if it contains escapes just by looking at it, and if it is a
> > > variable, hopefully you know if it has escapes.
>
> I do support gradually phasing out backslash escapes in standard string
> literals in the interest of portability. Most of the current escape
> sequences are of limited value anyway. Let's think about ways to get
> there:
>
> Enabling escape sequences in string literals controls the formatting of
> input (and output?) data, so it is akin to, say, the client encoding
> and the date style, so a GUC variable isn't out of the question in my
> mind. It makes most sense, though, if we want to eventually make users
> switch it off all the time, that is, as a transition aid. But before
> that can happen, we need to come up with an alternative mechanism to
> enter weird characters.
>
> One such way may be to provide functions (say, chr(), tab(), etc.) to
> give access to unprintable characters, but that will result in terrible
> performance for long strings and it also won't help with COPY or places
> where only literals are allowed.
>
> Another way would be to allow escape sequences only in specially marked
> strings. The proposal above doing 'foo' ESCAPE 'x' seems fairly
> elegant for SQL linguists but would be pretty weird to implement in the
> lexer. It won't help with COPY either, but that is really the case for
> all solutions.

I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to be
before the string so scan.l can alter its processing of the string ---
after the string is too late --- there is no way to undo any escaping
that has happened, and it might already be used by gram.y.

I could probably hack up a sample implementation if people are
interested.

> A more compact representation may be using a prefix letter, like E'foo'.
> This fits the SQL syntax, is familiar with Python programmers (although
> in the other direction), and can be implemented efficiently in the
> lexer. I like that the best, personally.
>
> For COPY, we would probably have to use a flag in the COPY command
> itself either way (like already done for NULL AS).

I agree with Tom that COPY has to be left unchanged. The fundamental
problem is the representation of NULL values, that I don't think we can
do without some escape mechanism. Single-quote escapes works by
doubling them, but once you need to represent something more like
null's, I can't think of a solution without escapes.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 04:04:31
Message-ID: 200505300404.j4U44Vm13213@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > I do support gradually phasing out backslash escapes in standard string
> > literals in the interest of portability. Most of the current escape
> > sequences are of limited value anyway. Let's think about ways to get
> > there:
>
> I really don't think there is any way to get there without creating
> gaping security holes in all kinds of client code :-(. If we change
> the escaping rules, then a client that is expecting some other rule
> than happens to be in force will be subject to trivial SQL-injection
> attacks. This will make the autocommit fiasco pale by comparison ...

I looked at PQescapeString() and fortunately it escapes single quotes
by doing double-single quotes, not by using a backslash. This was
probably chosen for standards compliance.

Basically, I think our current behavior is not sustainable. I think we
are going to need to do something, and I think we should consider a
solution now rather than later. I don't think we can be as serious a
contender for portability without some kind of solution.

I am thinking we should first tell people in 8.1 that they should start
using only double-single quotes, and perhaps support the ESCAPE phrase
as a no-op, and then consider some kind of solution in 8.2 or later.

I don't think fixing this is going to be a huge security problem, but it
might be a small one. The good thing is that double-single quotes work,
so if people use only that for quote escaping, if you forget the ESCAPE
clause, you just get literal backslashes, not a security problem.

I ran the following test:

test=> select $$\$$;
?column?
----------
\
(1 row)

test=> create table test (x TEXT);
CREATE TABLE
test=> INSERT INTO test VALUES ($$\$$);
INSERT 0 1
test=> SELECT * FROM test;
x
---
\
(1 row)

and the good news is that output of backslashes is fine --- it is just
input that is the issue, and the security problem is only using \',
which we would have to tell people to avoid and start using only ''.

I think we can tell people in 8.1 that they should modify their
applications to only use '', and that \' might be a security problem in
the future. If we get to that then using ESC or not only affects input
of values and literal backslashes being entered, and my guess is that
90% of the backslash entries that want escaping are literal in the
application and not supplied by program variables. In fact, if we
disable backslash by default then strings coming in only have to deal
with single quotes (like other databases) and the system is more secure
because there is no special backslash handling by default.

> > For COPY, we would probably have to use a flag in the COPY command
> > itself either way (like already done for NULL AS).
>
> The spec-compatibility argument for removing escapes does not apply to
> COPY at all, so I see no need to fool with the COPY definition in any
> case.

Agreed.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sergey Ten <sergey(at)sourcelabs(dot)com>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 04:10:46
Message-ID: 429A9246.7030902@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> I think we can tell people in 8.1 that they should modify their
> applications to only use '', and that \' might be a security problem in
> the future. If we get to that then using ESC or not only affects input
> of values and literal backslashes being entered, and my guess is that
> 90% of the backslash entries that want escaping are literal in the
> application and not supplied by program variables. In fact, if we
> disable backslash by default then strings coming in only have to deal
> with single quotes (like other databases) and the system is more secure
> because there is no special backslash handling by default.

I can tell you right now this will be a problem :) There are loads of
PHP ppl who use addslashes() instead of pg_escape_string() to escape data.

Chris


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 09:26:44
Message-ID: 200505301126.45046.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to
> be before the string so scan.l can alter its processing of the string
> --- after the string is too late --- there is no way to undo any
> escaping that has happened, and it might already be used by gram.y.

That pretty much corresponds to my E'string' proposal. Both are
probably equally trivial to implement.

> I agree with Tom that COPY has to be left unchanged. The fundamental
> problem is the representation of NULL values, that I don't think we
> can do without some escape mechanism. Single-quote escapes works by
> doubling them, but once you need to represent something more like
> null's, I can't think of a solution without escapes.

Yes, I now realize that COPY has a whole set of different rules anyway,
so we can leave that out of this discussion.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sergey Ten <sergey(at)sourcelabs(dot)com>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 14:12:19
Message-ID: 200505301412.j4UECJ207458@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Christopher Kings-Lynne wrote:
> > I think we can tell people in 8.1 that they should modify their
> > applications to only use '', and that \' might be a security problem in
> > the future. If we get to that then using ESC or not only affects input
> > of values and literal backslashes being entered, and my guess is that
> > 90% of the backslash entries that want escaping are literal in the
> > application and not supplied by program variables. In fact, if we
> > disable backslash by default then strings coming in only have to deal
> > with single quotes (like other databases) and the system is more secure
> > because there is no special backslash handling by default.
>
> I can tell you right now this will be a problem :) There are loads of
> PHP ppl who use addslashes() instead of pg_escape_string() to escape data.

I read the PHP addslashes() manual page:

http://us3.php.net/addslashes

First, I see what people mean about PHP having most of the complex
content in comments, rather than in the actual manual text, and this
tendency is certainly something we want to avoid --- you end up having
to digest all the comments to find the details that should be in the
manual already.

On to the case at hand, the comments mention that addslashes() isn't
safe for all databases, and in fact isn't the prefered method. I do
think it could be a problem we have to have people avoid. One idea for
8.1 is to throw a warning if \' appears in a string, thereby helping
people find the places they are using the incorrect non-standard
escaping.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sergey Ten <sergey(at)sourcelabs(dot)com>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 14:18:08
Message-ID: 429B20A0.5020707@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> I read the PHP addslashes() manual page:
>
> http://us3.php.net/addslashes
>
> First, I see what people mean about PHP having most of the complex
> content in comments, rather than in the actual manual text, and this
> tendency is certainly something we want to avoid --- you end up having
> to digest all the comments to find the details that should be in the
> manual already.

Actually, all the comments are posted on the php-doc list, with
automatic urls in them for 'fixed in cvs', 'rejected', etc.

Each comment is supposed to be acted upon (ie. fixed in source), then
deleted.

There's still a lot of old comment around that hasn't had that treatment
though...

Chris


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sergey Ten <sergey(at)sourcelabs(dot)com>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 14:19:20
Message-ID: 200505301419.j4UEJKc08863@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Christopher Kings-Lynne wrote:
> > I read the PHP addslashes() manual page:
> >
> > http://us3.php.net/addslashes
> >
> > First, I see what people mean about PHP having most of the complex
> > content in comments, rather than in the actual manual text, and this
> > tendency is certainly something we want to avoid --- you end up having
> > to digest all the comments to find the details that should be in the
> > manual already.
>
> Actually, all the comments are posted on the php-doc list, with
> automatic urls in them for 'fixed in cvs', 'rejected', etc.
>
> Each comment is supposed to be acted upon (ie. fixed in source), then
> deleted.
>
> There's still a lot of old comment around that hasn't had that treatment
> though...

Right, they are more _usage_ comments, but still I think they could be
consolidated into manual text.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 14:28:17
Message-ID: 200505301428.j4UESH210277@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to
> > be before the string so scan.l can alter its processing of the string
> > --- after the string is too late --- there is no way to undo any
> > escaping that has happened, and it might already be used by gram.y.
>
> That pretty much corresponds to my E'string' proposal. Both are
> probably equally trivial to implement.

Right. I think your E'' idea has the benefit of fitting with our
existing X'' and B'' modifiers. It is also simpler and cleaner to do in
scan.l, so I think your idea is best.

> > I agree with Tom that COPY has to be left unchanged. The fundamental
> > problem is the representation of NULL values, that I don't think we
> > can do without some escape mechanism. Single-quote escapes works by
> > doubling them, but once you need to represent something more like
> > null's, I can't think of a solution without escapes.
>
> Yes, I now realize that COPY has a whole set of different rules anyway,
> so we can leave that out of this discussion.

Cool.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] patches for items from TODO list
Date: 2005-05-30 14:51:39
Message-ID: 200505301451.j4UEpdi25322@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> > Third, I found out that psql has some unusual handling of escaped
> > numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
> > is octal, and \0xddd is decimal. It is basically following the strtol()
> > rules for an escaped value. This seems confusing and contradicts how
> > the rest of our system works.
>
> I agree, that's just going to confuse people.

Fixed and applied, with no hex addition.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 1.4 KB

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sergey Ten <sergey(at)sourcelabs(dot)com>, jason(at)sourcelabs(dot)com
Subject: Re: Escape handling in COPY, strings, psql
Date: 2005-05-30 15:16:48
Message-ID: 874qckyc9r.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> Christopher Kings-Lynne wrote:
> >
> > Each comment is supposed to be acted upon (ie. fixed in source), then
> > deleted.
>
> Right, they are more _usage_ comments, but still I think they could be
> consolidated into manual text.

If that's "supposed" to happen it certainly hasn't been the de facto
procedure.

I think they have things partly right here though. A lot of those comments
aren't actually the kinds of things that belong in the canonical reference.
They include things like "watch out for this common error" or "here's a handy
use for this function". Often the "common error" or "handy use" are pretty
bogus but every now and then there's a genuinely useful one.

These kinds of things would just clutter up a reference. A reference should
just state unambiguously this function does XYZ and give examples that help
explain XYZ.

The PHP Docs do have a bit of a problem in that often the comments include
things like "In case X, what happens is Y" which really ought to be covered by
the canonical reference. That's a problem.

--
greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Adding \x escape processing to COPY, psql, backend
Date: 2005-05-30 19:33:38
Message-ID: 200505301933.j4UJXcK17113@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> Here is an updated version of the COPY \x patch. It is the first patch
> attached.
>
> Also, I realized that if we support \x in COPY, we should also support
> \x in strings to the backend. This is the second patch.

Here is a new version of the three \x hex support patches. I have added
\x for psql variables, which is the last patch.

I have IM'ed with Peter and he is now OK with the idea of supporting \x,
with the underestanding that it doesn't take us any farther away from
compatibility than we are now.

I have already fixed the psql octal/decimal/hex behavior in another
patch.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 3.0 KB
unknown_filename text/plain 3.3 KB
unknown_filename text/plain 2.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Adding \x escape processing to COPY, psql, backend
Date: 2005-05-30 19:41:19
Message-ID: 26407.1117482079@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Here is a new version of the three \x hex support patches. I have added
> \x for psql variables, which is the last patch.

> I have IM'ed with Peter and he is now OK with the idea of supporting \x,
> with the underestanding that it doesn't take us any farther away from
> compatibility than we are now.

Peter may be OK with it, but I object strongly to adding this to SQL
literals. This is exactly *not* the direction we want to be going in.

I don't really see the point for COPY and psql, either.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Backslash handling in strings
Date: 2005-05-30 20:11:08
Message-ID: 200505302011.j4UKB8R23097@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> Peter Eisentraut wrote:
> > Bruce Momjian wrote:
> > > I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to
> > > be before the string so scan.l can alter its processing of the string
> > > --- after the string is too late --- there is no way to undo any
> > > escaping that has happened, and it might already be used by gram.y.
> >
> > That pretty much corresponds to my E'string' proposal. Both are
> > probably equally trivial to implement.
>
> Right. I think your E'' idea has the benefit of fitting with our
> existing X'' and B'' modifiers. It is also simpler and cleaner to do in
> scan.l, so I think your idea is best.

[ CC list trimmed.]

OK, I talked to Tom and Peter and I have come up with a tentative plan.

The goal, at some point, is that we would have two types of strings, ''
strings and E'' strings. '' strings don't have any special backslash
handling for compatibility with with the ANSI spec and all other
databases except MySQL (and in MySQL it is now optional). E'' strings
behave just like our strings do now, with backslash handling.

In 8.0.X, we add support for E'' strings, but it is a noop. This is
done just for portability with future releases. We also state that
users should stop using \' to escape quotes in strings, and instead use
'', and that we will throw a warning in 8.1 if we see a \' in a non-E
string. (We could probably throw a warning for E'' use of \' too, but I
want to give users the ability to avoid the warning if they can't change
from using \' to ''.)

In 8.1, we start issuing the warning for \' in non-E strings, plus we
tell users who want escape processing that they will have to use E''
strings for this starting in release 8.2, and they should start
migrating their escaped strings over to E''.

Tom also suggested a readonly GUC variable that is sent to clients that
indicates if simple strings have backslash handling, for use by
applications that are doing escapes themselves, perhaps
'escape_all_strings'.

PQescapeString() and PQescapeBytea() can still be used, but only with
E'' strings in 8.2. We could create PQquoteString() for 8.1 and later
to allow for just single-quote doubling for non-E strings.

Tom asked about how to handle pg_dump contents that have strings, like
function bodies. We could start using E'' for those in 8.0 but it does
break backward movement of dumps, and someone upgrading from 7.1 to 8.2
would be in trouble. :-( Perhaps we will have another round of
subrelease fixes and we can bundle this into that and tell people they
have to upgrade to the newest subrelease before going to 8.2. I think
we have had that requirement in the past when we had broken pg_dump
processing.

The good news is that once everyone uses only '' to quote string, we
will not have any data security issues with this change. The only
potential problem is the mishandling of backslash characters if there is
a mismatch between what the client expects and the server uses. By
backpatching E'' perhaps even to 7.4 and earlier (as a noop), we could
minimize this problem.

Is this whole thing ugly? Yes. Can we just close our eyes and hope we
can continue with our current behavior while growing a larger userbase
--- probabably not.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCHES] Adding \x escape processing to COPY, psql, backend
Date: 2005-05-30 20:15:46
Message-ID: 200505302015.j4UKFke23797@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Here is a new version of the three \x hex support patches. I have added
> > \x for psql variables, which is the last patch.
>
> > I have IM'ed with Peter and he is now OK with the idea of supporting \x,
> > with the underestanding that it doesn't take us any farther away from
> > compatibility than we are now.
>
> Peter may be OK with it, but I object strongly to adding this to SQL
> literals. This is exactly *not* the direction we want to be going in.
>
> I don't really see the point for COPY and psql, either.

We already support \n, \r, \t, and \octal. I don't see any problem with
improving it. It does not take us any closer or farther away from spec
compliance.

COPY \x has been requested by several people, and there are actually two
patches that have been submitted in the past year for this.

As you know, escapes already provide a useful mechanism on COPY and SQL
strings, and there is a plan I just posted to deal with standards
issues, but I don't see \x taking us closer or farther from this.

Please explain why this takes us in the wrong direction.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-05-31 04:14:59
Message-ID: 87sm04vxoc.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> The goal, at some point, is that we would have two types of strings, ''
> strings and E'' strings. '' strings don't have any special backslash
> handling for compatibility with with the ANSI spec and all other
> databases except MySQL (and in MySQL it is now optional). E'' strings
> behave just like our strings do now, with backslash handling.

The only thing I'm not clear on is what exactly is the use case for E''
strings. That is, who do you expect to actually use them?

Any new applications are recommended to be using '' strings. And any existing
applications obviously won't be using them since they don't currently exist.

The only potential candidates are existing applications being ported forward.
And that only makes sense if they're currently using some function like
addslash. Is it really easier to change all the SQL queries to use E'' (and
still have a bug) than it is to replace addslash with PQquoteString() ?

Also, I'm really confused why you would make PQescapeString require E''
strings and introduce a new function. That means existing non-buggy
applications would suddenly be buggy? And it would be impossible to write a
properly functioning application that interpolates a constant into a query
that would be portable to 8.2 and 8.0?

--
greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-05-31 04:20:19
Message-ID: 200505310420.j4V4KJf23222@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Greg Stark wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
> > The goal, at some point, is that we would have two types of strings, ''
> > strings and E'' strings. '' strings don't have any special backslash
> > handling for compatibility with with the ANSI spec and all other
> > databases except MySQL (and in MySQL it is now optional). E'' strings
> > behave just like our strings do now, with backslash handling.
>
>
> The only thing I'm not clear on is what exactly is the use case for E''
> strings. That is, who do you expect to actually use them?
>
> Any new applications are recommended to be using '' strings. And any existing
> applications obviously won't be using them since they don't currently exist.

We are saying to use '' to escape single quotes in all strings. E'' is
still useful if you want to use backslash escapes in your strings.

Does that answer your questions?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-05-31 05:25:49
Message-ID: 8400.1117517149@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Greg Stark <gsstark(at)mit(dot)edu> writes:
> The only thing I'm not clear on is what exactly is the use case for E''
> strings. That is, who do you expect to actually use them?

The case that convinced me we need to keep some sort of backslash
capability is this: suppose you want to put a string including a tab
into your database. Try to do it with psql:
t=> insert into foo values ('<TAB>
Guess what: you won't get anywhere, at least not unless you disable
readline. So it's nice to be able to use \t.

There are related issues involving \r and \n depending on your platform.
And this doesn't even scratch the surface of encoding-related funnies.

So there's definitely a use-case for keeping the existing backslash
behavior, and E'string' seems like a reasonable proposal for doing that
without conflicting with the SQL spec.

What I do not see at the moment is how we get there from here (ie,
dropping backslashing in regular literals) without incurring tremendous
pain --- including breaking all existing pg_dump files, introducing
security holes and/or data corruption into many existing apps that are
not presently broken, and probably some other ways of ruining your day.
I'm quite unconvinced that this particular letter of the SQL spec is
worth complying with ...

regards, tom lane


From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-05-31 09:49:20
Message-ID: Pine.LNX.4.44.0505311145410.7072-100000@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 31 May 2005, Tom Lane wrote:

> The case that convinced me we need to keep some sort of backslash
> capability is this: suppose you want to put a string including a tab
> into your database. Try to do it with psql:
> t=> insert into foo values ('<TAB>
> Guess what: you won't get anywhere, at least not unless you disable
> readline. So it's nice to be able to use \t.

To insert a tab using readline you can press ESC followed by TAB. This
works as least in readline as it is setup in redhat/fedora (and readline
can be setup in 1000 different ways so who knows how portable this is).

--
/Dennis Björklund


From: Tom Ivar Helbekkmo <tih(at)eunetnorge(dot)no>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-05-31 10:45:31
Message-ID: 86vf4zfzck.fsf@athene.hamartun.priv.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:

> To insert a tab using readline you can press ESC followed by TAB.

...or ^V followed by TAB, as per age-old tradition. :-)

-tih
--
Don't ascribe to stupidity what can be adequately explained by ignorance.


From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: Tom Ivar Helbekkmo <tih(at)eunetnorge(dot)no>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-05-31 10:52:17
Message-ID: Pine.LNX.4.44.0505311247000.7072-100000@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 31 May 2005, Tom Ivar Helbekkmo wrote:

> ...or ^V followed by TAB, as per age-old tradition. :-)

Right, I forgot about that one. One can also do other control characters
instead of TAB by pressing CTRL-J and similar.

Well, I just wanted to point out that it's possible. The main problem is
still to make sure that old dumps work and can be imported. I don't see
how that can work without a GUC variable in addition to the E'foo' stuff
(but that's not so bad as it can be phased in to support old pg_dumps and
phased out again in pg 10 or something).

--
/Dennis Björklund


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-05-31 14:54:25
Message-ID: 200505311454.j4VEsPm28816@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > The only thing I'm not clear on is what exactly is the use case for E''
> > strings. That is, who do you expect to actually use them?
>
> The case that convinced me we need to keep some sort of backslash
> capability is this: suppose you want to put a string including a tab
> into your database. Try to do it with psql:
> t=> insert into foo values ('<TAB>
> Guess what: you won't get anywhere, at least not unless you disable
> readline. So it's nice to be able to use \t.
>
> There are related issues involving \r and \n depending on your platform.
> And this doesn't even scratch the surface of encoding-related funnies.
>
> So there's definitely a use-case for keeping the existing backslash
> behavior, and E'string' seems like a reasonable proposal for doing that
> without conflicting with the SQL spec.
>
> What I do not see at the moment is how we get there from here (ie,
> dropping backslashing in regular literals) without incurring tremendous
> pain --- including breaking all existing pg_dump files, introducing
> security holes and/or data corruption into many existing apps that are
> not presently broken, and probably some other ways of ruining your day.
> I'm quite unconvinced that this particular letter of the SQL spec is
> worth complying with ...

I think this is going to be like the Win32 port, where there is little
excitement from our existing users, but it is needed to grow our user
base.

I think the E'' is useful becuase it gives people a migration path for
the escapes they are already using, and the escape mechanism itself it
something useful to keep.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-05-31 15:09:43
Message-ID: 20050531150943.GA31259@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, May 31, 2005 at 11:49:20 +0200,
Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> wrote:
> On Tue, 31 May 2005, Tom Lane wrote:
>
> > The case that convinced me we need to keep some sort of backslash
> > capability is this: suppose you want to put a string including a tab
> > into your database. Try to do it with psql:
> > t=> insert into foo values ('<TAB>
> > Guess what: you won't get anywhere, at least not unless you disable
> > readline. So it's nice to be able to use \t.
>
> To insert a tab using readline you can press ESC followed by TAB. This
> works as least in readline as it is setup in redhat/fedora (and readline
> can be setup in 1000 different ways so who knows how portable this is).

There are still advantages to having printable backslashed escaped characters
in strings that are saved to files. It makes it easier to see what is really
in the string and they are less likely to get accidentally munged when
editing the file or moving it between systems with different line termination
conventions.


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Ten <sergey(at)sourcelabs(dot)com>, "'Christopher Kings-Lynne'" <chriskl(at)familyhealth(dot)com(dot)au>, jason(at)sourcelabs(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Adding \x escape processing to COPY, psql, backend
Date: 2005-06-02 01:24:11
Message-ID: 200506020124.j521OBj07023@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Patch applied. Thanks for the COPY \x patch.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > Here is a new version of the three \x hex support patches. I have added
> > > \x for psql variables, which is the last patch.
> >
> > > I have IM'ed with Peter and he is now OK with the idea of supporting \x,
> > > with the underestanding that it doesn't take us any farther away from
> > > compatibility than we are now.
> >
> > Peter may be OK with it, but I object strongly to adding this to SQL
> > literals. This is exactly *not* the direction we want to be going in.
> >
> > I don't really see the point for COPY and psql, either.
>
> We already support \n, \r, \t, and \octal. I don't see any problem with
> improving it. It does not take us any closer or farther away from spec
> compliance.
>
> COPY \x has been requested by several people, and there are actually two
> patches that have been submitted in the past year for this.
>
> As you know, escapes already provide a useful mechanism on COPY and SQL
> strings, and there is a plan I just posted to deal with standards
> issues, but I don't see \x taking us closer or farther from this.
>
> Please explain why this takes us in the wrong direction.
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruno Wolff III <bruno(at)wolff(dot)to>
Cc: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Backslash handling in strings
Date: 2005-06-02 03:54:08
Message-ID: 200506020354.j523s8r05372@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Here is a summary of the issues with moving to no escapes for non-E
strings:

http://candle.pha.pa.us/cgi-bin/pgescape

---------------------------------------------------------------------------

Bruno Wolff III wrote:
> On Tue, May 31, 2005 at 11:49:20 +0200,
> Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> wrote:
> > On Tue, 31 May 2005, Tom Lane wrote:
> >
> > > The case that convinced me we need to keep some sort of backslash
> > > capability is this: suppose you want to put a string including a tab
> > > into your database. Try to do it with psql:
> > > t=> insert into foo values ('<TAB>
> > > Guess what: you won't get anywhere, at least not unless you disable
> > > readline. So it's nice to be able to use \t.
> >
> > To insert a tab using readline you can press ESC followed by TAB. This
> > works as least in readline as it is setup in redhat/fedora (and readline
> > can be setup in 1000 different ways so who knows how portable this is).
>
> There are still advantages to having printable backslashed escaped characters
> in strings that are saved to files. It makes it easier to see what is really
> in the string and they are less likely to get accidentally munged when
> editing the file or moving it between systems with different line termination
> conventions.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073