Re: [PATCHES] WITH DELIMITERS in COPY

Lists: pgsql-hackerspgsql-patches
From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Bill Studenmund <wrstuden(at)netbsd(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in
Date: 2001-12-04 18:31:13
Message-ID: 3C0D1671.2030502@tm.ee
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bill Studenmund wrote:

>>
>>But as long as COPY IN considers that delimiter spec to mean "any one of
>>these characters", and not a multicharacter string, we couldn't do that.
>>
>>If we restrict DELIMITERS strings to be exactly one character for a
>>release or three, we could think about implementing this idea of
>>multicharacter delimiter strings later on. Not sure if anyone really
>>needs it though.
>>
\r\n is quite popular (row) delimiter on some systems (and causes
sometimes a weird box
char to appear at the end of last database field :), but I doubt I can
give any examples
of multichar field delimiters

>> In any case, the current behavior is inconsistent.
>>
>
>I think this restriction sounds fine, and quite practical. :-)
>
I sincerely doubt that anyone knowingly :) uses this undocumented
feature for copy in,
as it can be found out only by trial and error.

Much better to remove it, enforce it in code as Bruce suggested, and
document it.

------------------
Hannu


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Undocumented feature costs a lot of performance in COPY IN
Date: 2001-12-04 19:49:05
Message-ID: 2841.1007495345@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I have been fooling around profiling various ways of inserting wide
(8000-byte, not all that wide) bytea fields, per Brent Verner's note
of a few days ago. COPY IN should be, and is, the fastest way to
do it. But I was rather startled to discover that 25% of the runtime
of COPY IN went to an inefficient way of fetching single bytes from
pqcomm.c (pq_getbytes(&ch, 1) instead of ch = pq_getbyte()), and
20% of what's left after fixing that is going into the strchr() call
in CopyReadAttribute.

Now the point of that strchr() call is to detect whether the current
character is the column delimiter. The COPY reference page clearly
says:

By default, a text copy uses a tab ("\t") character as a
delimiter between fields. The field delimiter may be changed to
any other single character with the keyword phrase USING
DELIMITERS. Characters in data fields which happen to match the
delimiter character will be backslash quoted. Note that the
delimiter is always a single character. If multiple characters
are specified in the delimiter string, only the first character
is used.

and indeed, only the first character is used by COPY OUT. But COPY IN
is presently coded so that if multiple characters are mentioned in
USING DELIMITERS, any one of them will be taken as a field delimiter.

I would like to change the code to just "if (c == delim[0])",
which should buy back most of that 20% and make the behavior match the
documentation. Question for the list: is this a bad change? Is anyone
out there actually using this undocumented behavior?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2001-12-04 20:07:01
Message-ID: 200112042007.fB4K72327600@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> I would like to change the code to just "if (c == delim[0])",
> which should buy back most of that 20% and make the behavior match the
> documentation. Question for the list: is this a bad change? Is anyone
> out there actually using this undocumented behavior?

Yes, please fix it. In fact, I think we should throw an error if more
than one character is specified as a delimiter. Saying we ignore
multiple characters in the documentation is not enough when we silently
ignore them in the code.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bill Studenmund <wrstuden(at)netbsd(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Undocumented feature costs a lot of performance in
Date: 2001-12-04 20:12:19
Message-ID: Pine.NEB.4.33.0112041208450.1693-100000@vespasia.home-net.internetconnect.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 4 Dec 2001, Tom Lane wrote:

> By default, a text copy uses a tab ("\t") character as a
> delimiter between fields. The field delimiter may be changed to
> any other single character with the keyword phrase USING
> DELIMITERS. Characters in data fields which happen to match the
> delimiter character will be backslash quoted. Note that the
> delimiter is always a single character. If multiple characters
> are specified in the delimiter string, only the first character
> is used.
>
> and indeed, only the first character is used by COPY OUT. But COPY IN
> is presently coded so that if multiple characters are mentioned in
> USING DELIMITERS, any one of them will be taken as a field delimiter.
>
> I would like to change the code to just "if (c == delim[0])",
> which should buy back most of that 20% and make the behavior match the
> documentation. Question for the list: is this a bad change? Is anyone
> out there actually using this undocumented behavior?

I think you should make the change. Because, as I understand it, when you
give multiple delimiter characters COPY OUT will not delimit characters
other than the first, since they won't be treated special. But COPY IN
will treat them special; you will read in more columns than you output.
Thus as it is, you can't COPY IN something you COPY OUT'd.

One alternative would be to make the code use different paths for the
just-one and many delimiter cases. But then COPY OUT would need fixing.

Take care,

Bill


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY IN
Date: 2001-12-04 20:14:58
Message-ID: 2993.1007496898@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Yes, please fix it. In fact, I think we should throw an error if more
> than one character is specified as a delimiter. Saying we ignore
> multiple characters in the documentation is not enough when we silently
> ignore them in the code.

Well, it'd be an easy enough addition:

if (strlen(delim) != 1)
elog(ERROR, "COPY delimiter must be a single character");

This isn't multibyte-aware, but then neither is the implementation;
delimiters that are multibyte characters won't work at the moment.

Any comments out there?

regards, tom lane


From: Doug McNaught <doug(at)wireboard(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY IN
Date: 2001-12-04 20:19:27
Message-ID: m3elmad9gw.fsf@belphigor.mcnaught.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> and indeed, only the first character is used by COPY OUT. But COPY IN
> is presently coded so that if multiple characters are mentioned in
> USING DELIMITERS, any one of them will be taken as a field delimiter.
>
> I would like to change the code to just "if (c == delim[0])",
> which should buy back most of that 20% and make the behavior match the
> documentation. Question for the list: is this a bad change? Is anyone
> out there actually using this undocumented behavior?

Not I.

As an utter nitpick, the syntax should IMHO be USING DELIMITER (no S)
if there is only one possible delimiter character. But that *would*
break lots of apps so I don't advocate it. ;)

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2001-12-04 20:20:52
Message-ID: 200112042020.fB4KKqI28979@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Yes, please fix it. In fact, I think we should throw an error if more
> > than one character is specified as a delimiter. Saying we ignore
> > multiple characters in the documentation is not enough when we silently
> > ignore them in the code.
>
> Well, it'd be an easy enough addition:
>
> if (strlen(delim) != 1)
> elog(ERROR, "COPY delimiter must be a single character");
>
> This isn't multibyte-aware, but then neither is the implementation;
> delimiters that are multibyte characters won't work at the moment.

My point was that the documentation was saying it could only be one
character, and that we would ignore any characters after the first one,
but there was no enforcement in the code.

The right way to do it is to just say in the documentation it has to be
one character, and throw an error in the code if it isn't.

Limitations should be enforced in the code, if possible, not just
mentioned in the documenation, which may or may not get read.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill Studenmund <wrstuden(at)netbsd(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY IN
Date: 2001-12-04 20:22:58
Message-ID: 3040.1007497378@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bill Studenmund <wrstuden(at)netbsd(dot)org> writes:
> One alternative would be to make the code use different paths for the
> just-one and many delimiter cases. But then COPY OUT would need fixing.

Well, it's not clear what COPY OUT should *do* with multiple
alternatives, anyway. Pick one at random? I guess it does that now,
if you consider "always use the first one" as a random choice. The
real problem is that it will only backslash the first one, too. That
means that data emitted with DELIMITERS "|_=", say, will fail to be
reloaded correctly if that same DELIMITERS string is given to COPY IN
--- because any _ or = characters in the data won't be backslashed,
but would need to be to keep COPY IN from treating them as delimiters.

For COPY OUT's purposes, a sensible interpretation of a multicharacter
delimiter string would be that the whole string is emitted as the
delimiter. Eg,

COPY OUT WITH DELIMITERS "<TAB>";

foo<TAB>bar<TAB>baz
...

But as long as COPY IN considers that delimiter spec to mean "any one of
these characters", and not a multicharacter string, we couldn't do that.

If we restrict DELIMITERS strings to be exactly one character for a
release or three, we could think about implementing this idea of
multicharacter delimiter strings later on. Not sure if anyone really
needs it though. In any case, the current behavior is inconsistent.

regards, tom lane


From: Bill Studenmund <wrstuden(at)netbsd(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Undocumented feature costs a lot of performance in
Date: 2001-12-04 20:31:47
Message-ID: Pine.NEB.4.33.0112041230020.1693-100000@vespasia.home-net.internetconnect.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 4 Dec 2001, Tom Lane wrote:

> Bill Studenmund <wrstuden(at)netbsd(dot)org> writes:
> > One alternative would be to make the code use different paths for the
> > just-one and many delimiter cases. But then COPY OUT would need fixing.
>
> Well, it's not clear what COPY OUT should *do* with multiple
> alternatives, anyway. Pick one at random? I guess it does that now,
> if you consider "always use the first one" as a random choice. The

I think that'd be fine.

> real problem is that it will only backslash the first one, too. That

Ick. I was thinking that if you gave multiple delimiters, it would escape
each one. Which would be slow, and is why I think seperate code paths
would be good. :-)

> means that data emitted with DELIMITERS "|_=", say, will fail to be
> reloaded correctly if that same DELIMITERS string is given to COPY IN
> --- because any _ or = characters in the data won't be backslashed,
> but would need to be to keep COPY IN from treating them as delimiters.
>
> For COPY OUT's purposes, a sensible interpretation of a multicharacter
> delimiter string would be that the whole string is emitted as the
> delimiter. Eg,
>
> COPY OUT WITH DELIMITERS "<TAB>";
>
> foo<TAB>bar<TAB>baz
> ...
>
> But as long as COPY IN considers that delimiter spec to mean "any one of
> these characters", and not a multicharacter string, we couldn't do that.
>
> If we restrict DELIMITERS strings to be exactly one character for a
> release or three, we could think about implementing this idea of
> multicharacter delimiter strings later on. Not sure if anyone really
> needs it though. In any case, the current behavior is inconsistent.

I think this restriction sounds fine, and quite practical. :-)

Take care,

Bill


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Doug McNaught <doug(at)wireboard(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2001-12-04 21:18:34
Message-ID: 200112042118.fB4LIYL07607@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > and indeed, only the first character is used by COPY OUT. But COPY IN
> > is presently coded so that if multiple characters are mentioned in
> > USING DELIMITERS, any one of them will be taken as a field delimiter.
> >
> > I would like to change the code to just "if (c == delim[0])",
> > which should buy back most of that 20% and make the behavior match the
> > documentation. Question for the list: is this a bad change? Is anyone
> > out there actually using this undocumented behavior?
>
> Not I.
>
> As an utter nitpick, the syntax should IMHO be USING DELIMITER (no S)
> if there is only one possible delimiter character. But that *would*
> break lots of apps so I don't advocate it. ;)

We could support keywords DELIMITER and DELIMITERS and only document the first
one.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Doug McNaught <doug(at)wireboard(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY IN
Date: 2001-12-04 21:22:35
Message-ID: 5816.1007500955@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> We could support keywords DELIMITER and DELIMITERS and only document
> the first one.

One could also argue that it should be WITH DELIMITER for more
consistency with the other optional clauses.

But let's put that in the TODO list, not try to get it done now...

regards, tom lane


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgman(at)candle(dot)pha(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in
Date: 2001-12-05 00:57:30
Message-ID: 20011205095730N.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Well, it'd be an easy enough addition:
>
> if (strlen(delim) != 1)
> elog(ERROR, "COPY delimiter must be a single character");
>
> This isn't multibyte-aware, but then neither is the implementation;
> delimiters that are multibyte characters won't work at the moment.
>
> Any comments out there?

I think it will be acceptable for multibyte users only ASCII
characters could be a candidate of delimiters.
I don't think anybody wants to use Kanji as a delimiter:-)
--
Tatsuo Ishii


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Doug McNaught <doug(at)wireboard(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2001-12-28 19:43:46
Message-ID: 200112281943.fBSJhkg06309@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > We could support keywords DELIMITER and DELIMITERS and only document
> > the first one.
>
> One could also argue that it should be WITH DELIMITER for more
> consistency with the other optional clauses.
>
> But let's put that in the TODO list, not try to get it done now...

Updated TODO:

COPY
...
o Change syntax to WITH DELIMITER, (keep old syntax around?)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2002-01-05 01:32:29
Message-ID: Pine.LNX.4.21.0201051223290.19760-101000@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 28 Dec 2001, Bruce Momjian wrote:

> > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > We could support keywords DELIMITER and DELIMITERS and only document
> > > the first one.
> >
> > One could also argue that it should be WITH DELIMITER for more
> > consistency with the other optional clauses.
> >
> > But let's put that in the TODO list, not try to get it done now...
>
> Updated TODO:
>
> COPY
> ...
> o Change syntax to WITH DELIMITER, (keep old syntax around?)
>

An attached patch implements this. The problem with implementing this is
that the new syntax is:

.... WITH DELIMITERS '<delim>' WITH NULL AS '<char>'

Naturally, this leads to a shift/reduce conflict. Solution is more or less
that used with CREATE DATABASE WITH ... WITH ... etc. The only ugly bit
was mixing this with the old USING DELIMITERS ... syntax. I don't like the
solution -- I get the feeling there's a better way to do it.

The other option of course is to update yylex() to create a new token
in the same way that the UNIONJOIN terminal is created. But I think this
is a bit messy.

Ideas or is this okay?

Gavin

Attachment Content-Type Size
copy.patch.gz application/x-gzip 1.7 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2002-01-05 02:28:32
Message-ID: 200201050228.g052SWl29858@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Saved for 7.3.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> On Fri, 28 Dec 2001, Bruce Momjian wrote:
>
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > We could support keywords DELIMITER and DELIMITERS and only document
> > > > the first one.
> > >
> > > One could also argue that it should be WITH DELIMITER for more
> > > consistency with the other optional clauses.
> > >
> > > But let's put that in the TODO list, not try to get it done now...
> >
> > Updated TODO:
> >
> > COPY
> > ...
> > o Change syntax to WITH DELIMITER, (keep old syntax around?)
> >
>
> An attached patch implements this. The problem with implementing this is
> that the new syntax is:
>
> .... WITH DELIMITERS '<delim>' WITH NULL AS '<char>'
>
> Naturally, this leads to a shift/reduce conflict. Solution is more or less
> that used with CREATE DATABASE WITH ... WITH ... etc. The only ugly bit
> was mixing this with the old USING DELIMITERS ... syntax. I don't like the
> solution -- I get the feeling there's a better way to do it.
>
> The other option of course is to update yylex() to create a new token
> in the same way that the UNIONJOIN terminal is created. But I think this
> is a bit messy.
>
> Ideas or is this okay?
>
> Gavin

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2002-02-23 02:01:12
Message-ID: 200202230201.g1N21Cl29775@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> On Fri, 28 Dec 2001, Bruce Momjian wrote:
>
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > We could support keywords DELIMITER and DELIMITERS and only document
> > > > the first one.
> > >
> > > One could also argue that it should be WITH DELIMITER for more
> > > consistency with the other optional clauses.
> > >
> > > But let's put that in the TODO list, not try to get it done now...
> >
> > Updated TODO:
> >
> > COPY
> > ...
> > o Change syntax to WITH DELIMITER, (keep old syntax around?)
> >
>
> An attached patch implements this. The problem with implementing this is
> that the new syntax is:
>
> .... WITH DELIMITERS '<delim>' WITH NULL AS '<char>'
>
> Naturally, this leads to a shift/reduce conflict. Solution is more or less
> that used with CREATE DATABASE WITH ... WITH ... etc. The only ugly bit
> was mixing this with the old USING DELIMITERS ... syntax. I don't like the
> solution -- I get the feeling there's a better way to do it.
>
> The other option of course is to update yylex() to create a new token
> in the same way that the UNIONJOIN terminal is created. But I think this
> is a bit messy.
>
> Ideas or is this okay?
>
> Gavin

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2002-02-25 03:42:10
Message-ID: 200202250342.g1P3gAS23872@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Gavin, can I get documentation patches to match this patch? Thanks.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> On Fri, 28 Dec 2001, Bruce Momjian wrote:
>
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > We could support keywords DELIMITER and DELIMITERS and only document
> > > > the first one.
> > >
> > > One could also argue that it should be WITH DELIMITER for more
> > > consistency with the other optional clauses.
> > >
> > > But let's put that in the TODO list, not try to get it done now...
> >
> > Updated TODO:
> >
> > COPY
> > ...
> > o Change syntax to WITH DELIMITER, (keep old syntax around?)
> >
>
> An attached patch implements this. The problem with implementing this is
> that the new syntax is:
>
> .... WITH DELIMITERS '<delim>' WITH NULL AS '<char>'
>
> Naturally, this leads to a shift/reduce conflict. Solution is more or less
> that used with CREATE DATABASE WITH ... WITH ... etc. The only ugly bit
> was mixing this with the old USING DELIMITERS ... syntax. I don't like the
> solution -- I get the feeling there's a better way to do it.
>
> The other option of course is to update yylex() to create a new token
> in the same way that the UNIONJOIN terminal is created. But I think this
> is a bit messy.
>
> Ideas or is this okay?
>
> Gavin

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: WITH DELIMITERS in COPY
Date: 2002-03-05 06:38:06
Message-ID: 200203050638.g256c6725323@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Seems the original title about "feature causes performance in COPY" was
confusing. This patch merely fixes the identified TODO item in the
grammar about using WITH in COPY.

I will apply tomorrow.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> On Fri, 28 Dec 2001, Bruce Momjian wrote:
>
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > We could support keywords DELIMITER and DELIMITERS and only document
> > > > the first one.
> > >
> > > One could also argue that it should be WITH DELIMITER for more
> > > consistency with the other optional clauses.
> > >
> > > But let's put that in the TODO list, not try to get it done now...
> >
> > Updated TODO:
> >
> > COPY
> > ...
> > o Change syntax to WITH DELIMITER, (keep old syntax around?)
> >
>
> An attached patch implements this. The problem with implementing this is
> that the new syntax is:
>
> .... WITH DELIMITERS '<delim>' WITH NULL AS '<char>'
>
> Naturally, this leads to a shift/reduce conflict. Solution is more or less
> that used with CREATE DATABASE WITH ... WITH ... etc. The only ugly bit
> was mixing this with the old USING DELIMITERS ... syntax. I don't like the
> solution -- I get the feeling there's a better way to do it.
>
> The other option of course is to update yylex() to create a new token
> in the same way that the UNIONJOIN terminal is created. But I think this
> is a bit messy.
>
> Ideas or is this okay?
>
> Gavin

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-03-05 10:21:58
Message-ID: Pine.LNX.4.21.0203052058060.18954-100000@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi Bruce,

On Tue, 5 Mar 2002, Bruce Momjian wrote:

>
> Seems the original title about "feature causes performance in COPY" was
> confusing.

Oops.

> This patch merely fixes the identified TODO item in the
> grammar about using WITH in COPY.

Now that I look at this patch again I don't think I like the
syntax.

COPY [BINARY] <relation> [WITH OIDS] TO | FROM <file> [[USING DELIMITERS |
WITH DELIMITER] <delimiter> [WITH NULL AS <char>]

It isn't very elegant.

1) I forced the parser to be able to handle multiple WITHs, but that
doesn't mean its right. I can't remember why I didn't propose a better
syntax back then, such as:

... [WITH [DELIMITER <delimiter>,] [NULL AS <char>]]

2) Given (1), Why does WITH OIDS belong where it is now? Why not have it
as an 'option' at the end?

Anyone have any opinion on this?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-03-05 16:17:16
Message-ID: 15865.1015345036@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> Now that I look at this patch again I don't think I like the
> syntax.

> COPY [BINARY] <relation> [WITH OIDS] TO | FROM <file> [[USING DELIMITERS |
> WITH DELIMITER] <delimiter> [WITH NULL AS <char>]

> It isn't very elegant.

> 1) I forced the parser to be able to handle multiple WITHs, but that
> doesn't mean its right.

It seems wrong to me. The other statements that use WITH use only one
WITH to introduce a list of option clauses.

I can't remember why I didn't propose a better
> syntax back then, such as:

> ... [WITH [DELIMITER <delimiter>,] [NULL AS <char>]]

The other statements you might model this on don't use commas either.
The closest thing to the precedents would be

... [WITH copyoption [copyoption ...]]

copyoption := DELIMITER delim
| NULL AS nullstring
| etc

To get some modicum of backwards compatibility, we could allow either
DELIMITER or DELIMITERS as a copy-option keyword, and we could allow
USING as a substitute for the initial WITH. This still won't be quite
backwards compatible for statements that use both of the old option
clauses; how worried are we about that? Maybe we could persuade the
parser to handle

... [ WITH | USING copyoption [ [WITH] copyoption ... ]]

but my that seems ugly.

> 2) Given (1), Why does WITH OIDS belong where it is now? Why not have it
> as an 'option' at the end?

Historical precedent, mainly. Changing this would break existing
pg_dump files, so I'm not eager to do it. (AFAIR pg_dump never uses
DELIMITERS nor NULL AS, so it doesn't care if you change that part
of the syntax.)

If we were working in a green field I'd vote for moving BINARY into the
WITH-options too, but we aren't. Again that seems too likely to break
things in the name of a small amount of added consistency.

regards, tom lane


From: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-03-05 17:07:14
Message-ID: 15492.64322.981365.159706@kelvin.csl.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane writes:
> [ COPY syntax ]
> If we were working in a green field I'd vote for moving BINARY into the
> WITH-options too, but we aren't. Again that seems too likely to break
> things in the name of a small amount of added consistency.

Hawabout 'COPY ...' retains the existing syntax but if used like 'COPY
TABLE ...' sane syntax is used?


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-03-05 18:01:36
Message-ID: 200203051801.g25I1a928368@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Lee Kindness wrote:
> Tom Lane writes:
> > [ COPY syntax ]
> > If we were working in a green field I'd vote for moving BINARY into the
> > WITH-options too, but we aren't. Again that seems too likely to break
> > things in the name of a small amount of added consistency.
>
> Hawabout 'COPY ...' retains the existing syntax but if used like 'COPY
> TABLE ...' sane syntax is used?

Interesting idea, but TABLE just seems to be a noise word to me. COPY
is one of those commands that it is hard to change because pg_dump
relies on it.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Undocumented feature costs a lot of performance in COPY
Date: 2002-03-05 18:02:16
Message-ID: 200203051802.g25I2Gx28433@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Here is the original patch, which is now rejected as we discuss a new
patch on hackers.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> On Fri, 28 Dec 2001, Bruce Momjian wrote:
>
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > We could support keywords DELIMITER and DELIMITERS and only document
> > > > the first one.
> > >
> > > One could also argue that it should be WITH DELIMITER for more
> > > consistency with the other optional clauses.
> > >
> > > But let's put that in the TODO list, not try to get it done now...
> >
> > Updated TODO:
> >
> > COPY
> > ...
> > o Change syntax to WITH DELIMITER, (keep old syntax around?)
> >
>
> An attached patch implements this. The problem with implementing this is
> that the new syntax is:
>
> .... WITH DELIMITERS '<delim>' WITH NULL AS '<char>'
>
> Naturally, this leads to a shift/reduce conflict. Solution is more or less
> that used with CREATE DATABASE WITH ... WITH ... etc. The only ugly bit
> was mixing this with the old USING DELIMITERS ... syntax. I don't like the
> solution -- I get the feeling there's a better way to do it.
>
> The other option of course is to update yylex() to create a new token
> in the same way that the UNIONJOIN terminal is created. But I think this
> is a bit messy.
>
> Ideas or is this okay?
>
> Gavin

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-14 05:02:59
Message-ID: 200204140502.g3E52xj02970@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Gavin, I will do the legwork on this if you wish. I think we need to
use DefElem to store the COPY params, rather than using specific fields
in CopyStmt.

Would you send me your original patch so I am make sure I hit
everything. I can't seem to find a copy. If you would like to work on
it, I can give you what I have and walk you through the process.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> Hi Bruce,
>
> On Tue, 5 Mar 2002, Bruce Momjian wrote:
>
> >
> > Seems the original title about "feature causes performance in COPY" was
> > confusing.
>
> Oops.
>
> > This patch merely fixes the identified TODO item in the
> > grammar about using WITH in COPY.
>
> Now that I look at this patch again I don't think I like the
> syntax.
>
> COPY [BINARY] <relation> [WITH OIDS] TO | FROM <file> [[USING DELIMITERS |
> WITH DELIMITER] <delimiter> [WITH NULL AS <char>]
>
> It isn't very elegant.
>
> 1) I forced the parser to be able to handle multiple WITHs, but that
> doesn't mean its right. I can't remember why I didn't propose a better
> syntax back then, such as:
>
> ... [WITH [DELIMITER <delimiter>,] [NULL AS <char>]]
>
> 2) Given (1), Why does WITH OIDS belong where it is now? Why not have it
> as an 'option' at the end?
>
> Anyone have any opinion on this?
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-14 17:34:06
Message-ID: Pine.LNX.4.21.0204150132380.30483-101000@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 14 Apr 2002, Bruce Momjian wrote:

>
> Gavin, I will do the legwork on this if you wish. I think we need to

No matter. I intended to submit a patch to fix this.

> use DefElem to store the COPY params, rather than using specific fields
> in CopyStmt.

DefElem would have required modification of code outside the parser (to
keep utility.c and DoCopy() happy) or otherwise an even messier loop
executed as a result of CopyStmt than I have given in the attached patch,
plus other issues with Yacc.

The patch attached maintains backward compatibility. The syntax is as
follows:

COPY [BINARY] <relname> [WITH OIDS] FROM/TO
[USING DELIMITERS <delimiter>]
[WITH [ DELIMITER <delimiter> | NULL AS <char> | OIDS ]]

I was also going to allow BINARY in the WITH list, but there seems to be
little point.

Note that if you execute a query such as:

COPY pg_class TO '/some/path/file/out'
USING DELIMITERS <tab>
WITH DELIMITER '|';

The code will give preference to WITH DELIMITER.

If no one can find fault with this or my implementation, I'll follow up
with documentation and psql patches (not sure that there is much point
patching pg_dump).

Gavin

Attachment Content-Type Size
copy2.diff.gz application/x-gzip 1.9 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-14 17:46:39
Message-ID: Pine.LNX.4.30.0204141343470.717-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gavin Sherry writes:

> The patch attached maintains backward compatibility. The syntax is as
> follows:
>
> COPY [BINARY] <relname> [WITH OIDS] FROM/TO
> [USING DELIMITERS <delimiter>]
> [WITH [ DELIMITER <delimiter> | NULL AS <char> | OIDS ]]

I think we should lose the WITH altogether. It's not any better than
USING.

But just saying "OIDS" is not very clear. In this case the WITH is
necessary.

> Note that if you execute a query such as:
>
> COPY pg_class TO '/some/path/file/out'
> USING DELIMITERS <tab>
> WITH DELIMITER '|';
>
> The code will give preference to WITH DELIMITER.

That should be an error.

> If no one can find fault with this or my implementation, I'll follow up
> with documentation and psql patches (not sure that there is much point
> patching pg_dump).

pg_dump should use the new syntax.

--
Peter Eisentraut peter_e(at)gmx(dot)net


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-14 19:03:11
Message-ID: 200204141903.g3EJ3Bd22332@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Gavin, I see where you are going with the patch; creating a list in
gram.y and stuffing CopyStmt directly there. However, I can't find any
other instance of our stuffing things like that in gram.y. We do have
cases using options like COPY in CREATE USER, and we do use DefElem.

I realize it will require changes to other files like copy.c. However,
it seems like the cleanest solution. I guess I am not excited about
adding another way to handle WITH options into the code. Now, if you
want to argue that CREATE USER shouldn't use DefElem either, we can
discuss that, but I think we need to be consistent in how we handle COPY
vs. the other commands that use parameters.

See commands/user.c for an example of how it uses DefElem, and the tests
done to make sure conflicting arguments are not used or in copy's case,
specified twice. It just seems like that is the cleanest way to go.

One idea I had for the code is to allow BINARY as $2, and WITH OIDS in
its current place, and all options in the new WITH location, and
concatentate them together into one DefElem list in gram.y, and pass
that to copy.c. That way, you can allow BINARY and others at the end
too and the list is in one central place.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> On Sun, 14 Apr 2002, Bruce Momjian wrote:
>
> >
> > Gavin, I will do the legwork on this if you wish. I think we need to
>
> No matter. I intended to submit a patch to fix this.
>
> > use DefElem to store the COPY params, rather than using specific fields
> > in CopyStmt.
>
> DefElem would have required modification of code outside the parser (to
> keep utility.c and DoCopy() happy) or otherwise an even messier loop
> executed as a result of CopyStmt than I have given in the attached patch,
> plus other issues with Yacc.
>
> The patch attached maintains backward compatibility. The syntax is as
> follows:
>
> COPY [BINARY] <relname> [WITH OIDS] FROM/TO
> [USING DELIMITERS <delimiter>]
> [WITH [ DELIMITER <delimiter> | NULL AS <char> | OIDS ]]
>
> I was also going to allow BINARY in the WITH list, but there seems to be
> little point.
>
> Note that if you execute a query such as:
>
> COPY pg_class TO '/some/path/file/out'
> USING DELIMITERS <tab>
> WITH DELIMITER '|';
>
> The code will give preference to WITH DELIMITER.
>
> If no one can find fault with this or my implementation, I'll follow up
> with documentation and psql patches (not sure that there is much point
> patching pg_dump).
>
> Gavin

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-15 02:02:46
Message-ID: Pine.LNX.4.21.0204151201200.18978-100000@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 14 Apr 2002, Bruce Momjian wrote:

>
> Gavin, I see where you are going with the patch; creating a list in
> gram.y and stuffing CopyStmt directly there. However, I can't find any
> other instance of our stuffing things like that in gram.y. We do have
> cases using options like COPY in CREATE USER, and we do use DefElem.

CREATE DATABASE also fills out a list in the same fashion =). I will
however have a look at revising this patch to use DefElem later today.

Thanks,

Gavin


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-15 02:08:14
Message-ID: 200204150208.g3F28Ed12552@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gavin Sherry wrote:
> On Sun, 14 Apr 2002, Bruce Momjian wrote:
>
> >
> > Gavin, I see where you are going with the patch; creating a list in
> > gram.y and stuffing CopyStmt directly there. However, I can't find any
> > other instance of our stuffing things like that in gram.y. We do have
> > cases using options like COPY in CREATE USER, and we do use DefElem.
>
> CREATE DATABASE also fills out a list in the same fashion =). I will
> however have a look at revising this patch to use DefElem later today.

Oh, I see that now. Which method do people prefer. We should probably
make them all use the same mechanism.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-15 03:19:51
Message-ID: 6368.1018840791@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Gavin Sherry wrote:
>> CREATE DATABASE also fills out a list in the same fashion =). I will
>> however have a look at revising this patch to use DefElem later today.

> Oh, I see that now. Which method do people prefer. We should probably
> make them all use the same mechanism.

Consistency? Who needs consistency ;-) ?

Seriously, I do not see a need to change either of these approaches
just for the sake of changing it. CREATE DATABASE is okay as-is, and
so are the statements that use DefElem. I tend to like DefElem better
for the statements that we change around frequently ... for instance
the recent changes to the set of volatility keywords for functions
didn't require any changes to the grammar or the parsenode definitions.
But I think that a simple struct definition is easier to understand,
so I favor that for stable feature sets.

As for which one is better suited for COPY, I don't have a strong
opinion, but lean to DefElem. Seems like COPY will probably keep
accreting new features.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-16 21:27:21
Message-ID: 200204162127.g3GLRLo09097@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Gavin Sherry wrote:
> >> CREATE DATABASE also fills out a list in the same fashion =). I will
> >> however have a look at revising this patch to use DefElem later today.
>
> > Oh, I see that now. Which method do people prefer. We should probably
> > make them all use the same mechanism.
>
> Consistency? Who needs consistency ;-) ?
>
> Seriously, I do not see a need to change either of these approaches
> just for the sake of changing it. CREATE DATABASE is okay as-is, and
> so are the statements that use DefElem. I tend to like DefElem better
> for the statements that we change around frequently ... for instance
> the recent changes to the set of volatility keywords for functions
> didn't require any changes to the grammar or the parsenode definitions.
> But I think that a simple struct definition is easier to understand,
> so I favor that for stable feature sets.
>
> As for which one is better suited for COPY, I don't have a strong
> opinion, but lean to DefElem. Seems like COPY will probably keep
> accreting new features.

The code that bothered me about the CREATE DATABASE param processing
was:

/* process additional options */
foreach(l, $5)
{
List *optitem = (List *) lfirst(l);

switch (lfirsti(optitem))
{
case 1:
n->dbpath = (char *) lsecond(optitem);
break;
case 2:
n->dbtemplate = (char *) lsecond(optitem);
break;
case 3:
n->encoding = lfirsti(lnext(optitem));
break;
case 4:
n->dbowner = (char *) lsecond(optitem);
break;
}
}

I see what it is doing, but it seems quite unclear. Seeing that people
are using this as a pattern for other param processing, I will work on a
patch to convert this to DefElem.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-16 23:34:17
Message-ID: 1894.1019000057@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> The code that bothered me about the CREATE DATABASE param processing
> was:

> /* process additional options */
> foreach(l, $5)
> {
> List *optitem = (List *) lfirst(l);

> switch (lfirsti(optitem))
> {
> case 1:
> n->dbpath = (char *) lsecond(optitem);
> break;
> case 2:
> n->dbtemplate = (char *) lsecond(optitem);
> break;
> case 3:
> n->encoding = lfirsti(lnext(optitem));
> break;
> case 4:
> n->dbowner = (char *) lsecond(optitem);
> break;
> }
> }

> I see what it is doing, but it seems quite unclear. Seeing that people
> are using this as a pattern for other param processing, I will work on a
> patch to convert this to DefElem.

Oh, I think we were talking at cross-purposes then. What you're really
unhappy about is that this uses a list of two-element sublists? Yeah,
I agree, that's a messy data structure; a list of DefElem would be
perhaps cleaner. Not sure if it matters all that much though, since the
list only exists in the context of a few productions in gram.y. Perhaps
adding a couple of lines of documentation would be better than changing
the code.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-16 23:49:45
Message-ID: 200204162349.g3GNnj527600@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Oh, I think we were talking at cross-purposes then. What you're really
> unhappy about is that this uses a list of two-element sublists? Yeah,
> I agree, that's a messy data structure; a list of DefElem would be
> perhaps cleaner. Not sure if it matters all that much though, since the
> list only exists in the context of a few productions in gram.y. Perhaps
> adding a couple of lines of documentation would be better than changing
> the code.

Yea, documenation and/or a list of DefElem would be nicer. The problem
is that people are going to copy this in the future, so I may as well do
it right. Can't take more than 20 minutes.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-16 23:55:25
Message-ID: 200204162355.g3GNtQW28665@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gavin Sherry wrote:
> > I see what it is doing, but it seems quite unclear. Seeing that people
> > are using this as a pattern for other param processing, I will work on a
> > patch to convert this to DefElem.
>
> Wouldn't a few macros clean this up better (ie, make it clearer)?
>
> #define CDBOPTDBPATH 1
>
> #define optparam(l) (char *)lsecond(l)
> #define optparami(l) (int)lfirsti(lnext(l))
>
> foreach(l, $5)
> {
> List *optitem = (List *) lfirst(l);
>
> switch (lfirsti(optitem))
> {
> case CDBOPTDBPATH:
> n->dbpath = optparam(optitem);
> break;
>
> ...
>
>
> Regardless, I guess that code is pointless since the consensus seems to be
> that the use of DefElem is better since it allows for the abstraction of
> the parameters list. Obviously a good thing if CREATE DATABASE, COPY etc
> are to be extended often enough.
>

Yes, macros would be the way to go if we didn't have a cleaner
alternative.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WITH DELIMITERS in COPY
Date: 2002-04-16 23:56:03
Message-ID: Pine.LNX.4.21.0204170917300.13432-100000@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, 16 Apr 2002, Bruce Momjian wrote:

> The code that bothered me about the CREATE DATABASE param processing
> was:
>
> /* process additional options */
> foreach(l, $5)
> {
> List *optitem = (List *) lfirst(l);
>
> switch (lfirsti(optitem))
> {
> case 1:
> n->dbpath = (char *) lsecond(optitem);
> break;
> case 2:
> n->dbtemplate = (char *) lsecond(optitem);
> break;
> case 3:
> n->encoding = lfirsti(lnext(optitem));
> break;
> case 4:
> n->dbowner = (char *) lsecond(optitem);
> break;
> }
> }
>
> I see what it is doing, but it seems quite unclear. Seeing that people
> are using this as a pattern for other param processing, I will work on a
> patch to convert this to DefElem.

Wouldn't a few macros clean this up better (ie, make it clearer)?

#define CDBOPTDBPATH 1

#define optparam(l) (char *)lsecond(l)
#define optparami(l) (int)lfirsti(lnext(l))

foreach(l, $5)
{
List *optitem = (List *) lfirst(l);

switch (lfirsti(optitem))
{
case CDBOPTDBPATH:
n->dbpath = optparam(optitem);
break;

...

Regardless, I guess that code is pointless since the consensus seems to be
that the use of DefElem is better since it allows for the abstraction of
the parameters list. Obviously a good thing if CREATE DATABASE, COPY etc
are to be extended often enough.

Gavin


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] WITH DELIMITERS in COPY
Date: 2002-04-21 17:39:46
Message-ID: Pine.LNX.4.21.0204220334570.25580-101000@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce,

Attached is a modified patch, using DefElem instead of the 'roll your own'
method of collecting optional parameters to CopyStmt.

Naturally, DoCopy() as well as a few support functions needed to be
modified to get this going.

In order to check if parameters were being passed more than once (COPY
... WITH OIDS FROM ... WITH DELIMITER '\t' OIDS), I have added a function
defelemmember() to list.c. I could not think, off the top of my head, of a
more elegant way to do this.

Gavin

On Sun, 14 Apr 2002, Bruce Momjian wrote:

>
> Gavin, I will do the legwork on this if you wish. I think we need to
> use DefElem to store the COPY params, rather than using specific fields
> in CopyStmt.
>
> Would you send me your original patch so I am make sure I hit
> everything. I can't seem to find a copy. If you would like to work on
> it, I can give you what I have and walk you through the process.
>
> ---------------------------------------------------------------------------
>
> Gavin Sherry wrote:
> > Hi Bruce,
> >
> > On Tue, 5 Mar 2002, Bruce Momjian wrote:
> >
> > >
> > > Seems the original title about "feature causes performance in COPY" was
> > > confusing.
> >
> > Oops.
> >
> > > This patch merely fixes the identified TODO item in the
> > > grammar about using WITH in COPY.
> >
> > Now that I look at this patch again I don't think I like the
> > syntax.
> >
> > COPY [BINARY] <relation> [WITH OIDS] TO | FROM <file> [[USING DELIMITERS |
> > WITH DELIMITER] <delimiter> [WITH NULL AS <char>]
> >
> > It isn't very elegant.
> >
> > 1) I forced the parser to be able to handle multiple WITHs, but that
> > doesn't mean its right. I can't remember why I didn't propose a better
> > syntax back then, such as:
> >
> > ... [WITH [DELIMITER <delimiter>,] [NULL AS <char>]]
> >
> > 2) Given (1), Why does WITH OIDS belong where it is now? Why not have it
> > as an 'option' at the end?
> >
> > Anyone have any opinion on this?
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
> >
>
>

Attachment Content-Type Size
copy.diff.gz application/x-gzip 4.0 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] WITH DELIMITERS in COPY
Date: 2002-04-23 16:19:21
Message-ID: 200204231619.g3NGJL410624@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I see in user.c::CreateUser:

if (strcmp(defel->defname, "password") == 0 ||
strcmp(defel->defname, "encryptedPassword") == 0 ||
strcmp(defel->defname, "unencryptedPassword") == 0)
{
if (dpassword)
elog(ERROR, "CREATE USER: conflicting options");
dpassword = defel;
if (strcmp(defel->defname, "encryptedPassword") == 0)
encrypt_password = true;
else if (strcmp(defel->defname, "unencryptedPassword") == 0)
encrypt_password = false;
}
else if (strcmp(defel->defname, "sysid") == 0)
{
if (dsysid)
elog(ERROR, "CREATE USER: conflicting options");
dsysid = defel;
}

Looks like this is how we normally test for conflicting params. Does
this help?

---------------------------------------------------------------------------

Gavin Sherry wrote:
> Bruce,
>
> Attached is a modified patch, using DefElem instead of the 'roll your own'
> method of collecting optional parameters to CopyStmt.
>
> Naturally, DoCopy() as well as a few support functions needed to be
> modified to get this going.
>
> In order to check if parameters were being passed more than once (COPY
> ... WITH OIDS FROM ... WITH DELIMITER '\t' OIDS), I have added a function
> defelemmember() to list.c. I could not think, off the top of my head, of a
> more elegant way to do this.
>
> Gavin
>
>
> On Sun, 14 Apr 2002, Bruce Momjian wrote:
>
> >
> > Gavin, I will do the legwork on this if you wish. I think we need to
> > use DefElem to store the COPY params, rather than using specific fields
> > in CopyStmt.
> >
> > Would you send me your original patch so I am make sure I hit
> > everything. I can't seem to find a copy. If you would like to work on
> > it, I can give you what I have and walk you through the process.
> >
> > ---------------------------------------------------------------------------
> >
> > Gavin Sherry wrote:
> > > Hi Bruce,
> > >
> > > On Tue, 5 Mar 2002, Bruce Momjian wrote:
> > >
> > > >
> > > > Seems the original title about "feature causes performance in COPY" was
> > > > confusing.
> > >
> > > Oops.
> > >
> > > > This patch merely fixes the identified TODO item in the
> > > > grammar about using WITH in COPY.
> > >
> > > Now that I look at this patch again I don't think I like the
> > > syntax.
> > >
> > > COPY [BINARY] <relation> [WITH OIDS] TO | FROM <file> [[USING DELIMITERS |
> > > WITH DELIMITER] <delimiter> [WITH NULL AS <char>]
> > >
> > > It isn't very elegant.
> > >
> > > 1) I forced the parser to be able to handle multiple WITHs, but that
> > > doesn't mean its right. I can't remember why I didn't propose a better
> > > syntax back then, such as:
> > >
> > > ... [WITH [DELIMITER <delimiter>,] [NULL AS <char>]]
> > >
> > > 2) Given (1), Why does WITH OIDS belong where it is now? Why not have it
> > > as an 'option' at the end?
> > >
> > > Anyone have any opinion on this?
> > >
> > >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
> > >
> >
> >

Content-Description:

[ Attachment, skipping... ]

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026