Re: Good Delimiter for copy command

Lists: pgsql-generalpgsql-hackers
From: SHARMILA JOTHIRAJAH <sharmi_jo(at)yahoo(dot)com>
To: General postgres mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Good Delimiter for copy command
Date: 2009-02-10 15:07:42
Message-ID: 472422.2908.qm@web110715.mail.gq1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Hi,
What is a good delimiter to use for COPY command. Im trying to use COPY command to copy data from one table to another in 2 different databases.
Can you suggest a unique delimiter that I can use for this COPY command

Thanks
Sharmial


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Good Delimiter for copy command
Date: 2009-02-10 19:41:11
Message-ID: 20090210194111.GY3008@frubble.xen.chris-lamb.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Tue, Feb 10, 2009 at 07:07:42AM -0800, SHARMILA JOTHIRAJAH wrote:
> What is a good delimiter to use for COPY command. Im trying to use
> COPY command to copy data from one table to another in 2 different
> databases.
> Can you suggest a unique delimiter that I can use for this COPY command

If you're fixed with using COPY then what about just using CSV mode?

COPY table TO STDOUT WITH CSV;

If not then why not use pg_dump?

Sam


From: SHARMILA JOTHIRAJAH <sharmi_jo(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-12 20:51:38
Message-ID: 580984.21817.qm@web110715.mail.gq1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

My data fields contains commas, tabs,'|'  etc. So I cant use t hem as delimiters..so I need a unique may be non-character to use as a delimiter...
-Sharmila

--- On Tue, 2/10/09, Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
From: Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: [GENERAL] Good Delimiter for copy command
To: pgsql-general(at)postgresql(dot)org
Date: Tuesday, February 10, 2009, 2:41 PM

On Tue, Feb 10, 2009 at 07:07:42AM -0800, SHARMILA JOTHIRAJAH wrote:
> What is a good delimiter to use for COPY command. Im trying to use
> COPY command to copy data from one table to another in 2 different
> databases.
> Can you suggest a unique delimiter that I can use for this COPY command

If you're fixed with using COPY then what about just using CSV mode?

COPY table TO STDOUT WITH CSV;

If not then why not use pg_dump?

Sam


From: Raymond O'Donnell <rod(at)iol(dot)ie>
To: sharmi_jo(at)yahoo(dot)com
Cc: pgsql-general(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-12 21:11:44
Message-ID: 49949090.80203@iol.ie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 12/02/2009 20:51, SHARMILA JOTHIRAJAH wrote:
> My data fields contains commas, tabs,'|' etc. So I cant use t hem as
> delimiters..so I need a unique may be non-character to use as a
> delimiter...

Indeed you do... and a quick RTFM reveals that you can set a character
of your choice:

http://www.postgresql.org/docs/8.3/interactive/sql-copy.html

...look for the "delimiter" parameter.

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod(at)iol(dot)ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------


From: Andrew Gould <andrewlylegould(at)gmail(dot)com>
To: sharmi_jo(at)yahoo(dot)com
Cc: pgsql-general(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-12 21:15:06
Message-ID: d356c5630902121315p1ab23982h5101e0aa9de8bb31@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Thu, Feb 12, 2009 at 2:51 PM, SHARMILA JOTHIRAJAH <sharmi_jo(at)yahoo(dot)com>wrote:

> My data fields contains commas, tabs,'|' etc. So I cant use t hem as
> delimiters..so I need a unique may be non-character to use as a delimiter...
> -Sharmila
>

How are you backing up your data? If your backup method deals appropriately
with the characters mentioned above, then backup the data and reload them
into the other tables.

If your backup method does not deal appropriately with the characters you
mention, you may have larger problems on your hands.

To the list: Does pg_dump escape characters that are the same as the
delimiter?

Thanks,

Andrew


From: SHARMILA JOTHIRAJAH <sharmi_jo(at)yahoo(dot)com>
To: Andrew Gould <andrewlylegould(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-12 21:32:48
Message-ID: 46616.94728.qm@web110706.mail.gq1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

--- On Thu, 2/12/09, Andrew Gould <andrewlylegould(at)gmail(dot)com> wrote:

> From: Andrew Gould <andrewlylegould(at)gmail(dot)com>
> Subject: Re: [GENERAL] Good Delimiter for copy command
> To: sharmi_jo(at)yahoo(dot)com
> Cc: pgsql-general(at)postgresql(dot)org, "Sam Mason" <sam(at)samason(dot)me(dot)uk>
> Date: Thursday, February 12, 2009, 4:15 PM
> On Thu, Feb 12, 2009 at 2:51 PM, SHARMILA JOTHIRAJAH
> <sharmi_jo(at)yahoo(dot)com>wrote:
>
> > My data fields contains commas, tabs,'|' etc.
> So I cant use t hem as
> > delimiters..so I need a unique may be non-character to
> use as a delimiter...
> > -Sharmila
> >
>
>
> How are you backing up your data? If your backup method
> deals appropriately
> with the characters mentioned above, then backup the data
> and reload them
> into the other tables.

Im not using this for loading postgres data to postgres. Im trying this method to load my Oracle data to postgresql tables.... just trying to migrate my data from oracle to postgresql

Thanks
Sharmila
>
> If your backup method does not deal appropriately with the
> characters you
> mention, you may have larger problems on your hands.
>
> To the list: Does pg_dump escape characters that are the
> same as the
> delimiter?
>
> Thanks,
>
> Andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gould <andrewlylegould(at)gmail(dot)com>
Cc: sharmi_jo(at)yahoo(dot)com, pgsql-general(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-12 21:41:45
Message-ID: 17248.1234474905@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Andrew Gould <andrewlylegould(at)gmail(dot)com> writes:
> To the list: Does pg_dump escape characters that are the same as the
> delimiter?

Yes. The OP has not actually explained why he needs to pick a
nondefault delimiter, unless maybe it is that he wants to feed the
dump to some program that is too dumb to deal with escaping.

regards, tom lane


From: Mark Roberts <mailing_lists(at)pandapocket(dot)com>
To: sharmi_jo(at)yahoo(dot)com
Cc: pgsql-general(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-12 22:19:01
Message-ID: 1234477141.11148.252.camel@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


On Thu, 2009-02-12 at 12:51 -0800, SHARMILA JOTHIRAJAH wrote:
>
> My data fields contains commas, tabs,'|' etc. So I cant use t hem as
> delimiters..so I need a unique may be non-character to use as a
> delimiter...
> -Sharmila

Is this a theoretical problem or an actual one? I haven't had any
problems with the default (tab). In fact, copying from one database to
another is an exceedingly common task that I do, all done with tab.

dev=> create table foo (a text);
CREATE TABLE
Time: 385.967 ms
dev=> insert into foo values (' '); -- literal tab
INSERT 0 1
Time: 0.536 ms
dev=> insert into foo values ('\t'); -- special character, parsed.
INSERT 0 1
Time: 0.224 ms
dev=> insert into foo values ('\\t'); -- backslash, t
INSERT 0 1
Time: 0.183 ms
dev=> copy foo to stdout;
\t
\t
\\t
Time: 0.188 ms
dev=> select * from foo;
a
------
\x09
\x09
\t
(3 rows)

Time: 0.239 ms
dev=>

-Mark


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: SHARMILA JOTHIRAJAH <sharmi_jo(at)yahoo(dot)com>
Cc: Andrew Gould <andrewlylegould(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-12 23:03:51
Message-ID: Pine.GSO.4.64.0902121745420.27894@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Thu, 12 Feb 2009, SHARMILA JOTHIRAJAH wrote:

> Im not using this for loading postgres data to postgres. Im trying this
> method to load my Oracle data to postgresql tables.... just trying to
> migrate my data from oracle to postgresql

The point other posts here were trying to make to you is that the ideal
situation is one where the source and destination databases support
escaping the delimiter in the same fashion. You might be able to arrange
that, depends on how you're dumping the data from Oracle.

If not, when you can't get matching delimiter escapes, there is no such
thing as a universal delimiter for moves between databases like this.
You have to look at your source data to figure out what characters aren't
used.

For example, "~" and "`" are good characters to consider because they
don't show up much in regular text. But if your database contained
programming source code, those would be awful delimiter choices. "@" used
to be a potential delimiter choice, nowadays too many databases have
e-mail addresses in them for that to work anymore.

Here's what you can do: pick one delimiter. Start with "~" say. Dump
your data. Now, search the result for the *other* delimiter you might use
(maybe "`"), using something like grep. If it doesn't show up, that other
delimiter will work for you. Otherwise, swap delimiters and try again.

The other option here that can avoid all sorts of headaches is to use
pgloader: http://pgfoundry.org/projects/pgloader/ which will save the
rows that are rejected for some reason, which is usually what happens when
there's a delimiter issue. You can then edit those by hand to work around
random odd delimiter problems.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Marco Colombo <pgsql(at)esiway(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Good Delimiter for copy command
Date: 2009-02-17 16:17:40
Message-ID: 499AE324.4070908@esiway.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> Andrew Gould <andrewlylegould(at)gmail(dot)com> writes:
>> To the list: Does pg_dump escape characters that are the same as the
>> delimiter?
>
> Yes. The OP has not actually explained why he needs to pick a
> nondefault delimiter, unless maybe it is that he wants to feed the
> dump to some program that is too dumb to deal with escaping.
>
> regards, tom lane
>

Which makes me wonder, does copy accept UTF-8 input? Is it possibile
to use some unicode character which is unlikely to appear in the data
set as delimiter? Something like U+FFFC.

Here I'm using U+25CF for it's much more likely you can see it
(it's a black circle).

$ python -c 'print u";".join("ABCD").encode("utf8")' > 1.txt
$ cat 1.txt
A;B;C;D

$ python -c 'print u"\u25cf".join("ABCD").encode("utf8")' > 2.txt
$ cat 2.txt
A●B●C●D

$ psql -c "\d test"
Table "public.test"
Column | Type | Modifiers
--------+------+-----------
a | text |
b | text |
c | text |
d | text |

$ psql -c "\copy test from '1.txt' delimiter ;"
$ psql -c "\copy test from '2.txt' delimiter ●"
ERROR: COPY delimiter must be a single character
\copy: ERROR: COPY delimiter must be a single character

It doesn't seem to work, but I need to stress that I'm using
$ psql --version
psql (PostgreSQL) 8.2.9

.TM.


From: Mark Roberts <mailing_lists(at)pandapocket(dot)com>
To: Marco Colombo <pgsql(at)esiway(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Good Delimiter for copy command
Date: 2009-02-17 17:47:15
Message-ID: 1234892835.11148.295.camel@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


On Tue, 2009-02-17 at 17:17 +0100, Marco Colombo wrote:
>
> Which makes me wonder, does copy accept UTF-8 input? Is it possibile
> to use some unicode character which is unlikely to appear in the data
> set as delimiter? Something like U+FFFC.

I'm also not able to get unicode characters to copy in like that, but I
also haven't had any problems with data containing the *tab* character.
It seems to be properly escaped (see my other email in this thread) so
it doesn't seem to matter if it appears in the data stream. The *only*
reason I see to switch from tab is if the receiving application requires
it in a different format.

-Mark


From: David Fetter <david(at)fetter(dot)org>
To: Marco Colombo <pgsql(at)esiway(dot)net>
Cc: pgsql-general(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-17 17:55:02
Message-ID: 20090217175502.GE6226@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Tue, Feb 17, 2009 at 05:17:40PM +0100, Marco Colombo wrote:
> Tom Lane wrote:
> > Andrew Gould <andrewlylegould(at)gmail(dot)com> writes:
> >> To the list: Does pg_dump escape characters that are the same as the
> >> delimiter?
> >
> > Yes. The OP has not actually explained why he needs to pick a
> > nondefault delimiter, unless maybe it is that he wants to feed the
> > dump to some program that is too dumb to deal with escaping.
> >
> > regards, tom lane
> >
>
> Which makes me wonder, does copy accept UTF-8 input?

Yes, but...

> Is it possibile to use some unicode character which is unlikely to
> appear in the data set as delimiter? Something like U+FFFC.

No. The delimiter needs to be one byte long at the moment. The error
message you're getting probably needs an update. Bug?

> $ psql -c "\copy test from '2.txt' delimiter ●"
> ERROR: COPY delimiter must be a single character
> \copy: ERROR: COPY delimiter must be a single character

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: Marco Colombo <pgsql(at)esiway(dot)net>, pgsql-general(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Good Delimiter for copy command
Date: 2009-02-17 20:14:52
Message-ID: 7838.1234901692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

David Fetter <david(at)fetter(dot)org> writes:
> On Tue, Feb 17, 2009 at 05:17:40PM +0100, Marco Colombo wrote:
>> Is it possibile to use some unicode character which is unlikely to
>> appear in the data set as delimiter? Something like U+FFFC.

> No. The delimiter needs to be one byte long at the moment. The error
> message you're getting probably needs an update. Bug?

>> $ psql -c "\copy test from '2.txt' delimiter "
>> ERROR: COPY delimiter must be a single character
>> \copy: ERROR: COPY delimiter must be a single character

In 8.3 and CVS HEAD these messages are phrased as "must be a single
ASCII character" which I suppose is someone's attempt at improving the
situation; but IMHO this has replaced an imprecision with a lie. It
works fine with non-ASCII characters, if you're working in a single-byte
encoding (eg LATIN-n).

I think it should say "must be a single one-byte character" and not try
to talk down to the reader.

regards, tom lane


From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Good Delimiter for copy command
Date: 2009-02-19 12:58:18
Message-ID: gnjl1a$k71$7@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 2009-02-12, SHARMILA JOTHIRAJAH <sharmi_jo(at)yahoo(dot)com> wrote:
> --0-1509090113-1234471898=:21817
> Content-Type: text/plain; charset=iso-8859-1
> Content-Transfer-Encoding: quoted-printable
>
> My data fields contains commas, tabs,'|'=A0 etc. So I cant use them as del=
> imiters..so I need a unique may be non-character to use as a delimiter...

posgres will escape the delimiters (tabs come out at \t)

csv will work too (but with a different quoting convention)


From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Good Delimiter for copy command
Date: 2009-02-19 13:05:12
Message-ID: gnjle8$k71$8@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On 2009-02-12, SHARMILA JOTHIRAJAH <sharmi_jo(at)yahoo(dot)com> wrote:
>
> Im not using this for loading postgres data to postgres.
> Im trying this method to load my Oracle data to postgresql tables....
> just trying to migrate my data from oracle to postgresql

dump as CSV, properly implemented CSV isn't daunted by , " or newline in the data.
(but the latter can make the files hard for humans to parse)

else dump as named column inserts,