Re: CSV import

From: Oliver Vecernik <vecernik(at)aon(dot)at>
To: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: CSV import
Date: 2003-01-28 14:46:07
Message-ID: 3E3697AF.2030509@aon.at
Lists: pgsql-sql

Hi again!

After investigating a little bit further, I found my CSV import couldn't work
for the following reasons:

1. CSV files are delimited with CR/LF
2. text fields are surrounded by double quotes

Is there a direct way to import such files into PostgreSQL?

I would like to have something like MySQL provides:

LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[FIELDS
[TERMINATED BY '\t']
[[OPTIONALLY] ENCLOSED BY '']
[ESCAPED BY '\\' ]
]
[LINES TERMINATED BY '\n']
[IGNORE number LINES]
[(col_name,...)]

Has anybody written such a function already?

Regards,
Oliver

--
VECERNIK Datenerfassungssysteme
A-2560 Hernstein, Hofkogelgasse 17
Tel.: +43 2633 47530, Fax: DW 50
http://members.aon.at/vecernik


From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: Oliver Vecernik <vecernik(at)aon(dot)at>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-28 19:01:23
Message-ID: Pine.LNX.4.44.0301281700010.15743-100000@matrix.gatewaynet.com
Lists: pgsql-sql

On Tue, 28 Jan 2003, Oliver Vecernik wrote:

> Hi again!
>
> After investigating a little bit further my CSV import couldn't work
> because of following reasons:
>
> 1. CSV files are delimited with CR/LF
See below

> 2. text fields are surrounded by double quotes

in vi
:1,$ s/"//g

>
> Is there a direct way to import such files into PostgreSQL?
>
> I would like to have something like MySQL provides:
>
> LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
> [REPLACE | IGNORE]
> INTO TABLE tbl_name
> [FIELDS
> [TERMINATED BY '\t']
> [[OPTIONALLY] ENCLOSED BY '']
> [ESCAPED BY '\\' ]
> ]
> [LINES TERMINATED BY '\n']

make it
[LINES TERMINATED BY '\r\n']

> [IGNORE number LINES]
> [(col_name,...)]
>
> Has anybody written such a function already?
>
> Regards,
> Oliver
>
> --
> VECERNIK Datenerfassungssysteme
> A-2560 Hernstein, Hofkogelgasse 17
> Tel.: +43 2633 47530, Fax: DW 50
> http://members.aon.at/vecernik
>
>
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel: +30-10-8981112
fax: +30-10-8981877
email: achill(at)matrix(dot)gatewaynet(dot)com
mantzios(at)softlab(dot)ece(dot)ntua(dot)gr


From: Guy Fraser <guy(at)incentre(dot)net>
To: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-28 22:13:41
Message-ID: 3E370095.3020200@incentre.net
Lists: pgsql-sql

Hi

You will need two text utilities {dos2unix and sed} to do this in the simplest
way. They are fairly standard text utilities and are probably already on your
machine.

This is how I would do it:

sed "s/\"//g" file_name.txt \
| dos2unix \
| psql -c "COPY table_name FROM STDIN USING DELIMITERS ',';" db

Where "file_name.txt" is the csv file you want to import and "table_name" is
the previously created table you want to insert the data into and db is the
database name.

How this works is "sed" {stream editor} removes all the double quote
characters '"' then pipes the output through "dos2unix" which converts all the
CRLF {DOS EOL} sequences into CR {UNIX EOL} characters, then pipes the data to
"pgsql" with a command that does a bulk insert into the table of the database
you have selected.

Guy

Oliver Vecernik wrote:
> Hi again!
>
> After investigating a little bit further my CSV import couldn't work
> because of following reasons:
>
> 1. CSV files are delimited with CR/LF
> 2. text fields are surrounded by double quotes
>
> Is there a direct way to import such files into PostgreSQL?
>
> I would like to have something like MySQL provides:
>
> LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
> [REPLACE | IGNORE]
> INTO TABLE tbl_name
> [FIELDS
> [TERMINATED BY '\t']
> [[OPTIONALLY] ENCLOSED BY '']
> [ESCAPED BY '\\' ]
> ]
> [LINES TERMINATED BY '\n']
> [IGNORE number LINES]
> [(col_name,...)]
>
> Has anybody written such a function already?
>
> Regards,
> Oliver
>


From: Jeff Eckermann <jeff_eckermann(at)yahoo(dot)com>
To: Oliver Vecernik <vecernik(at)aon(dot)at>, pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-28 22:45:33
Message-ID: 20030128224533.71751.qmail@web20809.mail.yahoo.com
Lists: pgsql-sql

--- Oliver Vecernik <vecernik(at)aon(dot)at> wrote:
> Is there a direct way to import such files into
> PostgreSQL?
>

As I believe others have replied: no, not yet.

If you are absolutely sure that your data will _never_
contain commas, then the simple solution of just
deleting all of the quotes, then using COPY with
comma delimiters, will work. Otherwise, parsing CSV
files gets just too complicated, and you are better
off using an existing solution (like a Perl module) to
preprocess your data.
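
To illustrate why the naive approach breaks down, here is a tiny example
(made-up data, in Python) of what happens as soon as a quoted field contains
a comma:

line = '"Smith, John","Vienna"\r\n'

# Deleting the quotes and splitting on commas gives three fields, not two:
fields = line.replace('"', '').strip().split(',')
print fields        # ['Smith', ' John', 'Vienna'] -- the name got split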



From: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To: Guy Fraser <guy(at)incentre(dot)net>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-29 01:50:33
Message-ID: 3E373369.52C862DE@nsd.ca
Lists: pgsql-sql

You can achieve the same result with:

tr -d '"\015' < file_name.txt | psql {etc...}

Unix EOL is LF not CR.

Guy Fraser wrote:
>
> Hi
>
> You will need two text utilities {dos2unix and sed} to do this in the simplest
> way. They are fairly standard text utilities and are probably already on your
> machine.
>
> This is how I would do it :
>
> sed "s/\"//g" file_name.txt \
> | dos2unix \
> | pgsql -c "COPY table_name FROM STDIN USING DELIMITERS ',';" db
>
> Where "file_name.txt" is the csv file you want to import and "table_name" is
> the previously created table you want to insert the data into and db is the
> database name.
>
> How this works is "sed" {stream editor} removes all the double quote
> characters '"' then pipes the output through "dos2unix" which converts all the
> CRLF {DOS EOL} sequences into CR {UNIX EOL} characters, then pipes the data to
> "pgsql" with a command that does a bulk insert into the table of the database
> you have selected.
>
> Guy
>
> Oliver Vecernik wrote:
> > Hi again!
> >
> > After investigating a little bit further my CSV import couldn't work
> > because of following reasons:
> >
> > 1. CSV files are delimited with CR/LF
> > 2. text fields are surrounded by double quotes
> >
> > Is there a direct way to import such files into PostgreSQL?
> >
> > I would like to have something like MySQL provides:
> >
> > LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
> > [REPLACE | IGNORE]
> > INTO TABLE tbl_name
> > [FIELDS
> > [TERMINATED BY '\t']
> > [[OPTIONALLY] ENCLOSED BY '']
> > [ESCAPED BY '\\' ]
> > ]
> > [LINES TERMINATED BY '\n']
> > [IGNORE number LINES]
> > [(col_name,...)]
> >
> > Has anybody written such a function already?
> >
> > Regards,
> > Oliver
> >
>


From: Oliver Vecernik <vecernik(at)aon(dot)at>
To: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-29 05:50:42
Message-ID: 3E376BB2.4010302@aon.at
Lists: pgsql-sql

Oliver Vecernik wrote:

> Hi again!
>
> After investigating a little bit further my CSV import couldn't work
> because of following reasons:
>
> 1. CSV files are delimited with CR/LF
> 2. text fields are surrounded by double quotes
>
> Is there a direct way to import such files into PostgreSQL?

The answer seems to be no. But after googling a bit I found a wonderful
Python module called csv at:

http://www.object-craft.com.au/projects/csv/

A minimal script called 'csv2tab.py' for conversion to a tab delimited
file could be:

#!/usr/bin/env python
# Convert a CSV file to tab-delimited output on stdout, using the
# Object Craft csv module linked above.

import csv
import sys

def convert(file):
    try:
        f = open(file, 'r')
    except IOError:
        print >> sys.stderr, 'Error opening file!'
        return
    p = csv.parser()
    for line in f.readlines():
        print '\t'.join(p.parse(line))

if __name__ == '__main__':
    convert(sys.argv[1])
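
The output can then go straight into COPY (whose default delimiter is a tab).
Roughly like this, as a sketch only: 'data.csv', 'mytable' and 'mydb' are
placeholders, and it assumes no field contains a tab, a backslash or an
embedded newline:

import os

# Pipe the converted rows into a bulk COPY via psql.
os.system('./csv2tab.py data.csv | psql -c "COPY mytable FROM STDIN;" mydb')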

Regards,
Oliver

--
VECERNIK Datenerfassungssysteme
A-2560 Hernstein, Hofkogelgasse 17
Tel.: +43 2633 47530, Fax: DW 50
http://members.aon.at/vecernik


From: Gary Stainburn <gary(dot)stainburn(at)ringways(dot)co(dot)uk>
To: Oliver Vecernik <vecernik(at)aon(dot)at>, pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-29 10:04:48
Message-ID: 200301291004.48919.gary.stainburn@ringways.co.uk
Lists: pgsql-sql

On Wednesday 29 January 2003 5:50 am, Oliver Vecernik wrote:
> Oliver Vecernik schrieb:
> > Hi again!
> >
> > After investigating a little bit further my CSV import couldn't work
> > because of following reasons:
> >
> > 1. CSV files are delimited with CR/LF
> > 2. text fields are surrounded by double quotes
> >
> > Is there a direct way to import such files into PostgreSQL?

Here's a simple command that will take

"hello","world","splat","diddle"
"he said "hello world" to ","his mate"
and convert it to the following tab delimited file that can be COPYed using
psql. It even handles quotes inside fields. (^m and ^i are done by typing
CTRL+V CTRL+M and CTRL+V CTRL+I)

hello world splat diddle
he said "hello world" to his mate

sed 's/^"//' <t.txt|sed 's/"^m$//'|sed 's/","/^i/g'>t1.txt

Gary

>
> The answer seems to be no. But after googeling a bit a found a wonderful
> Python module called csv at:
>
> http://www.object-craft.com.au/projects/csv/
>
> A minimal script called 'csv2tab.py' for conversion to a tab delimited
> file could be:
>
> #!/usr/bin/env python
>
> import csv
> import sys
>
> def convert(file):
>     try:
>         f = open(file, 'r')
>         lines = f.readlines()
>         p = csv.parser()
>         for line in lines:
>             print '\t'.join(p.parse(line))
>     except:
>         print 'Error opening file!'
>
> if __name__ == '__main__':
>     convert(sys.argv[1]);
>
> Regards,
> Oliver

--
Gary Stainburn

This email does not contain private or confidential material as it
may be snooped on by interested government parties for unknown
and undisclosed purposes - Regulation of Investigatory Powers Act, 2000


From: "Chad Thompson" <chad(at)weblinkservices(dot)com>
To:
Cc: "pgsql-sql" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-30 20:48:36
Message-ID: 013601c2c8a0$f6ca2280$32021aac@chad
Lists: pgsql-sql

>
> Unix EOL is LF not CR.
>
>

Is this the only difference between a dos and unix text file?

Thanks
Chad


From: Oliver Vecernik <vecernik(at)aon(dot)at>
To: Chad Thompson <chad(at)weblinkservices(dot)com>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-30 22:31:30
Message-ID: 3E39A7C2.4050703@aon.at
Lists: pgsql-sql

Chad Thompson wrote:

>
>
>>Unix EOL is LF not CR.
>>
>>
>>
>>
>
>Is this the only difference between a dos and unix text file?
>
Yes, but to be more precise:

DOS:  CR + LF
Unix: LF
Mac:  CR

Oliver

--
VECERNIK Datenerfassungssysteme
A-2560 Hernstein, Hofkogelgasse 17
Tel.: +43 2633 47530, Fax: DW 50
http://members.aon.at/vecernik


From: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To: Chad Thompson <chad(at)weblinkservices(dot)com>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-01-31 15:27:47
Message-ID: 3E3A95F3.842C9691@nsd.ca
Lists: pgsql-sql

In DOS and Windows, text lines end with <CR><LF>.
In Unix, text lines end with <LF> only.

<CR> = CTRL-M, hex 0x0D, dec 13, oct 015
<LF> = CTRL-J, hex 0x0A, dec 10, oct 012
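
These are easy to double-check from a Python prompt, for example:

>>> ord('\r'), hex(ord('\r')), oct(ord('\r'))
(13, '0xd', '015')
>>> ord('\n'), hex(ord('\n')), oct(ord('\n'))
(10, '0xa', '012')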

Chad Thompson wrote:
>
> >
> > Unix EOL is LF not CR.
> >
> >
>
> Is this the only difference between a dos and unix text file?
>
> Thanks
> Chad
>


From: Guy Fraser <guy(at)incentre(dot)net>
To: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: CSV import
Date: 2003-02-01 01:32:08
Message-ID: 3E3B2398.1010101@incentre.net
Lists: pgsql-sql

FYI

In text files on a Mac, the EOL character is a <CR> only.

What a messy thing this whole EOL cruft is.

Converting between these text formats on Linux is easy if you have dos2unix.

The dos2unix on Linux can perform many format conversions to and from Unix, DOS
and Mac formats.

On BSD you need dos2unix to convert from DOS to Unix and unix2dos to convert
from Unix to DOS. You probably need to get the GNU version of dos2unix or
mac2unix to convert to or from Mac-formatted text.
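
If none of those utilities happen to be installed, a few lines of Python do
the same job. Only a sketch, and it reads the whole file into memory, so it
is meant for modest file sizes:

#!/usr/bin/env python
# Normalise DOS (CR+LF) or Mac (CR) line endings to Unix (LF) on stdout.
import sys

data = open(sys.argv[1], 'rb').read()
# Convert CR+LF first, then any leftover bare CR (old Mac files).
data = data.replace('\r\n', '\n').replace('\r', '\n')
sys.stdout.write(data)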

Guy

Jean-Luc Lachance wrote:
> In DOS and Windows, text lines end with <CR><LF>.
> In Unix, text lines end with <LF> only.
>
> hex dec oct
> <CR>=CTRL-M or 0x0D or 13 or 015
> <LF>=CTRL-J or 0x0A or 10 or 012
>
>
>
> Chad Thompson wrote:
>
>>>Unix EOL is LF not CR.
>>>
>>>
>>
>>Is this the only difference between a dos and unix text file?
>>
>>Thanks
>>Chad
---%<...snip...