Re: Newbie question about importing text files...

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Newbie question about importing text files...
Date: 2006-10-11 20:29:58
Message-ID: 1160598598.6181.50.camel@state.g2switchworks.com
Lists: pgsql-general

On Tue, 2006-10-10 at 04:16, Ron Johnson wrote:
> On 10/09/06 22:43, Jonathan Greenberg wrote:
> > So I've been looking at the documentation for COPY, and I'm curious about a
> > number of features which do not appear to be included, and whether these
> > functions are found someplace else:
> >
> > 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to
> > begin reading in data?

Using something like bash, you can do this:

tail -n $(( $(wc -l < bookability-pg.sql) - 2 )) bookability-pg.sql | wc -l

Make it a shell function, call it skipper, and have it take the file name
and the number of lines to skip as arguments.

Put this in .bashrc and then source the file ( . ~/.bashrc ):

skipper(){
# $1 = file name, $2 = number of leading lines to skip
tail -n $(( $(wc -l < "$1") - $2 )) "$1"
}
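
For comparison, here is a rough sketch of the same idea using tail's
"-n +K" form, which starts output at line K and so avoids counting the
file's lines first. The name skip_lines and the sample data are made up
for illustration; they are not from the original question.

```shell
# skip_lines FILE N: print FILE without its first N lines.
# (skip_lines is a hypothetical helper name used only in this sketch.)
skip_lines() {
    # tail -n +K begins output at line K, so skipping N lines means K = N + 1.
    tail -n "+$(( $2 + 1 ))" "$1"
}

# Demo on a throwaway file that has two header lines.
demo=$(mktemp)
printf 'header 1\nheader 2\na,1\nb,2\n' > "$demo"
skip_lines "$demo" 2
rm -f "$demo"
```

The output can then be piped wherever it is needed, for example into
psql running a COPY ... FROM STDIN against whatever table you are loading.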

> > 2) Is it possible to screen out lines which begin with a comment character
> > (common outputs for csv/txt files from various programs)?

grep -vP "^#" filename

will remove all lines that start with #. grep is your friend in unix.
If you don't have unix, get cygwin as recommended elsewhere.
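
As a small sketch of how that filter might be wrapped so it can sit in a
pipeline (strip_comments is a hypothetical name, and '#' is assumed to be
the comment character):

```shell
# strip_comments: copy stdin to stdout, dropping lines whose first
# character is '#'. (Hypothetical helper name, for illustration only.)
strip_comments() {
    grep -v '^#'
}

# Demo: two comment lines are removed, the data lines pass through.
printf '# exported by some tool\na,1\n#another comment\nb,2\n' | strip_comments
```

Note that grep exits with status 1 when every line is filtered out, which
matters if the script runs under "set -e".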

> > 3) Is there a way to read in fixed width files?

If you don't mind playing about with sed, you could use it and bash
scripting to do it. I have before. It's ugly looking but easy enough
to do. But I'd recommend a beginner use a scripting language they like,
one of the ones that starts with p is usually a good choice (perl,
python, php, ruby (wait, that's not a p!) etc...)
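
For a flavour of the shell route, here is a minimal sketch that uses awk
rather than sed to cut fixed-width columns and emit comma-separated
output. The column layout (name in columns 1-10, age in 11-13, city in
14-23) and the name fixed_to_csv are assumptions invented for this
example; adjust the substr() offsets to match the real file.

```shell
# fixed_to_csv: read fixed-width lines on stdin, write comma-separated lines.
# Assumed (hypothetical) layout: cols 1-10 name, 11-13 age, 14-23 city.
fixed_to_csv() {
    awk '{
        name = substr($0, 1, 10)
        age  = substr($0, 11, 3)
        city = substr($0, 14, 10)
        # Trim the space padding from both ends of each field.
        gsub(/^ +| +$/, "", name)
        gsub(/^ +| +$/, "", age)
        gsub(/^ +| +$/, "", city)
        print name "," age "," city
    }'
}

# Demo: build one padded record and convert it.
printf '%-10s%3s%-10s\n' 'Alice' '30' 'Tulsa' | fixed_to_csv
```

This does no quoting, so a field that itself contains a comma would need
extra handling before the result is handed to COPY in CSV mode.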

>
> Both Python & Perl have CSV parsing modules, and can of course deal
> with fixed-width data, let you skip comments, commit every N rows,
> skip over committed records in case the load crashes, etc, etc, etc.

PHP has fgetcsv() built in as well. It breaks each CSV line down into an
array and is really easy to work with.
