Functions with COPY

Lists: pgsql-hackers
From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Functions with COPY
Date: 2003-11-27 14:15:20
Message-ID: 20031127141520.GC24094@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Consider the following input data:
1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall

The interpretation for the numbers is:
1234 = 12.34, 24.50 = 24.50
The interpretation for the dates is:
January 10th, 2003, October 1st, 2003, October 1st, 2003

I don't believe it's possible, currently, to correctly import this
data with copy. I'm not sure the date fields would even be accepted
as date fields. It'd be nice if this could be made to work. From a
user standpoint consider:

copy blah (
to_number(cost,'99V99'), -- I think that's right?
to_number(cost2,'99.99'),
to_date(install_date,'DD-Mon-YYYY'),
to_date(invoice_date,'MM/DD/YY'),
to_date(bill_date,'MM-DD-YYYY'),
service_type) from stdin;

Perhaps the actual syntax would be different, but the idea is there,
be able to specify a function with arguments to handle the
conversion from the text to the resulting data type in the table.
Right now casting is done but that casting has to be expected to
work for all input formats for each data type cast and that just
isn't reasonable to try and force something to do. Instead, I
believe, the number of input formats accepted has been limited.

Now, that isn't an actual data set I have to deal with, but it's a
good illustration of the problem I've had with some of my data sets,
mainly with the date fields though there is one data set that has the
problem of having an implied decimal point.

Thanks,

Stephen


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-27 14:21:25
Message-ID: 20031127142125.GA9365@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 27, 2003 at 09:15:20 -0500,
Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> I don't believe it's possible, currently, to correctly import this
> data with copy. I'm not sure the date fields would even be accepted
> as date fields. It'd be nice if this could be made to work. From a
> user standpoint consider:

You can write a filter program that reads the data and passes it off
to copy. Perl works pretty well for this.


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-27 14:28:05
Message-ID: 20031127142804.GD24094@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Bruno Wolff III (bruno(at)wolff(dot)to) wrote:
> On Thu, Nov 27, 2003 at 09:15:20 -0500,
> Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > I don't believe it's possible, currently, to correctly import this
> > data with copy. I'm not sure the date fields would even be accepted
> > as date fields. It'd be nice if this could be made to work. From a
> > user standpoint consider:
>
> You can write a filter program that reads the data and passes it off
> to copy. Perl works pretty well for this.

I already did, but it's basically a poor duplication of what the
Postgres functions listed already do. Not what I'd consider the best
scenario. Additionally, overall I'd expect it to be less work to have
the conversion from text->data type done once and correctly instead of
run through a filter program to 'clean it up' for Postgres and then also
run through functions in Postgres (casts at least) to convert it.

Stephen


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-27 14:38:49
Message-ID: 1069943928.17262.92.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2003-11-27 at 09:28, Stephen Frost wrote:
> * Bruno Wolff III (bruno(at)wolff(dot)to) wrote:
> > On Thu, Nov 27, 2003 at 09:15:20 -0500,
> > Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > I don't believe it's possible, currently, to correctly import this
> > > data with copy. I'm not sure the date fields would even be accepted
> > > as date fields. It'd be nice if this could be made to work. From a
> > > user standpoint consider:
> >
> > You can write a filter program that reads the data and passes it off
> > to copy. Perl works pretty well for this.
>
> I already did, but it's basically a poor duplication of what the
> Postgres functions listed already do. Not what I'd consider the best
> scenario. Additionally, overall I'd expect it to be less work to have
> the conversion from text->data type done once and correctly instead of
> run through a filter program to 'clean it up' for Postgres and then also
> run through functions in Postgres (casts at least) to convert it.

How about COPY into a TEMP TABLE for 10k lines, then do an
insert into real_table .... select .... from temp_table;
which converts the data?

You could of course thread the load so 2 or 3 processes execute the data
import.


From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-27 14:55:30
Message-ID: 1069944930.1640.18.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

There is a patch floating around for informix load/unload

the syntax is load from 'file' insert into ...., and unload to 'file'
select "whatever you like"

Would this solve the problem?

Dave
On Thu, 2003-11-27 at 09:38, Rod Taylor wrote:
> On Thu, 2003-11-27 at 09:28, Stephen Frost wrote:
> > * Bruno Wolff III (bruno(at)wolff(dot)to) wrote:
> > > On Thu, Nov 27, 2003 at 09:15:20 -0500,
> > > Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > > > I don't believe it's possible, currently, to correctly import this
> > > > data with copy. I'm not sure the date fields would even be accepted
> > > > as date fields. It'd be nice if this could be made to work. From a
> > > > user standpoint consider:
> > >
> > > You can write a filter program that reads the data and passes it off
> > > to copy. Perl works pretty well for this.
> >
> > I already did, but it's basically a poor duplication of what the
> > Postgres functions listed already do. Not what I'd consider the best
> > scenario. Additionally, overall I'd expect it to be less work to have
> > the conversion from text->data type done once and correctly instead of
> > run through a filter program to 'clean it up' for Postgres and then also
> > run through functions in Postgres (casts at least) to convert it.
>
> How about COPY into a TEMP TABLE for 10k lines, then do an
> insert into real_table .... select .... from temp_table;
> which converts the data?
>
> You could of course thread the load so 2 or 3 processes execute the data
> import.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>
>


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-27 15:19:34
Message-ID: 20031127151934.GE24094@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Rod Taylor (pg(at)rbt(dot)ca) wrote:
> How about COPY into a TEMP TABLE for 10k lines, then do an
> insert into real_table .... select .... from temp_table;
> which converts the data?
>
> You could of course thread the load so 2 or 3 processes execute the data
> import.

Sure, this would work, but it's a heck of alot more work from a
processing standpoint than either of the other options...

Stephen


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Dave Cramer <pg(at)fastcrypt(dot)com>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-27 15:21:06
Message-ID: 20031127152106.GF24094@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Dave Cramer (pg(at)fastcrypt(dot)com) wrote:
> There is a patch floating around for informix load/unload
>
> the syntax is load from 'file' insert into ...., and unload to 'file'
> select "whatever you like"
>
> Would this solve the problem?

I'm not sure. It depends on what you can do with the '....' after
'insert into'. If it's :
insert into blah (what here?) to_number(blah1,'99V99'),etc

Then I'd think it would work, but I don't know if that's what you're
saying or not.

Stephen


From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, Hackers <pgsql-hackers(at)postgresql(dot)org>, Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Subject: Re: Functions with COPY
Date: 2003-11-27 15:28:07
Message-ID: 1069946887.1636.28.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen,

You can do whatever you can do with an insert now, so yes, I think that
is possible. Even if the current patch doesn't do that, it would
certainly be a start.

Dave

On Thu, 2003-11-27 at 10:21, Stephen Frost wrote:
> * Dave Cramer (pg(at)fastcrypt(dot)com) wrote:
> > There is a patch floating around for informix load/unload
> >
> > the syntax is load from 'file' insert into ...., and unload to 'file'
> > select "whatever you like"
> >
> > Would this solve the problem?
>
> I'm not sure. It depends on what you can do with the '....' after
> 'insert into'. If it's :
> insert into blah (what here?) to_number(blah1,'99V99'),etc
>
> Then I'd think it would work, but I don't know if that's what you're
> saying or not.
>
> Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-28 16:55:43
Message-ID: 28681.1070038543@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> Consider the following input data:
> 1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall

> The interpretation for the numbers is:
> 1234 =3D 12.34, 24.50 =3D 24.50
> The interpretation for the dates is:
> January 10th, 2003, October 1st, 2003, October 1st, 2003

> I don't believe it's possible, currently, to correctly import this
> data with copy. I'm not sure the date fields would even be accepted
> as date fields.

Nonsense.

regression=# set datestyle to mdy;
SET
regression=# select '10-Jan-2003'::date;
date
------------
2003-01-10
(1 row)

regression=# select '10/1/03'::date;
date
------------
2003-10-01
(1 row)

regression=# select '10-01-2003'::date;
date
------------
2003-10-01
(1 row)

I think you'd have to do some preprocessing on the numeric inputs if you
wanted implied decimal points inserted like that, but the dates look fine.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-28 17:07:03
Message-ID: 20031128170703.GH24094@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > Consider the following input data:
> > 1234,24.50,10-Jan-2003,10/1/03,10-01-2003,hiall
>
> > The interpretation for the numbers is:
> > 1234 =3D 12.34, 24.50 =3D 24.50
> > The interpretation for the dates is:
> > January 10th, 2003, October 1st, 2003, October 1st, 2003
>
> > I don't believe it's possible, currently, to correctly import this
> > data with copy. I'm not sure the date fields would even be accepted
> > as date fields.
> Nonsense.
[...]
> I think you'd have to do some preprocessing on the numeric inputs if you
> wanted implied decimal points inserted like that, but the dates look fine.

I guess my example was lacking, I'm sure there are cases where the
text->date casting will end up being wrong or some date style won't be
accepted. If the above was 'January 10th, 2003, October 1st, 2003,
January 1st, 2003', for example. Thinking back I think that might have
been the situation I was thinking about (conflicting mdy and dmy) and
would have made more sense as an example.

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-28 17:15:35
Message-ID: 28844.1070039735@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> I guess my example was lacking, I'm sure there are cases where the
> text->date casting will end up being wrong or some date style won't be
> accepted. If the above was 'January 10th, 2003, October 1st, 2003,
> January 1st, 2003', for example. Thinking back I think that might have
> been the situation I was thinking about (conflicting mdy and dmy) and
> would have made more sense as an example.

Then what are you after, some magically prescient input mode that will
guess the correct interpretation?

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-28 17:30:08
Message-ID: 20031128173008.GI24094@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > I guess my example was lacking, I'm sure there are cases where the
> > text->date casting will end up being wrong or some date style won't be
> > accepted. If the above was 'January 10th, 2003, October 1st, 2003,
> > January 1st, 2003', for example. Thinking back I think that might have
> > been the situation I was thinking about (conflicting mdy and dmy) and
> > would have made more sense as an example.
>
> Then what are you after, some magically prescient input mode that will
> guess the correct interpretation?

No, I'm interested, as I discussed in my message[1], in the ability to
use functions in a copy statement to allow me to specify the conversion
from text to the appropriate data type. Right now Postgres is using
casting which can end up being wrong. That's not a fault or something
that can be fixed, the casting logic itself is fine but it's not always
appropriate to apply the same casting to all fields of a given type.

It would be nice to be able to choose what function is used and to pass
arguments to it. This is a feature request and I'm not suggesting a
change in host the text->date casting is done. From a programmatical
standpoint I see things like this:

Right now:
text -> date : cast(text as date)
text -> numeric : cast(text as numeric)

I want to be able to pop that out and put my own function in place of
it, like so:
text -> date : to_date(text,'YYYY-Mon-DD')
text -> numeric : to_numeric(text,'99V99')

My other message had an example syntax to do this. I don't know if
that'd be the appropriate syntax or not but I thought it illustrated
what I was interested in.

Thanks,

Stephen

[1] http://archives.postgresql.org/pgsql-hackers/2003-11/msg01456.php


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-28 17:47:49
Message-ID: 29060.1070041669@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> No, I'm interested, as I discussed in my message[1], in the ability to
> use functions in a copy statement to allow me to specify the conversion
> from text to the appropriate data type.

COPY is not intended to be that flexible; it's intended to be fast.
You can do any amount of processing you want in an INSERT statement,
though.

INSERT INTO mytable VALUES (mydatefunc('2001/01/03'), ... );

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-28 18:02:56
Message-ID: 20031128180256.GJ24094@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > No, I'm interested, as I discussed in my message[1], in the ability to
> > use functions in a copy statement to allow me to specify the conversion
> > from text to the appropriate data type.
>
> COPY is not intended to be that flexible; it's intended to be fast.

I wouldn't expect much of a speed difference between to_date() and
cast(text as date). Is there some reason I'm not seeing to expect it to
be much slower? My guess was that supporting this wouldn't involve
that much code change either but I'm probably wrong.

> You can do any amount of processing you want in an INSERT statement,
> though.

Certainly, but for bulk loads that requires more pre-processing work for
the user and I believe results in more work for the server too (it
certainly takes longer...).

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functions with COPY
Date: 2003-11-28 18:37:16
Message-ID: 29399.1070044636@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> You can do any amount of processing you want in an INSERT statement,
>> though.

> Certainly, but for bulk loads that requires more pre-processing work for
> the user and I believe results in more work for the server too (it
> certainly takes longer...).

Have you batched multiple INSERTs into a transaction? Also consider
using a prepared statement to eliminate parse/plan overhead.

regards, tom lane