Re: textarray option for file FDW

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: textarray option for file FDW
Date: 2011-01-15 17:29:24
Message-ID: 4D31D974.2060609@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I've been waiting for the latest FDW patches as patiently as I can, and
I've been reviewing them this morning, in particular the file_fdw patch
and how it interacts with the newly exposed COPY API. Overall it seems
to be a whole lot cleaner, and the wholesale duplication of the copy
code is gone, so it's much nicer and cleaner. So now I'd like to add a
new option to it: "textarray". This option would require that the
foreign table have exactly one field, of type text[], and would compose
all the field strings read from the file for each record into the array
(however many there are). This would require a few changes to
contrib/file_fdw/file_fdw.c and a few changes to
src/backend/commands/copy.c, which I can probably have done in fairly
short order, Deo Volente. This will allow something like:

CREATE FOREIGN TABLE arr_text (
t text[]
) SERVER file_server
OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray
'true');
SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there
FROM arr_text;

Thoughts?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: textarray option for file FDW
Date: 2011-01-15 17:44:29
Message-ID: 396.1295113469@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> ... So now I'd like to add a
> new option to it: "textarray". This option would require that the
> foreign table have exactly one field, of type text[], and would compose
> all the field strings read from the file for each record into the array
> (however many there are).

Why is this a good thing? It seems like it would accomplish little
except to defeat the SQL type system entirely.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: textarray option for file FDW
Date: 2011-01-15 18:07:59
Message-ID: 4D31E27F.3030600@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/15/2011 12:44 PM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> ... So now I'd like to add a
>> new option to it: "textarray". This option would require that the
>> foreign table have exactly one field, of type text[], and would compose
>> all the field strings read from the file for each record into the array
>> (however many there are).
> Why is this a good thing? It seems like it would accomplish little
> except to defeat the SQL type system entirely.
>
>

We have discussed previously allowing this sort of thing. It's not a new
proposal at all.

My use case is that I have a customer who reads in data like this (using
my patch to allow ragged CSV input, which you previously objected to)
and then performs a sophisticated battery of validity tests on the data
before loading it into its final destination. To do that their
requirement is that we not error out on reading the data, so we load the
data into a table that is all text fields.

In fact, having COPY read in a text array is *your* suggestion:
<http://archives.postgresql.org/pgsql-hackers/2009-09/msg00547.php>.
This is simply a proposal to implement that via FDW, which makes it easy
to avoid any syntax issues.

cheers

andrew


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: textarray option for file FDW
Date: 2011-01-15 18:24:02
Message-ID: AANLkTikwoCwx6n847Gf00SGgMGTgUVhERHa4oJpdOucA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 16, 2011 at 02:29, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> "textarray". This option would require that the foreign table have exactly
> one field, of type text[], and would compose all the field strings read from
> the file for each record into the array (however many there are).
>
>   CREATE FOREIGN TABLE arr_text (
>        t text[]
>   )  SERVER file_server
>   OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray 'true');

FOREIGN TABLE has stable definitions, so there are some issues
that doesn't exist in COPY command:
- when the type of the last column is not a text[]
- when the last column is changed by ALTER FOREIGN TABLE ADD/DROP COLUMN

BTW, those options could be specified in column options rather than table
options. But we don't have column option syntax in the current proposal.

  CREATE FOREIGN TABLE arr_text (
       i integer OPTION (column 1), -- column position in csv file
       t text[] OPTION (column 'all the rest'),
       d date OPTION (column 2)
  )  SERVER file_server
  OPTIONS (format 'csv', filename '/path/to/ragged.csv');

--
Itagaki Takahiro


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: textarray option for file FDW
Date: 2011-01-15 18:38:27
Message-ID: 4D31E9A3.4010900@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/15/2011 01:24 PM, Itagaki Takahiro wrote:
> On Sun, Jan 16, 2011 at 02:29, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> "textarray". This option would require that the foreign table have exactly
>> one field, of type text[], and would compose all the field strings read from
>> the file for each record into the array (however many there are).
>>
>> CREATE FOREIGN TABLE arr_text (
>> t text[]
>> ) SERVER file_server
>> OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray 'true');
> FOREIGN TABLE has stable definitions, so there are some issues
> that doesn't exist in COPY command:
> - when the type of the last column is not a text[]
> - when the last column is changed by ALTER FOREIGN TABLE ADD/DROP COLUMN

I intended that this would be checked for validity at runtime, in
BeginCopy(). If the table doesn't match the requirement an error would
be raised. I don't think it's a big problem.

> BTW, those options could be specified in column options rather than table
> options. But we don't have column option syntax in the current proposal.
>
> CREATE FOREIGN TABLE arr_text (
> i integer OPTION (column 1), -- column position in csv file
> t text[] OPTION (column 'all the rest'),
> d date OPTION (column 2)
> ) SERVER file_server
> OPTIONS (format 'csv', filename '/path/to/ragged.csv');

Well, ok, but is there any proposal on the table to do that?

cheers

andrew


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: textarray option for file FDW
Date: 2011-01-15 19:59:37
Message-ID: m2hbdaey52.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Why is this a good thing? It seems like it would accomplish little
> except to defeat the SQL type system entirely.

It simply allows to have the classical ETL problem solved directly in
the database server, in SQL. That's huge.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: textarray option for file FDW
Date: 2011-01-16 00:41:13
Message-ID: 4D323EA9.4030305@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/15/2011 12:29 PM, Andrew Dunstan wrote:
>
> I've been waiting for the latest FDW patches as patiently as I can,
> and I've been reviewing them this morning, in particular the file_fdw
> patch and how it interacts with the newly exposed COPY API. Overall it
> seems to be a whole lot cleaner, and the wholesale duplication of the
> copy code is gone, so it's much nicer and cleaner. So now I'd like to
> add a new option to it: "textarray". This option would require that
> the foreign table have exactly one field, of type text[], and would
> compose all the field strings read from the file for each record into
> the array (however many there are). This would require a few changes
> to contrib/file_fdw/file_fdw.c and a few changes to
> src/backend/commands/copy.c, which I can probably have done in fairly
> short order, Deo Volente. This will allow something like:
>
> CREATE FOREIGN TABLE arr_text (
> t text[]
> ) SERVER file_server
> OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray
> 'true');
> SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there
> FROM arr_text;
>
>

A WIP patch is attached. It's against Shigeru Hanada's latest FDW
patches. It's surprisingly tiny. Right now it probably leaks memory like
a sieve, and that's the next thing I'm going to chase down.

cheers

andrew

Attachment Content-Type Size
fdw-textarray.patch1 text/plain 5.8 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: textarray option for file FDW
Date: 2011-01-16 17:45:02
Message-ID: 4D332E9E.3080405@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/15/2011 07:41 PM, Andrew Dunstan wrote:
>
>
> On 01/15/2011 12:29 PM, Andrew Dunstan wrote:
>>
>> I've been waiting for the latest FDW patches as patiently as I can,
>> and I've been reviewing them this morning, in particular the file_fdw
>> patch and how it interacts with the newly exposed COPY API. Overall
>> it seems to be a whole lot cleaner, and the wholesale duplication of
>> the copy code is gone, so it's much nicer and cleaner. So now I'd
>> like to add a new option to it: "textarray". This option would
>> require that the foreign table have exactly one field, of type
>> text[], and would compose all the field strings read from the file
>> for each record into the array (however many there are). This would
>> require a few changes to contrib/file_fdw/file_fdw.c and a few
>> changes to src/backend/commands/copy.c, which I can probably have
>> done in fairly short order, Deo Volente. This will allow something like:
>>
>> CREATE FOREIGN TABLE arr_text (
>> t text[]
>> ) SERVER file_server
>> OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray
>> 'true');
>> SELECT t[3]::int as a, t[1]::timestamptz as b, t[99] as not_there
>> FROM arr_text;
>>
>>
>
> A WIP patch is attached. It's against Shigeru Hanada's latest FDW
> patches. It's surprisingly tiny. Right now it probably leaks memory
> like a sieve, and that's the next thing I'm going to chase down.
>
>

Updated patch attached, that should use memory better.

cheers

andrew

Attachment Content-Type Size
fdw-textarray.patch2 text/plain 9.7 KB