Re: WIP Patch: Selective binary conversion of CSV file foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch: Selective binary conversion of CSV file foreign tables
Date: 2012-06-25 12:49:19
Message-ID: CADyhKSVXDCSmSi+9szL0QSmWc1YUTsAgFa9=WcRe8OFZ0-N1OQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fujita-san,

The revised patch is almost good for me.
Only point I noticed is the check for redundant or duplicated options
I pointed out on the previous post.
So, if you agree with the attached patch, I'd like to hand over this
patch for committers.

All I changed is:
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -122,6 +122,11 @@ typedef struct CopyStateData
@@ -217,7 +221,7 @@ index 98bcb2f..0143d81 100644
}
+ else if (strcmp(defel->defname, "convert_binary") == 0)
+ {
-+ if (cstate->convert_binary)
++ if (cstate->convert_selectively)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));

Thanks,

2012/6/20 Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>:
> Hi KaiGai-san,
>
> Thank you for the review.
>
>> -----Original Message-----
>> From: pgsql-hackers-owner(at)postgresql(dot)org
>> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kohei KaiGai
>> Sent: Wednesday, June 20, 2012 1:26 AM
>> To: Etsuro Fujita
>> Cc: Robert Haas; pgsql-hackers(at)postgresql(dot)org
>> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
>> foreign tables
>>
>> Hi Fujita-san,
>>
>> Could you rebase this patch towards the latest tree?
>> It was unavailable to apply the patch cleanly.
>
> Sorry, I updated the patch.  Please find attached an updated version of the
> patch.
>
>> I looked over the patch, then noticed a few points.
>>
>> At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
>> If so, cstate->convert_binary is not a suitable flag to check redundant
>> option. It seems to me cstate->convert_selectively is more suitable flag
>> to check it.
>>
>> +       else if (strcmp(defel->defname, "convert_binary") == 0)
>> +       {
>> +           if (cstate->convert_binary)
>> +               ereport(ERROR,
>> +                       (errcode(ERRCODE_SYNTAX_ERROR),
>> +                        errmsg("conflicting or redundant options")));
>> +           cstate->convert_selectively = true;
>> +           if (defel->arg == NULL || IsA(defel->arg, List))
>> +               cstate->convert_binary = (List *) defel->arg;
>> +           else
>> +               ereport(ERROR,
>> +                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                        errmsg("argument to option \"%s\" must be a
>> list of column names",
>> +                               defel->defname)));
>> +       }
>
> Yes, defel->arg can be NIL.  defel->arg is a List structure listing all the
> columns needed to be converted to binary representation, which is NIL for
> the case where no columns are needed to be converted.  For example,
> defel->arg is NIL for SELECT COUNT(*).  In this case, while
> cstate->convert_selectively is set to true, no columns are converted at
> NextCopyFrom().  Most efficient case!  In short, cstate->convert_selectively
> represents whether to do selective binary conversion at NextCopyFrom(), and
> cstate->convert_binary represents all the columns to be converted at
> NextCopyFrom(), that can be NIL.
>
>> At NextCopyFrom(), this routine computes default values if configured.
>> In case when these values are not referenced, it might be possible to skip
>> unnecessary calculations.
>> Is it unavailable to add logic to avoid to construct cstate->defmap on
>> unreferenced columns at  BeginCopyFrom()?
>
> I think that we don't need to add the above logic because file_fdw does
> BeginCopyFrom() with attnamelist = NIL, in which case, BeginCopyFrom()
> doesn't construct cstate->defmap at all.
>
> I fixed a bug plus some minor optimization in check_binary_conversion() that
> is renamed to check_selective_binary_conversion() in the updated version,
> and now file_fdw gives up selective binary conversion for the following
> cases:
>
>  a) BINARY format
>  b) CSV/TEXT format and whole row reference
>  c) CSV/TEXT format and all the user attributes needed
>
>
> Best regards,
> Etsuro Fujita
>
>> Thanks,
>>
>> 2012/5/11 Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>:
>> >> -----Original Message-----
>> >> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
>> >> Sent: Friday, May 11, 2012 1:36 AM
>> >> To: Etsuro Fujita
>> >> Cc: pgsql-hackers(at)postgresql(dot)org
>> >> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV
>> >> file foreign tables
>> >>
>> >> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
>> > <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
>> >> wrote:
>> >> > I would like to propose to improve parsing efficiency of
>> >> > contrib/file_fdw by selective parsing proposed by Alagiannis et
>> >> > al.[1], which means that for a CSV/TEXT file foreign table,
>> >> > file_fdw performs binary conversion only for the columns needed for
>> >> > query processing.  Attached is a WIP patch implementing the feature.
>> >>
>> >> Can you add this to the next CommitFest?  Looks interesting.
>> >
>> > Done.
>> >
>> > Best regards,
>> > Etsuro Fujita
>> >
>> >> --
>> >> Robert Haas
>> >> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
>> >> Company
>> >>
>> >
>> >
>> >
>> > --
>> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org) To
>> > make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-hackers
>>
>>
>>
>> --
>> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org) To make
>> changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
>
>
>

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

Attachment Content-Type Size
file_fdw_sel_bin_conv_v3.patch application/octet-stream 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2012-06-25 13:12:46 Re: libpq compression
Previous Message Magnus Hagander 2012-06-25 12:26:16 Re: libpq compression