Re: SQL/MED - file_fdw

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org, hanada(at)metrosystems(dot)co(dot)jp
Subject: Re: SQL/MED - file_fdw
Date: 2011-02-07 06:39:39
Message-ID: AANLkTimG1=en=9f8j=kdiNMAGMWfYGKb+5nYCKhRqSJc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the detailed review!
Revised patch attached, but APIs for jagged csv is not included in it.

On Sun, Feb 6, 2011 at 09:01, Noah Misch <noah(at)leadboat(dot)com> wrote:
> Most "parse analysis"-type bits of DoCopy() move to BeginCopy().

It would be possible to move more FROM-only or TO-only members in BeginCopy()
into their BeginCopyFrom/To(), but it is just a code refactoring requires
more code rearrangement. It should be done by another try even if needed.

> CopyState.processed is gone, with the row count now bubbling up
> through return values.

I removed it from CopyState because it represents "number of inserted rows"
in COPY FROM. However, for the viewpoint of API, the numbers of lines in
the input file is more suitable. To avoid the confusion, I moved it
from a common member to COPY FROM-only variable.

> For COPY TO, the relkind checks move from DoCopyTo() to BeginCopyTo().  I'm
> skeptical about this one.  It's not required for correctness, and the relkind
> checks for COPY FROM remain in CopyFrom().

I think error checks should be done in the early phase, so I moved the check
to BeginCopyTo(). However, relkind check in COPY FROM is needed only for
COPY FROM command because the relation is the inserted target. For APIs,
the relation is used as a template for input file. So, we cannot perform
the check in BeginCopyFrom().

> file_fdw uses CopyFromErrorCallback() to give errors the proper context.  The
> function uses template strings like "COPY %s, line %d", where %s is the name of
> the relation being copied.  Presumably file_fdw and other features using this
> API would wish to customize that error message prefix, and the relation name
> might not be apropos at all.  How about another argument to BeginCopyFrom,
> specifying an error prefix to be stashed in the CopyState?

I changed "COPY %s, .." to "relation %s, ..." because the first string is
the relation name anyway. We could have another prefix argument, but I think
it has little information for errors.

We also have many "COPY" in other messages, but they won't be used actually
because the are messages for invalid arguments and file_fdw will have own
validater function. All invalid arguments will be filtered in CREATE commands.

> We could easily regret requiring a Relation in BeginCopyFrom(); another user may
> wish to use a fabricated TupleDesc.  Code paths in the new API use the Relation
> for a TupleDesc, relhashoids, column defaults, and error message strings.
> Removing the need for a full Relation is within reach.  That being said, the API
> definitely loses cleanliness if we cater to that possibility now.  It's probably
> best to keep the API as the patch has it, and let the hypothetical future use
> case change it.

Agreed. The current file_fdw has a template relation anyway.

> The header comment for BeginCopyFrom should probably explain the arguments a bit
> more; it assumes familiarity with the COPY implementation.  How about this:
> (snip)
> Comments on specific hunks:

Thanks included.

>> !             /* Reset the per-tuple exprcontext */
>> !             ResetPerTupleExprContext(estate);
>> !
>> !             /* Switch into its memory context */
>> !             MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>
> Shouldn't a switch to this context happen inside NextCopyFrom(), then again for
> the calls to heap_form_tuple/HeapTupleSetOid?  I found previous discussion on
> this matter, but no conclusion.

In my understanding, NextCopyFrom() should use CurrentMemoryContext provided
by the caller. The file_fdw will use executor's per-tuple context for it.
Another issue in the automatic context switch is to discard the previous
values in the next call while the caller is unaware.

--
Itagaki Takahiro

Attachment Content-Type Size
copy_export-20110207.patch application/octet-stream 48.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2011-02-07 06:51:56 Re: pg_dump directory archive format / parallel pg_dump
Previous Message Noah Misch 2011-02-07 06:00:58 Re: ALTER TYPE 2: skip already-provable no-work rewrites