Re: SQL/MED - file_fdw

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <itagaki(dot)takahiro(at)gmail(dot)com>,<hanada(at)metrosystems(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SQL/MED - file_fdw
Date: 2011-01-18 00:33:08
Message-ID: 4D348B6502000025000396E0@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Itagaki Takahiro wrote:
> Shigeru HANADA wrote:

>> Attached patch would avoid this leak by adding per-copy context to
>> CopyState. This would be overkill, and ResetCopyFrom() might be
>> reasonable though.
>
> Good catch. I merged your fix into the attached patch.

Review for CF:

While there is a post which suggests applying this patch in the
middle of a string of fdw patches, it stands alone without depending
on any of the other patches. I chose to focus on testing it alone,
to isolate just issues with this particular patch. In addition to
the base patch, there was a patch-on-patch which was applied to
change signature of NextCopyFrom to take fewer params and return
HeapTuple.

This patch is in context diff format, applies cleanly, compiles
without warning, passes all tests in `make check` and `make
installcheck-world`. From that and the testing I did, I think that
this patch could be committed before any of the others, if desired.

A few minor comments on format, though -- some parts of the patch
came out much more readable (for me, at least) when I regenerated the
diff using the git diff --patience switch. For example, see the end
of the CopyStateData structure definition each way. Also, whitespace
has little differences from what pgindent uses. Most are pretty
minor except for a section where the indentation wasn't changed based
on a change in depth of braces, to keep the patch size down.

It appears that the point of the patch is to provide a better API for
accessing the implementation of COPY, to support other patches. This
whole FDW effort is not an area of expertise for me, but the API
looks reasonable to me, with a somewhat parallel structure to some of
the other APIs used by the executor. From reading the FDW posts, it
appears that other patches are successfully using this new API, and
the reviewers of the other patches seem happy enough with it, which
would tend to indicate that it is on-target. Other hackers seem to
want it and we didn't already have it. From my reading of the posts
the idea of creating an API at this level was agreed upon by the
community.

The only two files touched by this patch are copy.h and copy.c. The
copy.h changes consisted entirely of new function prototypes and one
declaration of a pointer type (CopyState) to a struct defined and
used only inside copy.c (CopyStateData). That pointer is the return
value from BeginCopyFrom and required as a parameter to other new
functions. So the old API is still present exactly as it was, with
additional functions added.

I tried to read through the code to look for problems, but there are
so many structures and techniques involved that I haven't had to deal
with yet, that it would take me days to get up to speed enough to
desk check this adequately. Since this API is a prerequisite for
other patches, and already being used by them, I figured I should do
what I could quickly and then worry about how to cover that.

Since it doesn't appear to be intended to change any user-visible
behavior, I don't see any need for docs or changes to the regression
tests. I didn't see any portability problems. It seemed a little
light on comments to me, but perhaps that's because of my ignorance
of some of the techniques being used -- maybe things are obvious
enough to anyone who would be mucking about in copy.c.

I spent a few hours doing ad hoc tests of various sorts to try to
break things, without success. Among those tests were checks for
correct behavior on temporary tables and read only transactions --
separately and in combination. I ran millions of COPY statements
inside of a single database transaction (I tried both FROM and TO)
without seeing memory usage increase by more than a few kB. No
assertion failures. No segfaults. Nothing unusual in the log. I
plan to redo these tests with the full fdw patch set unless someone
has already covered that ground.

So far everything I've done has been with asserts enabled, so I
haven't tried to get serious benchmarks, but it seems fast. I will
rebuild without asserts and do performance tests before I change the
status on the CF page.

I'm wondering if it would make more sense to do the benchmarking with
just this patch or the full fdw patch set? Both?

-Kevin

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2011-01-18 00:36:59 Re: estimating # of distinct values
Previous Message Greg Smith 2011-01-18 00:33:07 Re: Spread checkpoint sync