Re: [PATCH]Tablesample Submission

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Qi Huang <huangqiyx(at)outlook(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH]Tablesample Submission
Date: 2012-09-18 09:32:50
Message-ID: CAP7QgmkMhjoUi3_RetR7nH4ZOm=B_M7BWdAkv5cf-rekYU5H0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 21, 2012 at 8:08 AM, Qi Huang <huangqiyx(at)outlook(dot)com> wrote:
>> Please add your patch here:
>>
>> https://commitfest.postgresql.org/action/commitfest_view/open
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>
> Hi, Robert
> I added it under "Miscellaneous".
> https://commitfest.postgresql.org/action/patch_view?id=918
>
>

Patch does not apply cleanly against latest master. outfuncs.c,
allpath.c and cost.h have rejected parts. The make check failed in a
lot of cases up to 26 out of 133. I didn't look into each issue but I
suggest rebasing on the latest master and making sure the regression
test passes.

Some of the patch don't follow our coding standard. Please adjust
brace positions, for example. For the header include list and
Makefile, place a new files in alphabetical order.

The patch doesn't include any documentation. Consider add some doc
patch for such a big feature like this.

You should update kwlist.h for REPEATABLE but I'm not sure if
REPEATABLE should become a reserved keyword yet.

I don't see why you created T_TableSampleInfo. TableSampleInfo looks
fine with a simple struct rather than a Node.

If we want to disable a cursor over a sampling table, we should check
it in the parser not the planner. In the wiki page, one of the TODOs
says about cursor support, but how much difficult is it? How does the
standard say?

s/skiped/skipped/

Don't we need to reset seed on ExecReScanSampleScan? Should we add a
new executor node SampleScan? It seems everything about random
sampling is happening under heapam.

It looks like BERNOULLI allocates heap tuple array beforehand, and
copy all the tuples into it. This doesn't look acceptable when you
are dealing with a large number of rows in the table.

As wiki says, BERNOULLI relies on the statistics of the table, which
doesn't sound good to me. Of course we could say this is our
restriction and say good-bye to users who hadn't run ANALYZE first,
but it is too hard for a normal users to use it. We may need
quick-and-rough count(*) for this.

That is pretty much I have so far. I haven't read all the code nor
the standard, so I might be wrong somewhere.

Thanks,
--
Hitoshi Harada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2012-09-18 12:32:43 Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Previous Message Etsuro Fujita 2012-09-18 08:52:09 Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY