Re: WIP: Collecting statistics on CSV file data

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Collecting statistics on CSV file data
Date: 2011-12-09 12:16:56
Message-ID: 4EE1FC38.5080707@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hanada-san,

I updated the patch. Please find attached a patch.

Best regards,
Etsuro Fujita

> (2011/11/18 21:00), Shigeru Hanada wrote:
>> (2011/11/18 16:25), Etsuro Fujita wrote:
>>> Thank you for your testing. I updated the patch according to your
>>> comments. Attached is the updated version of the patch.
>>
>> I'd like to share result of my review even though it's not fully
>> finished. So far I looked from viewpoint of API design, code
>> formatting, and documentation. I'll examine effectiveness of the patch
>> and details of implementation next week, and hopefully try writing
>> ANALYZE handler for pgsql_fdw :)
>>
>> New patch has correct format, and it could be applied to HEAD of master
>> branch cleanly. All regression tests including those of contrib modules
>> have passed. It contains changes of codes and regression tests related
>> to the issue, and they have enough comments. IMO the document in this
>> patch is not enough to show how to write analyze handler to FDW authors,
>> but it can be enhanced afterward. With this patch, FDW author can
>> provide optional ANALYZE handler which updates statistics of foreign
>> tables. Planner would be able to generate better plan by using statistics.
>>
>>> Yes. But in the updated version, I've refactored analyze.c a little bit
>>> to allow FDW authors to simply call do_analyze_rel().
>> <snip>
>>> The updated version enables FDW authors to just write their own
>>> acquire_sample_rows(). On the other hand, by retaining to hook
>>> AnalyzeForeignTable routine at analyze_rel(), higher level than
>>> acquire_sample_rows() as before, it allows FDW authors to write
>>> AnalyzeForeignTable routine for foreign tables on a remote server to ask
>>> the server for its current stats instead, as pointed out earlier by Tom
>>> Lane.
>>
>> IIUC, this patch offers three options to FDWs: a) set
>> AnalyzeForeignTable to NULL to indicate lack of capability, b) provide
>> AnalyzeForeignTable which calls do_analyze_rel with custom
>> sample_row_acquirer, and c) create statistics data from scratch by FDW
>> itself by doing similar things to do_analyze_rel with given argument or
>> copying statistics from remote PostgreSQL server.
>>
>> ISTM that this design is well-balanced between simplicity and
>> flexibility. Maybe these three options would suit web-based wrappers,
>> file-based or RDBMS wrappers, and pgsql_fdw respectively. I think that
>> adding more details of FdwRoutine, such as purpose of new callback
>> function and difference from required ones, would help FDW authors,
>> including me :)
>>
>> I have some random comments:
>>
>> - I think separated typedef of sample_acquire_rows would make codes more
>> readable. In addition, parameters of the function should be described
>> explicitly.
>> - I couldn't see the reason why file_fdw sets ctid of sample tuples,
>> though I guess it's for Vitter's random sampling algorithm. If every
>> FDW must set valid ctid to sample tuples, it should be mentioned in
>> document of AnalyzeForeignTable. Exporting some functions from
>> analyze.c relates this issue?
>> - Why file_fdw skips sample tuples which have NULL value? AFAIS
>> original acquire_sample_rows doesn't do so.
>> - Some comment lines go past 80 columns.
>> - Some headers included in file_fdw.c seems unnecessary.
>>
>> Regards,
>
>

Attachment Content-Type Size
postgresql-analyze-v5.patch text/plain 38.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2011-12-09 12:31:26 Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)
Previous Message Albe Laurenz 2011-12-09 11:27:36 Re: review: CHECK FUNCTION statement