Re: WIP: Collecting statistics on CSV file data

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Collecting statistics on CSV file data
Date: 2011-12-13 13:00:15
Message-ID: 4EE74C5F.3000309@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2011/12/12 19:33), Shigeru Hanada wrote:
> (2011/12/09 21:16), Etsuro Fujita wrote:
>> I updated the patch. Please find attached a patch.
>
> I've examined v5 patch, and got reasonable EXPLAIN results which reflect
> collected statistics! As increasing STATISTICS option, estimated rows
> become better. Please see attached stats_*.txt for what I
> tested.
>
> stats_none.txt : before ANALYZE
> stats_100.txt : SET STATISTICS = 100 for all columns, and ANALYZE
> stats_10000.txt : SET STATISTICS = 10000 for all columns, and ANALYZE
>
> I think that this patch become ready for committer after some
> minor corrections:
>
> Couldn't set n_distinct
> =======================
> I couldn't set n_distinct to columns of foreign tables. With some
> research, I noticed that ATSimplePermissions should accept
> ATT_FOREIGN_TABLE too for that case. In addition, regression tests for
> ALTER FOREIGN TABLE should be added to detect this kind of problem.
>
> Showing stats target
> ====================
> We can see stats target of ordinary tables with \d+, but it is not
> available for foreign tables. I think "Stats target" column should be
> added even though output of \d+ for foreign tables become wider. One
> possible idea is to remove useless "Storage" column instead, but views
> have that column even though their source could come from foreign tables.
>
> Please see attached patch for these two items.
>
> Comments of FdwRoutine
> ======================
> Mention about optional handler is obsolete. We should clearly say
> AnalyzeForeignTable is optional (can be set to NULL) and rest are
> required. IMO separating them with comment would help FDW authors to
> understand requirements, e.g.:
>
> typedef struct FdwRoutine
> {
> NodeTag type;
>
> /*
> * These Handlers are required to execute simple scan on a foreign
> * table. If any of them was set to NULL, scan on a foreign table
> * managed by such FDW would fail.
> */
> PlanForeignScan_function PlanForeignScan;
> ExplainForeignScan_function ExplainForeignScan;
> BeginForeignScan_function BeginForeignScan;
> IterateForeignScan_function IterateForeignScan;
> ReScanForeignScan_function ReScanForeignScan;
> EndForeignScan_function EndForeignScan;
>
> /*
> * Handlers below are optional. You can set any of them to
> * NULL to tell PostgreSQL that the FDW doesn't have the
> * capability.
> */
> AnalyzeForeignTable_function AnalyzeForeignTable;
> } FdwRoutine;
>
> Code formatting
> ===============
> Some code lines go past 80 columns.
>
> Message style
> =============
> The terms 'cannot support option "n_distinct"...' used in
> ATPrepSetOptions seems little unusual in PostgreSQL. Should we say
> 'cannot set "n_distinct_inherited" for foreign tables' for that case?
>
> Typo
> ====
> Typo "spcify" is in document of analyze.

Thank you for your effectiveness experiments and proposals for
improvements. I updated the patch according to your proposals.
Attached is the updated version of the patch.

Best regards,
Etsuro Fujita

> Regards,
>
>
>
>

Attachment Content-Type Size
postgresql-analyze-v6.patch text/plain 46.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2011-12-13 13:25:03 Re: Command Triggers
Previous Message Pavel Stehule 2011-12-13 12:59:51 Re: review: CHECK FUNCTION statement