Re: WIP: Collecting statistics on CSV file data

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: Collecting statistics on CSV file data
Date: 2011-12-12 10:33:06
Message-ID: 4EE5D862.1080809@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2011/12/09 21:16), Etsuro Fujita wrote:
> I updated the patch. Please find attached a patch.

I've examined v5 patch, and got reasonable EXPLAIN results which reflect
collected statistics! As increasing STATISTICS option, estimated rows
become better. Please see attached stats_*.txt for what I
tested.

stats_none.txt : before ANALYZE
stats_100.txt : SET STATISTICS = 100 for all columns, and ANALYZE
stats_10000.txt : SET STATISTICS = 10000 for all columns, and ANALYZE

I think that this patch become ready for committer after some
minor corrections:

Couldn't set n_distinct
=======================
I couldn't set n_distinct to columns of foreign tables. With some
research, I noticed that ATSimplePermissions should accept
ATT_FOREIGN_TABLE too for that case. In addition, regression tests for
ALTER FOREIGN TABLE should be added to detect this kind of problem.

Showing stats target
====================
We can see stats target of ordinary tables with \d+, but it is not
available for foreign tables. I think "Stats target" column should be
added even though output of \d+ for foreign tables become wider. One
possible idea is to remove useless "Storage" column instead, but views
have that column even though their source could come from foreign tables.

Please see attached patch for these two items.

Comments of FdwRoutine
======================
Mention about optional handler is obsolete. We should clearly say
AnalyzeForeignTable is optional (can be set to NULL) and rest are
required. IMO separating them with comment would help FDW authors to
understand requirements, e.g.:

typedef struct FdwRoutine
{
NodeTag type;

/*
* These Handlers are required to execute simple scan on a foreign
* table. If any of them was set to NULL, scan on a foreign table
* managed by such FDW would fail.
*/
PlanForeignScan_function PlanForeignScan;
ExplainForeignScan_function ExplainForeignScan;
BeginForeignScan_function BeginForeignScan;
IterateForeignScan_function IterateForeignScan;
ReScanForeignScan_function ReScanForeignScan;
EndForeignScan_function EndForeignScan;

/*
* Handlers below are optional. You can set any of them to
* NULL to tell PostgreSQL that the FDW doesn't have the
* capability.
*/
AnalyzeForeignTable_function AnalyzeForeignTable;
} FdwRoutine;

Code formatting
===============
Some code lines go past 80 columns.

Message style
=============
The terms 'cannot support option "n_distinct"...' used in
ATPrepSetOptions seems little unusual in PostgreSQL. Should we say
'cannot set "n_distinct_inherited" for foreign tables' for that case?

Typo
====
Typo "spcify" is in document of analyze.

Regards,
--
Shigeru Hanada

Attachment Content-Type Size
stats_none.txt text/plain 2.7 KB
stats_100.txt text/plain 2.6 KB
stats_100000.txt text/plain 2.6 KB
show_stats_target.patch text/plain 7.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Cave-Ayland 2011-12-12 11:43:39 Re: [PATCH] PostgreSQL fails to build with 32bit MinGW-w64
Previous Message Pavel Stehule 2011-12-12 04:59:40 Re: Arithmetic operators for macaddr type