Re: SQL/MED - file_fdw

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SQL/MED - file_fdw
Date: 2011-02-07 12:00:53
Message-ID: AANLkTinvkGqf9RHjw0AKDZ8_LUe3mKTH_BVkgJJ8QxK3@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 7, 2011 at 16:01, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp> wrote:
> This patch is based on latest FDW API patches which are posted in
> another thread "SQL/MED FDW API", and copy_export-20110104.patch which
> was posted by Itagaki-san.

I have questions about estimate_costs().

* What value does baserel->tuples have?
Foreign tables are never analyzed for now. Is the number correct?

* Your previous measurement showed that it has a much higher startup cost.
When you removed ReScan, it took a long time, but the planner didn't choose
materialized plans. That might come from the startup cost being estimated too low.

* Why do you use lstat() in it?
Even if the file is a symlink, we will read the linked file in the
succeeding copy. So, I think it should be stat() rather than lstat().
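
To illustrate the difference, here is a minimal standalone sketch (not part of
the patch; the helper name sketch_link_and_target_size() is hypothetical). For
a symlink, lstat() reports the size of the link itself, which is the length of
the path stored in it, while stat() follows the link and reports the size of
the file that would actually be read:

```c
#include <sys/stat.h>

/*
 * Hypothetical helper for illustration only.
 * Stores the size reported by lstat() in *link_size and the size reported
 * by stat() in *target_size. Returns 0 on success, -1 if either call fails.
 */
static int
sketch_link_and_target_size(const char *path,
                            long long *link_size,
                            long long *target_size)
{
    struct stat lst;
    struct stat st;

    /* lstat() does not follow a symlink: st_size is the link's own size */
    if (lstat(path, &lst) == -1)
        return -1;

    /* stat() follows the symlink: st_size is the size of the target file */
    if (stat(path, &st) == -1)
        return -1;

    *link_size = (long long) lst.st_size;
    *target_size = (long long) st.st_size;
    return 0;
}
```

For a regular file both values are the same, so the two calls only differ in
the symlink case, where the lstat() value would make the page estimate far too
small for a large data file.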

+estimate_costs(const char *filename, RelOptInfo *baserel,
+               double *startup_cost, double *total_cost)
+{
...
+    /* get size of the file */
+    if (lstat(filename, &stat) == -1)
+    {
+        ereport(ERROR,
+                (errcode_for_file_access(),
+                 errmsg("could not stat file \"%s\": %m", filename)));
+    }
+
+    /*
+     * The way to estimate costs is almost same as cost_seqscan(), but there
+     * are some differences:
+     * - DISK costs are estimated from file size.
+     * - CPU costs are 10x of seq scan, for overhead of parsing records.
+     */
+    pages = stat.st_size / BLCKSZ + (stat.st_size % BLCKSZ > 0 ? 1 : 0);
+    run_cost += seq_page_cost * pages;
+
+    *startup_cost += baserel->baserestrictcost.startup;
+    cpu_per_tuple = cpu_tuple_cost + baserel->baserestrictcost.per_tuple;
+    run_cost += cpu_per_tuple * 10 * baserel->tuples;
+    *total_cost = *startup_cost + run_cost;
+
+    return stat.st_size;
+}
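
The arithmetic in the hunk above can be tried outside the server with a small
standalone sketch. This is only an illustration under stated assumptions, not
the patch's code: SKETCH_BLCKSZ and the sketch_* function names are
hypothetical, 8192 is PostgreSQL's default BLCKSZ, and the cost parameters are
passed in explicitly (the defaults for seq_page_cost and cpu_tuple_cost are
1.0 and 0.01).

```c
#include <stdint.h>

/* Assumed for illustration: PostgreSQL's default block size in bytes */
#define SKETCH_BLCKSZ 8192

/* Number of pages needed to hold file_size bytes, rounding up */
static double
sketch_file_pages(int64_t file_size)
{
    return (double) (file_size / SKETCH_BLCKSZ +
                     (file_size % SKETCH_BLCKSZ > 0 ? 1 : 0));
}

/*
 * Run cost following the quoted hunk: disk cost derived from the file size,
 * plus a per-tuple CPU cost multiplied by 10 for the parsing overhead.
 * qual_per_tuple stands in for baserel->baserestrictcost.per_tuple.
 */
static double
sketch_run_cost(int64_t file_size, double tuples,
                double seq_page_cost, double cpu_tuple_cost,
                double qual_per_tuple)
{
    double      cpu_per_tuple = cpu_tuple_cost + qual_per_tuple;
    double      run_cost = 0;

    run_cost += seq_page_cost * sketch_file_pages(file_size);
    run_cost += cpu_per_tuple * 10 * tuples;
    return run_cost;
}
```

For example, sketch_run_cost(8193, 0, 1.0, 0.01, 0) is 2.0: the file needs two
pages, and with tuples = 0 the CPU term disappears entirely. That is why the
value of baserel->tuples matters for the estimate.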

--
Itagaki Takahiro
