Tuple count used while costing MergeAppend and that for an append rel

Lists: pgsql-hackers
From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Tuple count used while costing MergeAppend and that for an append rel
Date: 2016-11-18 12:20:21
Message-ID: CAFjFpRek+cLCnTo24youuGtsq4zRphEB8EUUPjDxZjnL4n4HYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,
In create_merge_append_path() we have following code
1331
1332 /* Now we can compute total costs of the MergeAppend */
1333 cost_merge_append(&pathnode->path, root,
1334 pathkeys, list_length(subpaths),
1335 input_startup_cost, input_total_cost,
1336 rel->tuples);
1337

The last arguemnt to cost_merge_append() is described as
'tuples' is the number of tuples in all the streams

For an append relation, set_append_rel_size() sets rel->tuples to the
sum of rows output by each child i.e. sum of rel->rows from each
child.
1091 rel->rows = parent_rows;
1092 rel->reltarget->width = rint(parent_size / parent_rows);
1093 for (i = 0; i < nattrs; i++)
1094 rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
1095
1096 /*
1097 * Set "raw tuples" count equal to "rows" for the appendrel; needed
1098 * because some places assume rel->tuples is valid for any baserel.
1099 */
1100 rel->tuples = parent_rows;

AFAIU, There's difference in the tuples and rows members of
RelOptInfo. While tuples gives the estimate of number of tuples per
pg_class, rows gives the number of rows output by that relation. From
that perspective rel->tuples should be set of the sum of
child_rel->tuples from each of the children. If we do that the last
argument to cost_merge_append() should change from rel->tuples to
rel->rows.

Does that make sense? Attached patch makes those changes.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
appendrel_tuples.patch text/x-diff 4.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tuple count used while costing MergeAppend and that for an append rel
Date: 2016-11-18 20:01:41
Message-ID: 31343.1479499301@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> AFAIU, There's difference in the tuples and rows members of
> RelOptInfo. While tuples gives the estimate of number of tuples per
> pg_class, rows gives the number of rows output by that relation. From
> that perspective rel->tuples should be set of the sum of
> child_rel->tuples from each of the children. If we do that the last
> argument to cost_merge_append() should change from rel->tuples to
> rel->rows.

> Does that make sense? Attached patch makes those changes.

AFAICS, what you propose to add in set_append_rel_size is pure overhead.
There's no conceivable use to computing sum-of-raw-tuple-counts for an
appendrel ... or at least, if there is, you didn't say what you expect
it would be good for. Normally the difference between rel->tuples and
rel->rows corresponds to the selectivity of the rel's restriction clauses.
Since an appendrel has no restrictions of its own (they've all been pushed
down to the child rels) it doesn't seem unreasonable for it to have tuples
equal to rows. While we could also define it as you suggest, I don't see
the point of expending extra cycles to do so.

I concur that using path->rows not rel->tuples in create_merge_append_path
would be an improvement. I think it makes no difference now, but if we
were to support parameterized mergeappend paths, rel->rows and rel->tuples
would both be wrong for such a path. (I believe the possibility of a
parameterized path is the reason why create_append_path computes
path->rows the hard way instead of just copying it from rel->rows.
The logic in create_merge_append_path was probably just copied from
create_append_path; it's overkill today but might not be so forever.)

regards, tom lane


From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tuple count used while costing MergeAppend and that for an append rel
Date: 2016-11-21 04:49:37
Message-ID: CAFjFpRePwJYaaRDe62CH99UX+NxW0xE03p2ECRLZO6yphrxwuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> AFAICS, what you propose to add in set_append_rel_size is pure overhead.
> There's no conceivable use to computing sum-of-raw-tuple-counts for an
> appendrel ... or at least, if there is, you didn't say what you expect
> it would be good for. Normally the difference between rel->tuples and
> rel->rows corresponds to the selectivity of the rel's restriction clauses.
> Since an appendrel has no restrictions of its own (they've all been pushed
> down to the child rels) it doesn't seem unreasonable for it to have tuples
> equal to rows. While we could also define it as you suggest, I don't see
> the point of expending extra cycles to do so.
>

I suggested that change, just to keep the computation consistent with
the comment in RelOptInfo. I agree that it's a pure overhead for now.
If we were to maintain it per the comments, we may be able to find the
average selectivity for an appendrel, but again, I do not see any real
purpose in doing so.

> I concur that using path->rows not rel->tuples in create_merge_append_path
> would be an improvement. I think it makes no difference now, but if we
> were to support parameterized mergeappend paths, rel->rows and rel->tuples
> would both be wrong for such a path. (I believe the possibility of a
> parameterized path is the reason why create_append_path computes
> path->rows the hard way instead of just copying it from rel->rows.
> The logic in create_merge_append_path was probably just copied from
> create_append_path; it's overkill today but might not be so forever.)

Yes, this code did puzzle me. add_path() does not allow paths to have
both parameterization and pathkeys. So, a merge append on
parameterized paths does look odd. But then, I thought, that producing
a parameterized merge append would get allow merging ordered rows for
a given value of parameter. Although we don't do that now, but may be
it's useful in the future.

Thanks a lot for your explanation.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company