Use unique index for longer pathkeys.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Use unique index for longer pathkeys.
Date: 2014-06-13 07:41:33
Message-ID: 20140613.164133.160845727.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, This is the continuation from the last CF.

This patch intends to make PG to use index for longer pathkeys
than index columns when,

- The index is a unique index.
- All index columns are NOT NULL.
- The index column list is a subset of query_pathkeys.

The use cases for this patch are,

- (Useless?) DISTINCT on the table with PK constraints.

This alone is useless but this situation could take place when
processing UNION (without ALL), which silently adds DISTINCT *
to element queries. Especially effective with ORDER BY and
LIMIT clauses. But this needs another patch for UNION
optimization. This is the main motive for this patch.

- Unfortunately, other use cases are somewhat boresome, mainly
effective for essentially unnecessary ORDER BY or DISTINCT. To
streatch a point, naively or mechanically composed queires
could be saved by this patch..

====

Duing last CF, I proposed to match long pathkeys against index
columns during creating index paths. This worked fine but also it
is difficult to make sure that all side-effects are
eliminated. Finally Tom Lane suggested to truncate pathkeys while
generation of the pathkeys itself. So this patch comes.

This patch consists of mainly three parts.

1. Mark index as 'Uniquely orders the table' if the conditions
are satisfied. This is the same from the previous patch.

2. Collect the "common primary pathkeys", which is pathkeys that
can perfectly sort all involved
RTE's. (collect_common_primary_pathkeys())

3. Trim the query pathkeys using the common primary pathkeys.
(trim_query_pathkeys())

These steps take place between set_base_rel_sizes() and
set_base_rel_pathlists() in make_one_rel(). The reason for the
position is that the step 2 above needs all inheritence tables to
be extended in PlannerInfo and set_base_rel_sizes (currently)
does that. Then the trimmed pathkeys are used in
set_base_rel_pathlists so trim_query_pathkeys should be before
it. (This needs to be written as a comment?)

Finally, the new patch has grown up to twice in size.. Although
it seems a bit large, I think that side-effects are clearly
avoided.

This message is attatched by two patches.

1. pathkey_and_uniqueindx_typ2_v1.patch : This new patch.

2. pathkey_and_uniqueindx_v10_20130411.patch : The last version
of the previous approach.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueindx_typ2_v1.patch text/x-patch 14.2 KB
pathkey_and_uniqueindx_v10_20130411.patch text/x-patch 5.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ian Barwick 2014-06-13 07:42:21 Re: PL/pgSQL support to define multi variables once
Previous Message Michael Paquier 2014-06-13 07:41:09 Re: PL/pgSQL support to define multi variables once