Re: Use unique index for longer pathkeys.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Use unique index for longer pathkeys.
Date: 2014-07-14 05:31:52
Message-ID: CAA4eK1+6b6Wjwf51oZMrL+mKFH8xUp9J-pEhQvoR8SE7sWyTWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hello, This is the continuation from the last CF.
>
> This patch intends to make PG to use index for longer pathkeys
> than index columns when,
>
> - The index is a unique index.
> - All index columns are NOT NULL.
> - The index column list is a subset of query_pathkeys.
>
> ====
>
> This patch consists of mainly three parts.
>
> 1. Mark index as 'Uniquely orders the table' if the conditions
> are satisfied. This is the same from the previous patch.
>
> 2. Collect the "common primary pathkeys", which is pathkeys that
> can perfectly sort all involved
> RTE's. (collect_common_primary_pathkeys())
>
> 3. Trim the query pathkeys using the common primary pathkeys.
> (trim_query_pathkeys())

I have spent some time on this patch and would like to share my
findings as below with you.

1.
It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
this patch is not getting set properly. Please refer below test:

create table t (a int, b int, c int, d text);
create unique index i_t_pkey on t(a, b);
insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
100000) a);
analyze;
explain (costs off, analyze on) select distinct * from t;

Unptached -
postgres=# explain (costs off, analyze on) select distinct * from t;
QUERY PLAN
---------------------------------------------------------------------------
Unique (actual time=331.624..597.278 rows=100001 loops=1)
-> Sort (actual time=330.889..430.449 rows=100001 loops=1)
Sort Key: a, b, c, d
Sort Method: external merge Disk: 2344kB
-> Seq Scan on t (actual time=0.013..74.202 rows=100001 loops=1)
Execution time: 665.332 ms
(6 rows)

Patch (pathkey_and_uniqueindx_typ2_v1) -
postgres=# explain (costs off, analyze on) select distinct * from t;
QUERY PLAN
---------------------------------------------------------------------------
Unique (actual time=336.033..601.144 rows=100001 loops=1)
-> Sort (actual time=336.027..436.621 rows=100001 loops=1)
Sort Key: a, b, c, d
Sort Method: external merge Disk: 2344kB
-> Seq Scan on t (actual time=0.014..78.826 rows=100001 loops=1)
Execution time: 667.387 ms
(6 rows)

Shouldn't above query use index scan after patch?

2.
typedef struct IndexOptInfo
bool predOK; /* true if predicate matches query */
bool unique; /* true if a unique index */
bool immediate; /* is uniqueness enforced immediately? */
+ bool full_ordered; /* don't key columns duplicate? */

It seems you have forgotten to update out function _outIndexOptInfo().

3.
+ root->distinct_pathkeys =
+ shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide,
+ true);
+
+ root->query_pathkeys =
+ shorten_pathkeys_following(root,root->query_pathkeys,pk_guide,
+ true);

Add a space after each parameter in function call.

4.
+PathKeysComparison
+compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)

a. This function return 4 different type of values (PATHKEYS_EQUAL,
PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
however the caller just checks for PATHKEYS_EQUAL, isn't it better to
just
return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.
b. Another idea could be that instead of writting separate function,
pass an additional parameter to compare_pathkeys() to distinguish
for ignore strategy case.
c. In any case, I think it is good to add comments on top of newly
introduced function (compare_pathkeys_ignoring_strategy) or extend
the comments of compare_pathkeys() if you want to add a new parameter
to it.

> Finally, the new patch has grown up to twice in size.. Although
> it seems a bit large, I think that side-effects are clearly
> avoided.

I am bit worried about the extra cycles added by this patch as compare
to your previous patch; example
During trimming of paths, it will build index paths (build_index_pathkeys())
which it will anyway do again during creation of index paths:
create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()

I am not sure if this has direct impact, however I am suspecting trimming
logic can add quite a few instructions even for cases when it is not
beneficial.
At this moment, I also don't have better idea, so lets proceed with review
of this
patch unless there is any other better way to achieve it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Moench-Tegeder 2014-07-14 05:33:35 Re: 9.4 documentation: duplicate paragraph in logical decoding example
Previous Message Stephen Frost 2014-07-14 04:29:09 Re: tweaking NTUP_PER_BUCKET