Re: Get more from indices.

Lists: pgsql-hackers
From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Get more from indices.
Date: 2013-10-31 10:43:10
Message-ID: 20131031.194310.212107585.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, This is the last episode of the 'dance with indices'?
series.

Unique indexes can sort the tuples in corresponding tables
prefectly. So this query might can use index.

> uniquetest=# create table t (a int, b int, c int, d text);
> uniquetest=# create unique index i_t_pkey on t(a, b);
> uniquetest=# insert into t
> (select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);
> uniquetest=# analyze;
>
> uniquetest=# explain (costs off, analyze on) select distinct * from t;
> QUERY PLAN
> ---------------------------------------------------------------------------
> Unique (actual time=149.625..245.978 rows=100001 loops=1)
> -> Sort (actual time=149.624..199.806 rows=100001 loops=1)
> Sort Key: a, b, c, d
> Sort Method: external merge Disk: 2328kB
> -> Seq Scan on t (actual time=0.016..15.597 rows=100001 loops=1)
> Total runtime: 251.272 ms

With this patch, the plan for this query becomes as follows,

> uniquetest=# explain (costs off, analyze on) select distinct * from t;
> QUERY PLAN
> ---------------------------------------------------------------------
> Index Scan using i_t_pkey on t
> (actual time=0.019..32.457 rows=100001 loops=1)
> Total runtime: 37.488 ms

This only seems not so valuable to archive, but with my previous
MergeAppend for UNION patch, this will have more value.

> uniquetest=# create table pu1 (a int, b int, c int, d text);
> uniquetest=# create unique index pu1_pkey_idx on pu1 (a, b);
> uniquetest=# create table cu11 (like pu1 including all) inherits (pu1);
> uniquetest=# create table cu12 (like pu1 including all) inherits (pu1);
> uniquetest=# create table pu2 (like pu1 including all);
> uniquetest=# create table cu21 (like pu1 including all) inherits (pu2);
> uniquetest=# create table cu22 (like pu1 including all) inherits (pu2);
> uniquetest=# insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11'
> from generate_series(000000, 099999) a);
> uniquetest=# insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12'
> from generate_series(100000, 199999) a);
> uniquetest=# insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21'
> from generate_series(150000, 249999) a);
> uniquetest=# insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22'
> from generate_series(250000, 349999) a);
> uniquetest=# explain (analyze on, costs off)
> select * from pu1 union select * from pu2 order by a, b limit 100;
> QUERY PLAN
> --------------------------------------------------------------------------
> Limit (actual time=0.226..0.591 rows=100 loops=1)
> -> Unique (actual time=0.223..0.561 rows=100 loops=1)
> -> Merge Append (actual time=0.223..0.470 rows=100 loops=1)
> Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
> -> Limit (actual time=0.125..0.326 rows=100 loops=1)
> -> Unique (actual time=0.125..0.303 rows=100 loops=1)
> -> Merge Append (actual time=0.123..0.220 rows=100 loops=1)
> Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
> -> Index Scan using..on pu1 (actual time=0.005..0.005 rows=0 loops=1)
> -> Index Scan using..on cu11(actual time=0.071..0.128 rows=100 loops=1)
> -> Index Scan using..on cu12 (actual time=0.045..0.045 rows=1 loops=1)
> -> Limit (actual time=0.096..0.096 rows=1 loops=1)
> -> Unique (actual time=0.096..0.096 rows=1 loops=1)
> -> Merge Append (actual time=0.094..0.094 rows=1 loops=1)
> Sort Key: pu2.a, pu2.b, pu2.c, pu2.d
> -> Index Scan using..on pu2 (actual time=0.002..0.002 rows=0 loops=1)
> -> Index Scan using..on cu21 (actual time=0.047..0.047 rows=1 loops=1)
> -> Index Scan using..on cu22 (actual time=0.043..0.043 rows=1 loops=1)
> Total runtime: 1.987 ms

On the other hand, what we will get from the unpatched PostgreSQL is,

> uniquetest=# explain (analyze on, costs off)
> select * from pu1 union select * from pu2 order by a, b limit 100;
> QUERY PLAN
> -------------------------------------------------------------------------
> Limit (actual time=535.620..535.706 rows=100 loops=1)
> -> Unique (actual time=535.618..535.695 rows=100 loops=1)
> -> Sort (actual time=535.617..535.637 rows=100 loops=1)
> Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
> Sort Method: external sort Disk: 10568kB
> -> Append (actual time=0.012..144.144 rows=400000 loops=1)
> -> Append (actual time=0.012..49.570 rows=200000 loops=1)
> -> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
> -> Seq Scan on cu11 (actual time=0.011..16.202 rows=100000 loops=1)
> -> Seq Scan on cu12 (actual time=0.008..16.334 rows=100000 loops=1)
> -> Append (actual time=0.010..54.052 rows=200000 loops=1)
> -> Seq Scan on pu2 (actual time=0.000..0.000 rows=0 loops=1)
> -> Seq Scan on cu21 (actual time=0.009..16.444 rows=100000 loops=1)
> -> Seq Scan on cu22 (actual time=0.021..18.923 rows=100000 loops=1)
> Total runtime: 537.765 ms

What do you think about this?

This could be realized by following modifications,

- Consider a query pathekeys is useful for ordering when a index
pathkey is a subset of that when the index is UNIQUE.
(build_index_paths, pathkeys_useful_for_ordering,
truncate_useless_pathkeys, grouping_planner)

- Judge a path is orderd or not counting the uniqueness of the
path.

- Regard IndexPath on UNIQUE index is reagarded uniquely
ordered.

- Regard MergeAppendPath of which all children is uniquely
orderd as (not necessarily uniquely) ordered.
(path_is_ordered)

- Judge the necessity of sorting on above
premises. (grouping_planner)

Needless to say, theses patches have no effect on any other set
operations than UNION(ALL). They are, INTERSECT(ALL),
EXCEPT(ALL).

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueidx_v1_20131031.patch text/x-patch 12.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2013-10-31 14:59:45
Message-ID: 23472.1383231585@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Unique indexes can sort the tuples in corresponding tables
> prefectly. So this query might can use index.

>> uniquetest=# create table t (a int, b int, c int, d text);
>> uniquetest=# create unique index i_t_pkey on t(a, b);
>> uniquetest=# insert into t
>> (select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);
>> uniquetest=# analyze;
>>
>> uniquetest=# explain (costs off, analyze on) select distinct * from t;

ISTM the right way to deal with this is not what you've done here, but
just to deem the DISTINCT a no-op if there's a unique index on some subset
of the distinct-ed columns. This query is actually legally satisfied by
a simple seqscan, which would be faster than either of the plans you
mention. In any case, it seems like a bad idea to me to conflate
distinct-ness with ordering, so I don't like what you did to PathKeys.

Having said that, there is the kernel of a useful idea here, I think.
The reason you don't get an indexscan already is that the DISTINCT
assumes it needs to sort by (a,b,c,d), which an index on just (a,b)
doesn't appear to satisfy. However, if the index is unique, wouldn't
scanning the index produce data that actually satisfies the longer sort
key? It doesn't matter what the values of c,d are if there are no
duplicates in the a,b columns. So maybe as a separate patch, we could
look at claiming that a unique index satisfies the entire query_pathkeys
if it matches the first N columns of that.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Get more from indices.
Date: 2013-10-31 15:05:36
Message-ID: CA+TgmoYT8uY9YFk1bNVJuSjxH8N+M+unfKVSS4MfzD3w=LbEkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 31, 2013 at 10:59 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> However, if the index is unique, wouldn't
> scanning the index produce data that actually satisfies the longer sort
> key? It doesn't matter what the values of c,d are if there are no
> duplicates in the a,b columns. So maybe as a separate patch, we could
> look at claiming that a unique index satisfies the entire query_pathkeys
> if it matches the first N columns of that.

That would be really spiffy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2013-10-31 17:22:43
Message-ID: 3790.1383240163@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
>> Unique indexes can sort the tuples in corresponding tables
>> prefectly. So this query might can use index.

BTW, how much of any of this is correct if the "unique" index contains
multiple NULL values? We might need to restrict the optimization(s)
to cases where the unique-ified columns are all marked NOT NULL.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2013-11-07 22:10:49
Message-ID: 527C0FE9.7000600@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/31/13, 6:43 AM, Kyotaro HORIGUCHI wrote:
> Hello, This is the last episode of the 'dance with indices'?
> series.

Your patch fails the isolation test because of changed query plans:

http://pgci.eisentraut.org/jenkins/job/postgresql_commitfest_world/175/artifact/src/test/isolation/regression.diffs/*view*/


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2013-11-11 09:29:47
Message-ID: 20131111.182947.147160799.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you,

> In any case, it seems like a bad idea to me to conflate
> distinct-ness with ordering, so I don't like what you did to
> PathKeys.

Hmm, that sounds quite resonable in general. But the conflation
is already found in grouping_planner to some extent. The name
distinct_pathkey itself asserts that it is the ordering used for
distinct. And actually is used for sorting if hashed-distinct is
not selected.

Plus, these modifications in grouping_planner is required by the
patch for pathkeys.c to be effective.

I suppose the main cause of nastiness of the patch for
grouping_planner comes from the adheration of the usage of the
variable for path uniqueness with the existent code.

The additional members to Plan, say, pathkeys and ordered could
help the code to look less ugly by taking in the related code
currently appears nakedly in grouping_planner into make(or
generate)_xxxxplan() functions. Although the new attributes
somewhat look out of place..

> > However, if the index is unique, wouldn't
> > scanning the index produce data that actually satisfies the longer sort
> > key? It doesn't matter what the values of c,d are if there are no
> > duplicates in the a,b columns. So maybe as a separate patch, we could
> > look at claiming that a unique index satisfies the entire query_pathkeys
> > if it matches the first N columns of that.
>
> That would be really spiffy.

# Putting aside the trueness of the assumption for unique-index
# and pathkeys.

The "expanded" sufficiency check can be archieved by involving
'unique-indexed' attribute for pathkeys_contained_in(),say
pathkeys_satisfies(pathkeys, pathkeys, is_uniq), but finally
could have no effect without some extent of assist in the process
in grouping_planner like my preveous patch to be in effect, I
believe.

I'll try to rewrite the path to be as following considering less
conflating lookings in grouping_planner.

- is_unique and pathkeys is added to the type Path. (umm...)

- create function like pathkeys_satisfies(check_pathkeys,
pathkeys, isuniq) or path_ordered_by(pathkeys, path) as
needed.

- Plan will be set ordered and pathkeys derived from source path
and its process and grouping_planner consults it to deceide
whether to do sort (to hide the currently naked code).

- Add check for NULLuble columns :-)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: peter_e(at)gmx(dot)net
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2013-11-11 09:30:59
Message-ID: 20131111.183059.07559874.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

> Your patch fails the isolation test because of changed query plans:
>
> http://pgci.eisentraut.org/jenkins/job/postgresql_commitfest_world/175/artifact/src/test/isolation/regression.diffs/*view*/

Thank you for pointing out. I wasn't aware of that..

# Because it is not launched from the top-level make check...

I'll count that in next pach.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2013-11-12 08:48:41
Message-ID: 20131112.174841.74104536.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, this is the revised patch.

> Hmm, that sounds quite resonable in general. But the conflation
> is already found in grouping_planner to some extent.

Although this new patch is not split into unique-index sutff and
others, it removes current_pathkeys from grouping_planner, and
adds pathkeys and is_unique into struct Plan instead. This
eliminates independent and longstanding current_pathkeys variable
and calculus involving it from grouping_planner() so it would be
made clearer and easier to read, I expect.

> - is_unique and pathkeys is added to the type Path. (umm...)
>
> - create function like pathkeys_satisfies(check_pathkeys,
> pathkeys, isuniq) or path_ordered_by(pathkeys, path) as
> needed.

This was different from my thought. Finally I added
plan_is_ordered(plan, path) and some of make_<nodename>()
functions take pathkeys and/or uniquely_ordered as parameter and
some of others take them from given leftree plan. Plan nodes
propagate the attributes in autonomous way so explicit
current_pathkeys handling could be thrown away.

> - Plan will be set ordered and pathkeys derived from source path
> and its process and grouping_planner consults it to deceide
> whether to do sort (to hide the currently naked code).
>
> - Add check for NULLuble columns :-)

Now IndexOptInfo has new attribute full_ordered and it is set
true if the index does not cover any nullAble columns in
get_relation_info().

And expect file of isolation test is modified to fit the change
in result plan.

Finally, this version became to conflict with the another UNION
patch so I detached from it and rebased to current master.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
20131112_unique_orderby.patch text/x-patch 27.3 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: robertmhaas(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2013-11-14 04:17:38
Message-ID: 1384402658.26405.14.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2013-11-12 at 17:48 +0900, Kyotaro HORIGUCHI wrote:
> Hello, this is the revised patch.

Since you're using git, please check your patch for trailing whitespace
with git diff --check.


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2013-11-15 06:57:16
Message-ID: 20131115.155716.184031283.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, I might made insufficient explanation. Surely it is
overdone for the example.

>> uniquetest=# create table t (a int, b int, c int, d text);
>> uniquetest=# create unique index i_t_pkey on t(a, b);
>> uniquetest=# insert into t
>> (select a % 10, a / 10, a, 't' from generate_series(0, 100000) a);
>> uniquetest=# analyze;
>>
>> uniquetest=# explain (costs off, analyze on) select distinct * from t;

This is the simplest example for this patch.

The main intention of this patch is to widen application of my
another 'Index for UNION' patch.

https://commitfest.postgresql.org/action/patch_view?id=1279

> ISTM the right way to deal with this is not what you've done
> here, but just to deem the DISTINCT a no-op if there's a unique
> index on some subset of the distinct-ed columns.

It is not sufficient only to deem DISTINCT no-op in order to get
more utility of MergeAppends on IndexScans for UNION. The
following example (omitting data insertion..)

| create table cu11 (a int not null, b int not null, c int, d text);
| create unique index i_cu11_pk on cu11 (a, b);
| create table cu12 (like cu11 including all);
| explain select a, b from cu11 union select a, b from cu12
| order by a, b limit 100;

yields following plan without this patch.

| Limit
| -> Sort (Sort Key: cu11.a, cu11.b)
| -> HashAggregate
| -> Append
| -> Limit
| -> Unique
| -> Index Only Scan using cu11_a_b_idx on cu11
| -> Limit
| -> Unique
| -> Index Only Scan using cu12_a_b_idx on cu12

But, this could be far more effecient when the plan were as
follows if the rows are fully-ordered on the sort key.

| Limit
| -> Unique
| -> Merge Append (Sort Key: cu11.a, cu11.b)
| -> Index Only Scan using cu11_a_b_idx on cu11
| -> Index Only Scan using cu12_a_b_idx on cu12

Propagation of uniqueness on MergeAppend is required to do this
but not for the first expample on this thread.

> This query is actually legally satisfied by a simple seqscan,
> which would be faster than either of the plans you mention. In
> any case, it seems like a bad idea to me to conflate
> distinct-ness with ordering, so I don't like what you did to
> PathKeys.

Umm. Reconsidering on that and the discussion of the purose above
and the last patch of this thread, I've been wrong in where to
split the patches. I'll repost the reorganized patches on both
threads.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2013-11-19 11:35:16
Message-ID: 20131119.203516.251520490.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, I've totally refactored the series of patches and cut out
the appropriate portion as 'unique (and non-nullable) index
stuff'.

As the discussion before, it got rid of path distinctness. This
patch works only on index 'full-orederedness', i.e., unique index
on non-nullable columns.

This patch itself does not so much. Will have power applied with
'Using indices for UNION' patch.

=== Making test table

create table t (a int not null, b int not null, c int, d text);
create unique index i_t_ab on t (a, b);
insert into t (select a / 5, 4 - (a % 5), a, 't' from generate_series(000000, 099999) a);

=== Example 1.

not-patched=# explain select * from t order by a, b ,c limit 1;
> QUERY PLAN
> ----------------------------------------------------------------------
> Limit (cost=2041.00..2041.00 rows=1 width=14)
> -> Sort (cost=2041.00..2291.00 rows=100000 width=14)
> Sort Key: a, b, c
> -> Seq Scan on t (cost=0.00..1541.00 rows=100000 width=14)

patched=# explain select * from t order by a, b ,c limit 1;
> QUERY PLAN
> -----------------------------------------------------------------------------
> Limit (cost=0.29..0.33 rows=1 width=14)
> -> Index Scan using i_t_ab on t (cost=0.29..3857.04 rows=100000 width=14)

=== Example 2.

not-patched=# explain select distinct * from t order by a limit 1;
> QUERY PLAN
> ---------------------------------------------------------------------------
> Limit (cost=1820.46..1820.47 rows=1 width=44)
> -> Sort (cost=1820.46..1835.34 rows=5951 width=44)
> Sort Key: a
> -> HashAggregate (cost=1731.20..1790.71 rows=5951 width=44)
> -> Seq Scan on t (cost=0.00..1136.10 rows=59510 width=44)

patched=# explain select distinct * from t order by a limit 1;
> QUERY PLAN
> ------------------------------------------------------------------------------------
> Limit (cost=0.29..1.09 rows=1 width=44)
> -> Unique (cost=0.29..4756.04 rows=5951 width=44)
> -> Index Scan using i_t_ab on t (cost=0.29..4160.94 rows=59510 width=44)

The unique node could be removed technically but it requires to
care the distinctness of path/plan. So it has been put out to
"Using indeces for UNION" patch.

Any comments?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueindx_v3_20131119.patch text/x-patch 8.6 KB

From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Cc: <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-11-20 11:33:58
Message-ID: 001e01cee5e4$66f3f010$34dbd030$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI wrote:
> Hello, I've totally refactored the series of patches and cut out the
appropriate portion as 'unique (and non-nullable) index stuff'.

> As the discussion before, it got rid of path distinctness. This patch
works only on index 'full-orederedness', i.e., unique index on non-nullable
columns.

This is interesting!

I took a quick look at the patch. Here is my initial comment. I don't
think it's so good to set the pathkeys of a unique-index path to
query_pathkeys after the scan/join optimization because it can't exploit the
full-orderedness property in implementing mergejoins, i.e., it can't skip
doing an explicit sort that is considered unnecessary from the property.
So, I think the path's pathkeys should be set to query_pathkeys before the
scan/join optimization, i.e., at the index-path creation time.

If you wouldn't mind, I'd like to rework on the patch.

Thanks,

Best regards,
Etsuro Fujita


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2013-11-22 07:59:10
Message-ID: 20131122.165910.65987817.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello. I found a bug(?) in thsi patch as I considered on another
patch.

In truncate_useless_pathkeys, if query_pathkeys is longer than
pathkeys made from index columns old patch picked up the latter
as IndexPath's pathkeys. But the former is more suitable
according to the context here.

the attched pathkey_and_uniqueindx_v4_20131122.patch is changed
as follows.

- Rebased to current master.

- truncate_useless_pathkeys returns root->query_pathkeys when
the index is fully-ordered and query_pathkeys contains the
pathkeys made from index columns.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueindx_v4_20131122.patch text/x-patch 8.5 KB

From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Cc: <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-11-22 09:42:59
Message-ID: 006801cee767$3ad86130$b0892390$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI wrote:
> Hello. I found a bug(?) in thsi patch as I considered on another patch.

> In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
made
> from index columns old patch picked up the latter as IndexPath's pathkeys.

> But the former is more suitable according to the context here.

> the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
follows.

> - Rebased to current master.

> - truncate_useless_pathkeys returns root->query_pathkeys when
> the index is fully-ordered and query_pathkeys contains the
> pathkeys made from index columns.

OK, I'd like to look at this version of the patch more closely, then.

Thanks,

Best regards,
Etsuro Fujita


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Cc: <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-11-26 03:30:03
Message-ID: 004a01ceea57$cbab0440$63010cc0$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI wrote:
> the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
> follows.

The patch is compiled successfully and passes all regression tests. Also,
the patch works well for the tests shown in an earlier email from
Horiguchi-san. But in this version I found an incorrect behavior.

postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
CREATE INDEX
postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
generate_series(000000, 099999) a);
INSERT 0 100000
postgres=# ANALYZE t;
ANALYZE
postgres=# CREATE TABLE t2 (e text, f int);
CREATE TABLE
postgres=# INSERT INTO t2 VALUES ('t', 2);
INSERT 0 1
postgres=# INSERT INTO t2 VALUES ('t', 1);
INSERT 0 1
postgres=# ANALYZE t2;
ANALYZE
postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER
BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------
----
Limit (cost=0.29..9.30 rows=10 width=20)
-> Nested Loop (cost=0.29..129.99 rows=144 width=20)
Join Filter: (t.d = t2.e)
-> Index Scan using i_t_ab on t (cost=0.29..126.80 rows=72
width=14)
Index Cond: (a < 10)
-> Materialize (cost=0.00..1.03 rows=2 width=6)
-> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=6)
(7 rows)

postgres=# SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER BY t.a,
t.b, t.c, t.d, t2.f LIMIT 10;
a | b | c | d | e | f
---+---+---+---+---+---
0 | 0 | 4 | t | t | 2
0 | 0 | 4 | t | t | 1
0 | 1 | 3 | t | t | 2
0 | 1 | 3 | t | t | 1
0 | 2 | 2 | t | t | 2
0 | 2 | 2 | t | t | 1
0 | 3 | 1 | t | t | 2
0 | 3 | 1 | t | t | 1
0 | 4 | 0 | t | t | 2
0 | 4 | 0 | t | t | 1
(10 rows)

(Note the column f is sorted in the descending order.)

ISTM this was brought by the following change.

> In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
> made from index columns old patch picked up the latter as IndexPath's
> pathkeys. But the former is more suitable according to the context here.

> - truncate_useless_pathkeys returns root->query_pathkeys when
> the index is fully-ordered and query_pathkeys contains the
> pathkeys made from index columns.

I think it would be required to modify the patch so that the transformation
of the pathkeys is performed only when each pathkey of query_pathkeys
references the indexed relation. (The above change might have been made
according to my comment in an earlier email I sent. But I have to admit my
explanation there was not adequate. I'm sorry for that.)

Here are random comments.

* In grouping_planner(), the patch resets the pathkeys of the cheapest
presorted index-path to query_pathkeys through
get_cheapest_fractional_path_for_pathkeys(). Is this necessary? ISTM the
transformation of the pathkeys after the scan/join optimization would be no
longer necessary once it has been done at the index-path creation time, ie,
build_index_paths(). Why does the patch do that thing?

* In get_relation_info(), the patch determines the branch condition
"keyattno != ObjectIdAttributeNumber". I fail to understand the meaning of
this branch condition. Could you explain about it?

Thanks,

Best regards,
Etsuro Fujita


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Cc: pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2013-11-26 11:42:42
Message-ID: 20131126.204242.141708563.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for pointing out.

> > the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
> > follows.
>
> The patch is compiled successfully and passes all regression tests. Also,
> the patch works well for the tests shown in an earlier email from
> Horiguchi-san. But in this version I found an incorrect behavior.
>
> postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
> CREATE TABLE
> postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
> CREATE INDEX
> postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
> generate_series(000000, 099999) a);
> INSERT 0 100000
> postgres=# ANALYZE t;
> ANALYZE
> postgres=# CREATE TABLE t2 (e text, f int);
> CREATE TABLE
> postgres=# INSERT INTO t2 VALUES ('t', 2);
> INSERT 0 1
> postgres=# INSERT INTO t2 VALUES ('t', 1);
> INSERT 0 1
> postgres=# ANALYZE t2;
> ANALYZE
> postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER
> BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
> QUERY PLAN
> ----------------------------------------------------------------------------
> ----
> Limit (cost=0.29..9.30 rows=10 width=20)
- -> Sort (cost=92.33..92.57 rows=94 width=20)
- Sort Key: t.a, t.b, t.c, t.d, t2.f
> -> Nested Loop (cost=0.29..129.99 rows=144 width=20)
> Join Filter: (t.d = t2.e)
> -> Index Scan using i_t_ab on t (cost=0.29..126.80 rows=72
> width=14)
> Index Cond: (a < 10)
> -> Materialize (cost=0.00..1.03 rows=2 width=6)
> -> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=6)
> (7 rows)

Auch. Outermost sort is omitted but necessary (Prefixed by '-' above).

> ISTM this was brought by the following change.
>
> > In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
> > made from index columns old patch picked up the latter as IndexPath's
> > pathkeys. But the former is more suitable according to the context here.
>
> > - truncate_useless_pathkeys returns root->query_pathkeys when
> > the index is fully-ordered and query_pathkeys contains the
> > pathkeys made from index columns.

I implicitly put a wrong assumption that query_pathkeys is
dedicated for the relation under work, not whole subquery.

> I think it would be required to modify the patch so that the transformation
> of the pathkeys is performed only when each pathkey of query_pathkeys
> references the indexed relation.

What is wrong here is that IndexPath declares that it is ordered
in the pathkeys which includes the columns not belongs to the
table.

> (The above change might have been made according to my comment
> in an earlier email I sent. But I have to admit my explanation
> there was not adequate. I'm sorry for that.)

Nothing to be sorry for. It's my fault.

Anyway, I've put restriction on pathkeys_useful_for_ordering that
pick up longest pathkeys consists only ec members which matches
the required rel bitmap. With the new patch the planner makes
plan with the omitted sort and the query returns the following
result.

uniontest=# SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e
| ORDER BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
| a | b | c | d | e | f
| ---+---+---+---+---+---
| 0 | 0 | 4 | t | t | 1 <-+-- Correct order by t2.f.
| 0 | 0 | 4 | t | t | 2 <-+
| 0 | 1 | 3 | t | t | 1
(snipped.)

> Here are random comments.
>
> * In grouping_planner(), the patch resets the pathkeys of the cheapest
> presorted index-path to query_pathkeys through
> get_cheapest_fractional_path_for_pathkeys(). Is this necessary? ISTM the
> transformation of the pathkeys after the scan/join optimization would be no
> longer necessary once it has been done at the index-path creation time, ie,
> build_index_paths(). Why does the patch do that thing?

You're appliciated for pointing out. It is utterly useless code
of older patch. ("Have you totally scrapped the older verions?" >me :-(

Removed it in this version.

> * In get_relation_info(), the patch determines the branch condition
> "keyattno != ObjectIdAttributeNumber". I fail to understand the meaning of
> this branch condition. Could you explain about it?

Literally answering, it means oid cannot be NULL (if it exists).

Precisely, it reflects that I believed (or wished) it cannot be
NULL if oid column exists. (Other sysattrs are dismissed because
they are hardly to be a unique index column:-)

Oid in a tuple is retrived using HeapTupleGetOid() in
heap_getsysattr().

| result = ObjectIdGetDatum(HeapTupleGetOid(tup));

Then HeapTupleHeaderGetOid,

| #define HeapTupleGetOid(tuple) \
| HeapTupleHeaderGetOid((tuple)->t_data)

Finally asking HeapTupleHeaderGetOid for the value.

htup_details.h:354
| #define HeapTupleHeaderGetOid(tup) \
| ( \
| ((tup)->t_infomask & HEAP_HASOID) ? \
| *((Oid *) ((char *)(tup) + (tup)->t_hoff - sizeof(Oid))) \
| : \
| InvalidOid \
| )

So oid cannot be NULL after all even though can be InvalidOid.

On the otherhand, it can be NULL according to the definition in
heap.c.

heap.c:146
| static FormData_pg_attribute a2 = {
| 0, {"oid"}, OIDOID, 0, sizeof(Oid),
| ObjectIdAttributeNumber, 0, -1, -1,
| true, 'p', 'i', true, false, false, true, 0
| }; ~~~~

Can we set this to false safely?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueindx_v5_20131126.patch text/x-patch 9.0 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2013-11-27 02:35:21
Message-ID: 1385519721.28256.0.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2013-11-22 at 16:59 +0900, Kyotaro HORIGUCHI wrote:
> Hello. I found a bug(?) in thsi patch as I considered on another
> patch.

src/backend/optimizer/util/plancat.c:256: trailing whitespace.
src/backend/optimizer/util/plancat.c:261: space before tab in indent.


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-11-27 10:27:58
Message-ID: 002201ceeb5b$57924c80$06b6e580$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI wrote:
> > * In get_relation_info(), the patch determines the branch condition
> > "keyattno != ObjectIdAttributeNumber". I fail to understand the
> > meaning of this branch condition. Could you explain about it?

> Literally answering, it means oid cannot be NULL (if it exists).

Understood. Thank you for the detailed information. But I'm not sure it's
worth complicating the code. What use cases are you thinking?

Thanks,

Best regards,
Etsuro Fujita


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-11-28 03:50:07
Message-ID: 005301ceebec$edcdf4b0$c969de10$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Kyotaro HORIGUCHI wrote:
> > > * In get_relation_info(), the patch determines the branch condition
> > > "keyattno != ObjectIdAttributeNumber". I fail to understand the
> > > meaning of this branch condition. Could you explain about it?

> > Literally answering, it means oid cannot be NULL (if it exists).

> Understood. Thank you for the detailed information. But I'm not sure
it's
> worth complicating the code. What use cases are you thinking?

Having said that, I've reconsidered about this, and start to think it would
be better that all system attributes are processed in the same way as are
user attributes because that makes the code more simple. Yes, as you
mention, it's not necessarily guaranteed that system attributes have the
uniqueness property in general, but that's another problem.

Thanks,

Best regards,
Etsuro Fujita


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-11-28 12:10:29
Message-ID: 006b01ceec32$d462fca0$7d28f5e0$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I wrote:
> > Kyotaro HORIGUCHI wrote:
> > > > * In get_relation_info(), the patch determines the branch
> > > > condition "keyattno != ObjectIdAttributeNumber". I fail to
> > > > understand the meaning of this branch condition. Could you explain
> about it?

> > > Literally answering, it means oid cannot be NULL (if it exists).

> > Understood. Thank you for the detailed information. But I'm not sure
> > it's worth complicating the code. What use cases are you thinking?

> Having said that, I've reconsidered about this, and start to think it
would
> be better that all system attributes are processed in the same way as are
> user attributes because that makes the code more simple. Yes, as you
> mention, it's not necessarily guaranteed that system attributes have the
> uniqueness property in general, but that's another problem.

I've modified the patch to work in such a way. Also, as ISTM the patch is
more complicated than what the patch really does, I've simplified the patch.
Attached is an updated version of the patch. Could you review the patch?

Thanks,

Best regards,
Etsuro Fujita

Attachment Content-Type Size
pathkey_and_uniqueindx_v6_20131128.patch application/octet-stream 5.1 KB

From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-12-03 09:38:28
Message-ID: 00ad01cef00b$6bae88f0$430b9ad0$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I've modified the patch to work in such a way. Also, as ISTM the patch
> is more complicated than what the patch really does, I've simplified the
> patch.

I've revised the patch a bit. Please find attached the patch.

Thanks,

Best regards,
Etsuro Fujita

Attachment Content-Type Size
pathkey_and_uniqueindx_v7_20131203.patch application/octet-stream 5.3 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Cc: pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2013-12-05 08:12:18
Message-ID: 20131205.171218.164929626.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

> > I've modified the patch to work in such a way. Also, as ISTM the patch
> > is more complicated than what the patch really does, I've simplified the
> > patch.
>
> I've revised the patch a bit. Please find attached the patch.

Thank you, but it seems to me too simplified. You made two major
functional changes.

One is, you put the added code for getrelation_info() out of the
block for the condition (info->relam == BTREE_AM_OID) (though
amcanorder would be preferable..) Anyway the reason for the place
is to guarantee 'full_ordered' index always to be orderable. I
believe the relation between them are not obvious. So your patch
has an oppotunity to make wrong assumption for possible indexes
which is not orderable but unique. Going on your way some
additional works would be needed to judge an index to be
orderable or not on checking the extensibility of pathkeys.

Another is, you changed pathkeys expantion to be all-or-nothing
decision. While this change should simplify the code slightly, it
also dismisses the oppotunity for partially-extended
pathkeys. Could you let me know the reason why you did so.

Any thoughts?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-12-05 11:48:19
Message-ID: 00ad01cef1af$e44793b0$acd6bb10$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI wrote:
> Thank you, but it seems to me too simplified. You made two major
functional
> changes.

Thank you for the comments!

> One is, you put the added code for getrelation_info() out of the block for
> the condition (info->relam == BTREE_AM_OID) (though amcanorder would be
> preferable..) Anyway the reason for the place is to guarantee
'full_ordered'
> index always to be orderable. I believe the relation between them are not
> obvious. So your patch has an oppotunity to make wrong assumption for
> possible indexes which is not orderable but unique. Going on your way some
> additional works would be needed to judge an index to be orderable or not
> on checking the extensibility of pathkeys.

By checking the following equation in build_index_paths(), the updated
version of the patch guarantees that the result of an index scan is ordered:

index_is_ordered = (index->sortopfamily != NULL);

> Another is, you changed pathkeys expantion to be all-or-nothing decision.
> While this change should simplify the code slightly, it also dismisses the
> oppotunity for partially-extended pathkeys. Could you let me know the
reason
> why you did so.

At first I thought the partially-extended pathkey list that is made from
query_pathkeys, as you proposed in the original versions of the patch. But
I've started to doubt whether it's worth doing that because I think the
partially-extended pathkey list is merely one example while the original
pathkey list can be partially-extended in different ways, ie, ISTM the
partially-extended pathkey list doesn't necessarily have the optimality in
anything significant. We might be able to partially-extend the original
pathkey list optimally in something significant, but that seems useless
complexity to me. So, I modified the patch to do the all-or-nothing
decision.

Thanks,

Best regards,
Etsuro Fujita


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-12-05 12:05:10
Message-ID: 00b401cef1b2$3f328030$bd978090$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Kyotaro HORIGUCHI wrote:
> > Another is, you changed pathkeys expantion to be all-or-nothing
decision.
> > While this change should simplify the code slightly, it also dismisses
> > the oppotunity for partially-extended pathkeys. Could you let me know
> > the
> reason
> > why you did so.

> At first I thought the partially-extended pathkey list that is made from
> query_pathkeys, as you proposed in the original versions of the patch.
But
> I've started to doubt whether it's worth doing that because I think the
> partially-extended pathkey list is merely one example while the original
> pathkey list can be partially-extended in different ways, ie, ISTM the
> partially-extended pathkey list doesn't necessarily have the optimality
> in anything significant. We might be able to partially-extend the
original
> pathkey list optimally in something significant, but that seems useless
> complexity to me. So, I modified the patch to do the all-or-nothing
> decision.

Here I mean the optimality for use in merge joins.

Thanks,

Best regards,
Etsuro Fujita


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Cc: pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2013-12-10 05:25:09
Message-ID: 20131210.142509.197538792.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you,

> > One is, you put the added code for getrelation_info() out of the block for
> > the condition (info->relam == BTREE_AM_OID) (though amcanorder would be
..
> By checking the following equation in build_index_paths(), the updated
> version of the patch guarantees that the result of an index scan is ordered:
>
> index_is_ordered = (index->sortopfamily != NULL);

Oops.. I forgot about it although many times I've seen...
You're right.

> > > Another is, you changed pathkeys expantion to be all-or-nothing decision.
> > > While this change should simplify the code slightly, it also dismisses
> > > the oppotunity for partially-extended pathkeys. Could you let me know
> > > the reason why you did so.
...
> > We might be able to partially-extend the original
> > pathkey list optimally in something significant, but that seems useless
> > complexity to me. So, I modified the patch to do the all-or-nothing
> > decision.
>
> Here I mean the optimality for use in merge joins.

Ok, I'll follow your advice. I agree on the point about merit vs
complexity.

I'm convinced of the validity of your patch. I'm satisfied with
it. Thank you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-12-10 07:05:56
Message-ID: 005401cef576$45bd4120$d137c360$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI wrote:
> I'm convinced of the validity of your patch. I'm satisfied with it. Thank
> you.

Thank you for the reply.

Then, if there are no objections of others, I'll mark this as "Ready for
Committer".

Thanks,

Best regards,
Etsuro Fujita


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Etsuro Fujita'" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2013-12-11 01:40:04
Message-ID: 002a01cef611$ea81b8d0$bf852a70$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Kyotaro HORIGUCHI wrote:
> > I'm convinced of the validity of your patch. I'm satisfied with it.

> Then, if there are no objections of others, I'll mark this as "Ready for
> Committer".

Done.

Thanks,

Best regards,
Etsuro Fujita


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2013-12-31 21:01:34
Message-ID: 32033.1388523694@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> [ pathkey_and_uniqueindx_v7_20131203.patch ]

I started to look at this patch. I don't understand the reason for the
foreach loop in index_pathkeys_are_extensible (and the complete lack of
comments in the patch isn't helping). Isn't it sufficient to check that
the index is unique/immediate/allnotnull and its ordering is a prefix
of query_pathkeys? If not, what's the rationale for the specific tests
being made on the pathkeys --- this code doesn't make much sense to me.

regards, tom lane


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2014-01-07 02:17:26
Message-ID: 01bd01cf0b4e$9b960ad0$d2c22070$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > [ pathkey_and_uniqueindx_v7_20131203.patch ]

> I started to look at this patch. I don't understand the reason for the
> foreach loop in index_pathkeys_are_extensible (and the complete lack of
> comments in the patch isn't helping). Isn't it sufficient to check that
> the index is unique/immediate/allnotnull and its ordering is a prefix of
> query_pathkeys? If not, what's the rationale for the specific tests being
> made on the pathkeys --- this code doesn't make much sense to me.

Thank you for taking time to look at this patch. I think it's not
sufficient to check those things. Let me explain the reason why this patch
has that code. The patch has that code in order to prevent
build_join_pathkeys() from building incorrect join pathkeys', where the
pathkeys for a join relation constructed by mergejoin or nestloop join are
built normally just by using the outer path's pathkeys. Without that code,
the patch would produce an incorrect result for such a case. An example
will be shown below. A simple approach to avoid this problem would be to
apply this idea only when each pathkey in query_pathkeys references the
indexed relation in addition to that the index is
unique/immediate/allnotnull and its ordering is a prefix of query_pathkeys.
That's the reason.

[Data]
CREATE TABLE t (a int not null, b int not null, c int, d text);
CREATE UNIQUE INDEX i_t_ab ON t (a, b);
INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
generate_series(000000, 099999) a);
ANALYZE t;
CREATE TABLE t2 (e text, f int);
INSERT INTO t2 VALUES ('t', 2);
INSERT INTO t2 VALUES ('t', 1);
ANALYZE t2;

[Query]
EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER BY t.a, t.b,
t.c, t.d, t2.f LIMIT 4;
QUERY PLAN
----------------------------------------------------------------------------
----
Limit (cost=0.29..3.96 rows=4 width=20)
-> Nested Loop (cost=0.29..110.17 rows=120 width=20)
Join Filter: (t.d = t2.e)
-> Index Scan using i_t_ab on t (cost=0.29..107.34 rows=60
width=14)
Index Cond: (a < 10)
-> Materialize (cost=0.00..1.03 rows=2 width=6)
-> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=6)
(7 rows)

SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER BY t.a, t.b, t.c,
t.d, t2.f LIMIT 4;
a | b | c | d | e | f
---+---+---+---+---+---
0 | 0 | 4 | t | t | 2
0 | 0 | 4 | t | t | 1
0 | 1 | 3 | t | t | 2
0 | 1 | 3 | t | t | 1
(4 rows)

(Note the column f is sorted in the descending order.)

Sorry for the delay.

Best regards,
Etsuro Fujita


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2014-01-07 03:18:06
Message-ID: 29637.1389064686@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Thank you for taking time to look at this patch. I think it's not
> sufficient to check those things. Let me explain the reason why this patch
> has that code. The patch has that code in order to prevent
> build_join_pathkeys() from building incorrect join pathkeys', where the
> pathkeys for a join relation constructed by mergejoin or nestloop join are
> built normally just by using the outer path's pathkeys. Without that code,
> the patch would produce an incorrect result for such a case.

Ah, thanks for the example. ISTM that really the issue is that if an
originally-unique row is "expanded" into multiple rows, those rows are
sort peers so far as the unique-index column(s) are concerned, and so
now the lower-order ORDER BY columns do matter after all.

The problem is that joining isn't the only way that such expansion can
happen. Set-returning functions in the targetlist are another way,
and I'm not sure that there aren't others. Here's an example that
I'm pretty sure breaks your patch (though I didn't actually reinstall
the patch to try it):

create or replace function rev(n int) returns setof int language plpgsql
as 'begin for i in reverse n..1 loop return next i; end loop; end';

create table tt (f1 int primary key, f2 int);

insert into tt values (1,2), (2,3);

select f1, rev(f2) from tt order by 1,2;

Also, even if the row-expansion mechanism is a join, it's certainly
insufficient to check that the lower-order sort column is an expression
in variables of the index's table. Something like "f2 + random()" is
going to need an explicit sort step anyway.

These particular objections could be worked around by checking for
set-returning functions and volatile functions in the lower-order
ORDER BY expressions. But I have to say that I think I'm losing
faith in the entire idea. I have little confidence that there
aren't other cases that will break it.

regards, tom lane


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2014-01-07 04:45:56
Message-ID: 20140107.134556.94242730.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

> Tom Lane wrote:
> > I started to look at this patch. I don't understand the reason for the
> > foreach loop in index_pathkeys_are_extensible (and the complete lack of
> > comments in the patch isn't helping). Isn't it sufficient to check that
> > the index is unique/immediate/allnotnull and its ordering is a prefix of
> > query_pathkeys? If not, what's the rationale for the specific tests being
> > made on the pathkeys --- this code doesn't make much sense to me.
>
> Thank you for taking time to look at this patch. I think it's not
> sufficient to check those things. Let me explain the reason why this patch
> has that code. The patch has that code in order to prevent
> build_join_pathkeys() from building incorrect join pathkeys', where the
> pathkeys for a join relation constructed by mergejoin or nestloop join are
> built normally just by using the outer path's pathkeys. Without that code,
> the patch would produce an incorrect result for such a case. An example
> will be shown below.

> A simple approach to avoid this problem would be to
> apply this idea only when each pathkey in query_pathkeys references the
> indexed relation in addition to that the index is
> unique/immediate/allnotnull and its ordering is a prefix of query_pathkeys.
> That's the reason.

Utterly disregarding the chances of joins - the patch (v7)
already does so in some extent, ignoring the possibility of
partial extension for multi-table'd pathkeys - it is also
avoidable by simply passing a boolean
'extend_pathkeys_if_possible', or splitting into two functions
regarding the boolean. The check was not a yes-or-no decision but
a how-long-it-can-be-extended measuring in the previous version
(pathkey_and_uniqueindx_v5). It has been simplified and splitted
out as individual function after.

> [Data]
> CREATE TABLE t (a int not null, b int not null, c int, d text);
> CREATE UNIQUE INDEX i_t_ab ON t (a, b);
> INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
> generate_series(000000, 099999) a);
> ANALYZE t;
> CREATE TABLE t2 (e text, f int);
> INSERT INTO t2 VALUES ('t', 2);
> INSERT INTO t2 VALUES ('t', 1);
> ANALYZE t2;
>
> [Query]
> EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER BY t.a, t.b,
> t.c, t.d, t2.f LIMIT 4;
> QUERY PLAN
> ----------------------------------------------------------------------------
> ----
> Limit (cost=0.29..3.96 rows=4 width=20)
> -> Nested Loop (cost=0.29..110.17 rows=120 width=20)
> Join Filter: (t.d = t2.e)
> -> Index Scan using i_t_ab on t (cost=0.29..107.34 rows=60
> width=14)
> Index Cond: (a < 10)
> -> Materialize (cost=0.00..1.03 rows=2 width=6)
> -> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=6)
> (7 rows)
>
> SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER BY t.a, t.b, t.c,
> t.d, t2.f LIMIT 4;
> a | b | c | d | e | f
> ---+---+---+---+---+---
> 0 | 0 | 4 | t | t | 2
> 0 | 0 | 4 | t | t | 1
> 0 | 1 | 3 | t | t | 2
> 0 | 1 | 3 | t | t | 1
> (4 rows)
>
> (Note the column f is sorted in the descending order.)
>
> Sorry for the delay.
>
> Best regards,
> Etsuro Fujita

With best wishes for a happy New Yaar.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2014-01-07 05:59:00
Message-ID: 20140107.145900.196068363.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

tgl> The problem is that joining isn't the only way that such expansion can
tgl> happen. Set-returning functions in the targetlist are another way,
tgl> and I'm not sure that there aren't others. Here's an example that
tgl> I'm pretty sure breaks your patch (though I didn't actually reinstall
tgl> the patch to try it):
tgl>
tgl> create or replace function rev(n int) returns setof int language plpgsql
tgl> as 'begin for i in reverse n..1 loop return next i; end loop; end';
tgl>
tgl> create table tt (f1 int primary key, f2 int);
tgl>
tgl> insert into tt values (1,2), (2,3);
tgl>
tgl> select f1, rev(f2) from tt order by 1,2;
tgl>
tgl> Also, even if the row-expansion mechanism is a join, it's certainly
tgl> insufficient to check that the lower-order sort column is an expression
tgl> in variables of the index's table. Something like "f2 + random()" is
tgl> going to need an explicit sort step anyway.
tgl>
tgl> These particular objections could be worked around by checking for
tgl> set-returning functions and volatile functions in the lower-order
tgl> ORDER BY expressions. But I have to say that I think I'm losing
tgl> faith in the entire idea. I have little confidence that there
tgl> aren't other cases that will break it.

I think that the required condition for all these ordering
problems is generating multiple rows for single input (for a
value of any column of the same table).

If a prefixing set of values correspond to a unique index appears
only once in a result, the result must can be fully-ordered by
the extended pathkeys. This is what this patch stands on. It
never be broken while the precondition is satisfied... I think.

On the other hand, the precondition should be satisfied when all
extended columns are simple Vars of the target table. I believe
Vars cannot produce multiple result. More close checking for
every possibility could be done but it should be a overdone.

The following modification to v7 does this.

=========
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 380f3ba..233e21c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -536,7 +536,8 @@ index_pathkeys_are_extensible(PlannerInfo *root,
{
EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2);

- if (!bms_equal(member->em_relids, index->rel->relids))
+ if (!bms_equal(member->em_relids, index->rel->relids) ||
+ !IsA(member, Var))
continue;
else
{
==========

The result is

postgres=# select f1, rev(f2) from tt order by 1, 2;
f1 | rev
----+-----
1 | 1
1 | 2
2 | 1
2 | 2
2 | 3
==========

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2014-01-07 06:25:56
Message-ID: 20140107.152556.129570457.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I think that the required condition for all these ordering

Careless wording. the 'required' condition above means 'necessary
(but not sufficient)' condition so I think the lack of the
precondition (=multiple rows) results in prevention of the
postcondition = ordering problems.

> problems is generating multiple rows for single input (for a
> value of any column of the same table).
>
> If a prefixing set of values correspond to a unique index appears
> only once in a result, the result must can be fully-ordered by
> the extended pathkeys. This is what this patch stands on. It
> never be broken while the precondition is satisfied... I think.
>
> On the other hand, the precondition should be satisfied when all
> extended columns are simple Vars of the target table. I believe
> Vars cannot produce multiple result. More close checking for
> every possibility could be done but it should be a overdone.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <robertmhaas(at)gmail(dot)com>
Subject: Re: Get more from indices.
Date: 2014-01-07 07:45:59
Message-ID: 01d501cf0b7c$816d06d0$84471470$@etsuro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI wrote:
> tgl> The problem is that joining isn't the only way that such expansion
> tgl> can happen. Set-returning functions in the targetlist are another
> tgl> way, and I'm not sure that there aren't others. Here's an example
> tgl> that I'm pretty sure breaks your patch (though I didn't actually
> tgl> reinstall the patch to try it):

> tgl> Also, even if the row-expansion mechanism is a join, it's certainly
> tgl> insufficient to check that the lower-order sort column is an
> tgl> expression in variables of the index's table. Something like "f2 +
> tgl> random()" is going to need an explicit sort step anyway.

> tgl> These particular objections could be worked around by checking for
> tgl> set-returning functions and volatile functions in the lower-order
> tgl> ORDER BY expressions. But I have to say that I think I'm losing
> tgl> faith in the entire idea. I have little confidence that there
> tgl> aren't other cases that will break it.

> On the other hand, the precondition should be satisfied when all extended
> columns are simple Vars of the target table. I believe Vars cannot produce
> multiple result. More close checking for every possibility could be done
> but it should be a overdone.

> The following modification to v7 does this.

It seems to me that this would be reasonable at least as the first step and
that it would be better to leave more close checking for future work.

Thanks,

Best regards,
Etsuro Fujita


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2014-01-07 21:00:30
Message-ID: 11242.1389128430@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> The following modification to v7 does this.

> =========
> diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
> index 380f3ba..233e21c 100644
> --- a/src/backend/optimizer/path/pathkeys.c
> +++ b/src/backend/optimizer/path/pathkeys.c
> @@ -536,7 +536,8 @@ index_pathkeys_are_extensible(PlannerInfo *root,
> {
> EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2);

> - if (!bms_equal(member->em_relids, index->rel->relids))
> + if (!bms_equal(member->em_relids, index->rel->relids) ||
> + !IsA(member, Var))
> continue;
> else
> {
> ==========

I'm pretty sure that IsA test prevents the optimization from ever firing.

But aside from hasty typos, is there enough left of this optimization to
be worth the complication?

regards, tom lane


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2014-01-08 04:22:57
Message-ID: 20140108.132257.201466486.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

tgl> I'm pretty sure that IsA test prevents the optimization from ever firing.

Thank you.

tgl> But aside from hasty typos,

Oops! I've picked up wrong node. It always denies pathkeys extension.

| !IsA(member, Var))

is a mistake of the following.

| !IsA(member->em_expr, Var))

tgl> is there enough left of this optimization to
tgl> be worth the complication?

Certainly.

This patch is intended to be with my another UNION-ALL
optimization pathce at first. It does very much with
UNION-ORDERBY-LIMIT and also with UNION_ALL(Partitioned
tables)-DISTINCT-ORDERBY-LIMIT.

===== test tables
create table pu1 (a int not null, b int not null, c int, d text);
create unique index i_pu1_ab on pu1 (a, b);
create table cu11 (like pu1 including all) inherits (pu1);
create table cu12 (like pu1 including all) inherits (pu1);
create table pu2 (like pu1 including all);
create table cu21 (like pu2 including all) inherits (pu2);
create table cu22 (like pu2 including all) inherits (pu2);
insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a);
insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a);
insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a);
insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a);
====

Following example is an implicit union derived from partitioned
tables created as above. Limit is added to highlight the effect.

9.4dev with no patch,

| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100;
| QUERY PLAN
| ----------------------------------------------------------------------------
| Limit (actual time=246.653..246.730 rows=100 loops=1)
| -> Unique (actual time=246.646..246.713 rows=100 loops=1)
| -> Sort (actual time=246.645..246.666 rows=100 loops=1)
| Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
| Sort Method: external sort Disk: 5280kB
| -> Append (actual time=0.017..52.608 rows=200000 loops=1)
| -> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
| -> Seq Scan on cu11 (actual time=0.015..17.047 rows=100000 loops=1)
| -> Seq Scan on cu12 (actual time=0.007..15.474 rows=100000 loops=1)
| Total runtime: 247.854 ms

Fairly common query. It will get following plan with this patch.

| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100;
| QUERY PLAN
| -----------------------------------------------------------------------------
| Limit (actual time=0.108..0.350 rows=100 loops=1)
| -> Unique (actual time=0.104..0.329 rows=100 loops=1)
| -> Merge Append (actual time=0.103..0.214 rows=100 loops=1)
| Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
| -> Index Scan .. on pu1 (actual time=0.003..0.003 rows=0 loops=1)
| -> Index Scan .. on cu11 (actual time=0.049..0.110 rows=100 loops=1)
| -> Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1)
| Total runtime: 0.666 ms

The same could be said on union with my another union-pullup
patch. We will get the following result with only the
union-pullup patch.

|=# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 10000;
| QUERY PLAN
|---------------------------------------------------------------------------
| Limit (actual time=474.825..482.326 rows=10000 loops=1)
| -> Unique (actual time=474.824..481.357 rows=10000 loops=1)
| -> Sort (actual time=474.823..477.101 rows=10000 loops=1)
| Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
| Sort Method: external sort Disk: 10568kB
| -> Append (actual time=0.018..99.310 rows=400000 loops=1)
| -> Seq Scan on cu11 (actual time=0.017..16.263 rows=100000 loops=1)
| -> Seq Scan on cu12 (actual time=0.006..14.831 rows=100000 loops=1)
| -> Seq Scan on cu21 (actual time=0.006..14.766 rows=100000 loops=1)
| -> Seq Scan on cu22 (actual time=0.006..14.721 rows=100000 loops=1)
| Total runtime: 484.555 ms

This is also a quite common operation but implicit DISTINCT
prevents planner from selecting index scans. (ORDER BY is not
essential but needed because planner does not consult
distinct_pathkeys on planning scans and I've left it as it is.)

The planner gives the following plan with this patch.

| =# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 10000;
| QUERY PLAN
| -------------------------------------------------------------------------
| Limit (actual time=0.197..14.739 rows=10000 loops=1)
| -> Unique (actual time=0.191..13.527 rows=10000 loops=1)
| -> Merge Append (actual time=0.191..7.894 rows=10000 loops=1)
| Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
| -> Index Scan .. on cu11 (actual time=0.054..4.676 rows=10000 loops=1)
| -> Index Scan .. on cu12 (actual time=0.045..0.045 rows=1 loops=1)
| -> Index Scan .. on cu21 (actual time=0.044..0.044 rows=1 loops=1)
| -> Index Scan .. on cu22 (actual time=0.043..0.043 rows=1 loops=1)
| Total runtime: 15.872 ms

This seems for me worth enough to do.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2014-01-14 09:08:10
Message-ID: 20140114.180810.122352231.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, since CF4 is already closed but this patch ramains marked
as 'Ready for Committer', please let me re-post the latest
version for CF4 to get rid of vanishing :-p

> tgl> But aside from hasty typos,
>
> Oops! I've picked up wrong node. It always denies pathkeys extension.
>
> | !IsA(member, Var))
>
> is a mistake of the following.
>
> | !IsA(member->em_expr, Var))

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueindx_v8_20140114.patch text/x-patch 5.4 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2014-02-25 08:36:03
Message-ID: 20140225.173603.53979405.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

May I mark this patch as "Ready for Committer" by myself since
this was already marked so in last CF3 and have had no objection
or new suggestion in this CF4 for more than a month?

At Tue, 14 Jan 2014 18:08:10 +0900, Kyotaro HORIGUCHI wrote
> Hello, since CF4 is already closed but this patch ramains marked
CF3
> as 'Ready for Committer', please let me re-post the latest
> version for CF4 to get rid of vanishing :-p
>

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Get more from indices.
Date: 2014-02-25 21:35:55
Message-ID: CA+TgmoZAZRHwR6huUrNYbTipCbyvnPb+c5wh5BTEJq8m-m0FMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> May I mark this patch as "Ready for Committer" by myself since
> this was already marked so in last CF3 and have had no objection
> or new suggestion in this CF4 for more than a month?

Sounds fair.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-02-26 04:02:05
Message-ID: 20140226.130205.40819588.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

At Tue, 25 Feb 2014 16:35:55 -0500, Robert Haas wrote
> On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > May I mark this patch as "Ready for Committer" by myself since
> > this was already marked so in last CF3 and have had no objection
> > or new suggestion in this CF4 for more than a month?
>
> Sounds fair.

Thank you, I will send this patch to commtters after waiting
another few days for more suggestions.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-03-03 10:03:18
Message-ID: 20140303.190318.149980903.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I marked this patch as 'Ready for Committer' by myself according
to the following discussion.

Thanks.

> At Tue, 25 Feb 2014 16:35:55 -0500, Robert Haas wrote
> > On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > May I mark this patch as "Ready for Committer" by myself since
> > > this was already marked so in last CF3 and have had no objection
> > > or new suggestion in this CF4 for more than a month?
> >
> > Sounds fair.
>
> Thank you, I will send this patch to commtters after waiting
> another few days for more suggestions.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-03-10 07:21:39
Message-ID: 20140310.162139.201355356.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oops! I found a bug in this patch. The previous v8 patch missed
the case that build_index_pathkeys() could build a partial
pathkeys from the index tlist.

This causes the situation follows,

=======
=# \d cu11
Table "public.cu11"
Column | Type | Modifiers
--------+---------+-----------
a | integer | not null
b | integer | not null
c | integer |
d | text |
Indexes:
"cu11_a_b_idx" UNIQUE, btree (a, b)

s=# explain (costs off) select * from cu11 order by a, c ,d;
QUERY PLAN
---------------------------------------
Index Scan using cu11_a_b_idx on cu11
(1 row)
=======

Where the simple ORDER BY a, c, d on the table with index on
columns (a, b) results simple index scan which cannot perform the
order.

The attached v9 patche is fixed by adding a check for the case
into index_pathkeys_are_extensible(), and rebase to current HEAD.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueindx_v9_20130310.patch text/x-patch 5.6 KB

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-07 08:45:53
Message-ID: 534265C1.9080607@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Horiguchi-san,

Sorry for not reviewing this patch in the last CF.

(2014/03/10 16:21), Kyotaro HORIGUCHI wrote:
> Oops! I found a bug in this patch. The previous v8 patch missed
> the case that build_index_pathkeys() could build a partial
> pathkeys from the index tlist.
>
> This causes the situation follows,
>
> =======
> =# \d cu11
> Table "public.cu11"
> Column | Type | Modifiers
> --------+---------+-----------
> a | integer | not null
> b | integer | not null
> c | integer |
> d | text |
> Indexes:
> "cu11_a_b_idx" UNIQUE, btree (a, b)
>
> s=# explain (costs off) select * from cu11 order by a, c ,d;
> QUERY PLAN
> ---------------------------------------
> Index Scan using cu11_a_b_idx on cu11
> (1 row)
> =======
>
> Where the simple ORDER BY a, c, d on the table with index on
> columns (a, b) results simple index scan which cannot perform the
> order.
>
> The attached v9 patche is fixed by adding a check for the case
> into index_pathkeys_are_extensible(), and rebase to current HEAD.

Good catch!

ISTM that the biggest concern about this patch would be whether it's
worth complicating the code, because the range of application of the
patch is not so wide as is. So, all we need to do is show important use
cases that prove the effectiveness of the patch. Sorry, I can't come up
with a good idea, though.

Thanks,

Best regards,
Etsuro Fujita


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: robertmhaas(at)gmail(dot)com, fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-09 15:08:49
Message-ID: 29813.1397056129@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Oops! I found a bug in this patch. The previous v8 patch missed
> the case that build_index_pathkeys() could build a partial
> pathkeys from the index tlist.

TBH I think that's barely the tip of the iceberg of cases where this
patch will get the wrong answer. It looks like it thinks that *any*
Var belonging to the indexed relation must be the same one indexed
by the index. Also, I don't see it doing anything to check the ordering
of multiple index columns --- for instance, an index on (a,b) and one
on (b,a) would both pass or fail identically, though surely only one
should match "ORDER BY a,b,...". Also, what's with the success return
before the loop:

+ if (list_length(pathkeys) == list_length(root->query_pathkeys))
+ return true;

At this point you haven't proven *anything at all* about whether the
index columns have something to do with the query_pathkeys.

regards, tom lane


From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-10 09:41:03
Message-ID: 5346672F.1060202@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2014/04/10 0:08), Tom Lane wrote:
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
>> Oops! I found a bug in this patch. The previous v8 patch missed
>> the case that build_index_pathkeys() could build a partial
>> pathkeys from the index tlist.
>
> TBH I think that's barely the tip of the iceberg of cases where this
> patch will get the wrong answer.

> Also, I don't see it doing anything to check the ordering
> of multiple index columns

I think that the following code in index_pathkeys_are_extensible() would
check the ordering:

+ if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
+ return false;

> Also, what's with the success return
> before the loop:
>
> + if (list_length(pathkeys) == list_length(root->query_pathkeys))
> + return true;
>
> At this point you haven't proven *anything at all* about whether the
> index columns have something to do with the query_pathkeys.

I think that the two pathkeys would be proved to be equal, if the both
conditions are satisfied.

Thanks,

Best regards,
Etsuro Fujita


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-10 13:25:31
Message-ID: 13118.1397136331@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> (2014/04/10 0:08), Tom Lane wrote:
>> TBH I think that's barely the tip of the iceberg of cases where this
>> patch will get the wrong answer.

>> Also, I don't see it doing anything to check the ordering
>> of multiple index columns

> I think that the following code in index_pathkeys_are_extensible() would
> check the ordering:
> + if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
> + return false;

Hm ... if you're relying on that, then what's the point of the new loop
at all?

regards, tom lane


From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-11 02:38:01
Message-ID: 53475589.2060905@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2014/04/10 22:25), Tom Lane wrote:
> Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
>> (2014/04/10 0:08), Tom Lane wrote:
>>> TBH I think that's barely the tip of the iceberg of cases where this
>>> patch will get the wrong answer.
>
>>> Also, I don't see it doing anything to check the ordering
>>> of multiple index columns
>
>> I think that the following code in index_pathkeys_are_extensible() would
>> check the ordering:
>> + if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
>> + return false;
>
> Hm ... if you're relying on that, then what's the point of the new loop
> at all?

The point is that from the discussion [1], we allow the index pathkeys
to be extended to query_pathkeys if each *remaining* pathkey in
query_pathkey is a Var belonging to the indexed relation. The code is
confusing, though. Sorry, that is my faults.

Thanks,

[1] http://www.postgresql.org/message-id/29637.1389064686@sss.pgh.pa.us

Best regards,
Etsuro Fujita


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-11 05:12:40
Message-ID: 20140411.141240.257315020.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, sorry for the absense. I've been back.

Attached is the patch following the discussion below.

> >> (2014/04/10 0:08), Tom Lane wrote:
> >>> TBH I think that's barely the tip of the iceberg of cases where this
> >>> patch will get the wrong answer.
> >
> >>> Also, I don't see it doing anything to check the ordering
> >>> of multiple index columns
> >
> >> I think that the following code in index_pathkeys_are_extensible() would
> >> check the ordering:
> >> + if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
> >> + return false;
> >
> > Hm ... if you're relying on that, then what's the point of the new loop
> > at all?
>
> The point is that from the discussion [1], we allow the index pathkeys
> to be extended to query_pathkeys if each *remaining* pathkey in
> query_pathkey is a Var belonging to the indexed relation. The code is
> confusing, though. Sorry, that is my faults.

Hmm, I found that the iterations for the part that already
checked by pathkeys_contained_in are not only needless but a bit
heavy. And the loop seems a little verbose. I did for the patch,
in index_pathkeys_are_extensible,

- Avoiding duplicate check with pathkeys_contained_in.

I put similar code to list_nth_cell since it is not exposed
outside of list.c.

- Add comment to clarify the purpose of the loop.

- Simplify the check for the "remaining" keycolumns

> Thanks,
>
> [1] http://www.postgresql.org/message-id/29637.1389064686@sss.pgh.pa.us
>
> Best regards,
> Etsuro Fujita
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-11 05:31:32
Message-ID: 20140411.143132.30735259.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

# Sorry for accidentialy sending the previous mail unfinished.

## ...and I seem to have bombed uncertain files off out of my
## home directory by accident, too :(

=====
Hi, sorry for the absense. I've been back.

Thank you for continuing this discussion.

Attached is the patch following the discussion below.

> >> (2014/04/10 0:08), Tom Lane wrote:
> >>> TBH I think that's barely the tip of the iceberg of cases where this
> >>> patch will get the wrong answer.
> >
> >>> Also, I don't see it doing anything to check the ordering
> >>> of multiple index columns
> >
> >> I think that the following code in index_pathkeys_are_extensible() would
> >> check the ordering:
> >> + if (!pathkeys_contained_in(pathkeys, root->query_pathkeys))
> >> + return false;
> >
> > Hm ... if you're relying on that, then what's the point of the new loop
> > at all?
>
> The point is that from the discussion [1], we allow the index pathkeys
> to be extended to query_pathkeys if each *remaining* pathkey in
> query_pathkey is a Var belonging to the indexed relation. The code is
> confusing, though. Sorry, that is my faults.

Hmm, I found that the iterations for the part that already
checked by pathkeys_contained_in are not only useless but a bit
heavy. And the loop seems a little verbose. I did for the patch,
in index_pathkeys_are_extensible,

- Avoiding duplicate check with pathkeys_contained_in.

I put similar code to list_nth_cell since it is not exposed
outside of list.c.

- Add comment to clarify the purpose of the loop.

- Simplify the check for the "remaining" keycolumns

I think this makes some things clearer.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueindx_v10_20130411.patch text/x-patch 5.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-15 22:10:17
Message-ID: 5212.1397599817@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> [ pathkey_and_uniqueindx_v10_20130411.patch ]

I thought some more about this patch, and realized that it's more or less
morally equivalent to allowing references to ungrouped variables when the
query has a GROUP BY clause listing all the columns of the primary key.
In that case the parser is effectively pretending that the GROUP BY list
contains additional implicit entries that are functionally dependent on
the entries that are actually there. In this patch, what we want to do
is recognize that trailing entries in an ORDER BY list are semantically
no-ops and can be ignored because they are functionally dependent on
earlier entries.

Now, the reason that the parser restricts the functional dependency
deduction to a primary key is that it wants to be able to identify a
constraint OID that the query is dependent on to be semantically valid.
In this case, we don't need such an OID, so just finding any old unique
index on not-null columns is good enough. (If someone drops the index,
the optimization might become incorrect, but that would force replanning
anyway.)

However, this way of thinking about it shows that the patch is missing
possible optimizations. If we have "ORDER BY a, b, c" and (a,b) is the
primary key, then including c in the ORDER BY list is semantically
redundant, *whether or not we use an indexscan on the pkey index at all*.
More: if we have "ORDER BY a, b, c" and the primary key is (b,a), we
can still discard c from the sort requirement, even though the pkey
index as such isn't helpful for producing the required order.

So hacking up the pathkeys attributed to the indexscan is the wrong thing.
Rather, what we should be looking to do is decide that c is a useless
pathkey and remove it from the query_pathkeys, much as we'd do if we found
"c = constant" in WHERE. That would allow optimization of other query
plans besides scan-the-pkey-index plans.

regards, tom lane


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-18 08:46:44
Message-ID: 20140418.174644.24190222.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

> I thought some more about this patch, and realized that it's more or less
> morally equivalent to allowing references to ungrouped variables when the
> query has a GROUP BY clause listing all the columns of the primary key.
> In that case the parser is effectively pretending that the GROUP BY list
> contains additional implicit entries that are functionally dependent on
> the entries that are actually there. In this patch, what we want to do
> is recognize that trailing entries in an ORDER BY list are semantically
> no-ops and can be ignored because they are functionally dependent on
> earlier entries.

Ah, that sounds smarter than extending pathekys. I feel it preferable.

> Now, the reason that the parser restricts the functional dependency
> deduction to a primary key is that it wants to be able to identify a
> constraint OID that the query is dependent on to be semantically valid.
> In this case, we don't need such an OID, so just finding any old unique
> index on not-null columns is good enough. (If someone drops the index,
> the optimization might become incorrect, but that would force replanning
> anyway.)

Agreed,

> However, this way of thinking about it shows that the patch is missing
> possible optimizations. If we have "ORDER BY a, b, c" and (a,b) is the
> primary key, then including c in the ORDER BY list is semantically
> redundant, *whether or not we use an indexscan on the pkey index at all*.
> More: if we have "ORDER BY a, b, c" and the primary key is (b,a), we
> can still discard c from the sort requirement, even though the pkey
> index as such isn't helpful for producing the required order.

Hmm yes, it really seems expectable.

> So hacking up the pathkeys attributed to the indexscan is the wrong thing.
> Rather, what we should be looking to do is decide that c is a useless
> pathkey and remove it from the query_pathkeys, much as we'd do if we found
> "c = constant" in WHERE. That would allow optimization of other query
> plans besides scan-the-pkey-index plans.

Ok, I am convinced that your suggestion - truncating
query_pathkeys by removing eventually no-op entries - seems
preferable and will have wider effect naturally - more promised.

I won't persist with the way this patch currently does but the
new patch of course can't come up within this CF. I will agree if
you decide to make this patch 'Returned with Feedback'. (I feel a
little sad for 'Rejected' but you can justly do that if you think
that the patch comming up next is utterly different from this
one:()

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Get more from indices.
Date: 2014-04-18 13:43:27
Message-ID: 22533.1397828607@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Ok, I am convinced that your suggestion - truncating
> query_pathkeys by removing eventually no-op entries - seems
> preferable and will have wider effect naturally - more promised.

> I won't persist with the way this patch currently does but the
> new patch of course can't come up within this CF. I will agree if
> you decide to make this patch 'Returned with Feedback'. (I feel a
> little sad for 'Rejected' but you can justly do that if you think
> that the patch comming up next is utterly different from this
> one:()

I marked it as "returned with feedback".

regards, tom lane