Re: Get more from indices.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp
Cc: pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com
Subject: Re: Get more from indices.
Date: 2013-11-26 11:42:42
Message-ID: 20131126.204242.141708563.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for pointing out.

> > the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
> > follows.
>
> The patch is compiled successfully and passes all regression tests. Also,
> the patch works well for the tests shown in an earlier email from
> Horiguchi-san. But in this version I found an incorrect behavior.
>
> postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
> CREATE TABLE
> postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
> CREATE INDEX
> postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
> generate_series(000000, 099999) a);
> INSERT 0 100000
> postgres=# ANALYZE t;
> ANALYZE
> postgres=# CREATE TABLE t2 (e text, f int);
> CREATE TABLE
> postgres=# INSERT INTO t2 VALUES ('t', 2);
> INSERT 0 1
> postgres=# INSERT INTO t2 VALUES ('t', 1);
> INSERT 0 1
> postgres=# ANALYZE t2;
> ANALYZE
> postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER
> BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
> QUERY PLAN
> ----------------------------------------------------------------------------
> ----
> Limit (cost=0.29..9.30 rows=10 width=20)
- -> Sort (cost=92.33..92.57 rows=94 width=20)
- Sort Key: t.a, t.b, t.c, t.d, t2.f
> -> Nested Loop (cost=0.29..129.99 rows=144 width=20)
> Join Filter: (t.d = t2.e)
> -> Index Scan using i_t_ab on t (cost=0.29..126.80 rows=72
> width=14)
> Index Cond: (a < 10)
> -> Materialize (cost=0.00..1.03 rows=2 width=6)
> -> Seq Scan on t2 (cost=0.00..1.02 rows=2 width=6)
> (7 rows)

Auch. Outermost sort is omitted but necessary (Prefixed by '-' above).

> ISTM this was brought by the following change.
>
> > In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
> > made from index columns old patch picked up the latter as IndexPath's
> > pathkeys. But the former is more suitable according to the context here.
>
> > - truncate_useless_pathkeys returns root->query_pathkeys when
> > the index is fully-ordered and query_pathkeys contains the
> > pathkeys made from index columns.

I implicitly put a wrong assumption that query_pathkeys is
dedicated for the relation under work, not whole subquery.

> I think it would be required to modify the patch so that the transformation
> of the pathkeys is performed only when each pathkey of query_pathkeys
> references the indexed relation.

What is wrong here is that IndexPath declares that it is ordered
in the pathkeys which includes the columns not belongs to the
table.

> (The above change might have been made according to my comment
> in an earlier email I sent. But I have to admit my explanation
> there was not adequate. I'm sorry for that.)

Nothing to be sorry for. It's my fault.

Anyway, I've put restriction on pathkeys_useful_for_ordering that
pick up longest pathkeys consists only ec members which matches
the required rel bitmap. With the new patch the planner makes
plan with the omitted sort and the query returns the following
result.

uniontest=# SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e
| ORDER BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
| a | b | c | d | e | f
| ---+---+---+---+---+---
| 0 | 0 | 4 | t | t | 1 <-+-- Correct order by t2.f.
| 0 | 0 | 4 | t | t | 2 <-+
| 0 | 1 | 3 | t | t | 1
(snipped.)

> Here are random comments.
>
> * In grouping_planner(), the patch resets the pathkeys of the cheapest
> presorted index-path to query_pathkeys through
> get_cheapest_fractional_path_for_pathkeys(). Is this necessary? ISTM the
> transformation of the pathkeys after the scan/join optimization would be no
> longer necessary once it has been done at the index-path creation time, ie,
> build_index_paths(). Why does the patch do that thing?

You're appliciated for pointing out. It is utterly useless code
of older patch. ("Have you totally scrapped the older verions?" >me :-(

Removed it in this version.

> * In get_relation_info(), the patch determines the branch condition
> "keyattno != ObjectIdAttributeNumber". I fail to understand the meaning of
> this branch condition. Could you explain about it?

Literally answering, it means oid cannot be NULL (if it exists).

Precisely, it reflects that I believed (or wished) it cannot be
NULL if oid column exists. (Other sysattrs are dismissed because
they are hardly to be a unique index column:-)

Oid in a tuple is retrived using HeapTupleGetOid() in
heap_getsysattr().

| result = ObjectIdGetDatum(HeapTupleGetOid(tup));

Then HeapTupleHeaderGetOid,

| #define HeapTupleGetOid(tuple) \
| HeapTupleHeaderGetOid((tuple)->t_data)

Finally asking HeapTupleHeaderGetOid for the value.

htup_details.h:354
| #define HeapTupleHeaderGetOid(tup) \
| ( \
| ((tup)->t_infomask & HEAP_HASOID) ? \
| *((Oid *) ((char *)(tup) + (tup)->t_hoff - sizeof(Oid))) \
| : \
| InvalidOid \
| )

So oid cannot be NULL after all even though can be InvalidOid.

On the otherhand, it can be NULL according to the definition in
heap.c.

heap.c:146
| static FormData_pg_attribute a2 = {
| 0, {"oid"}, OIDOID, 0, sizeof(Oid),
| ObjectIdAttributeNumber, 0, -1, -1,
| true, 'p', 'i', true, false, false, true, 0
| }; ~~~~

Can we set this to false safely?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
pathkey_and_uniqueindx_v5_20131126.patch text/x-patch 9.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2013-11-26 11:59:41 Re: Completing PL support for Event Triggers
Previous Message Heikki Linnakangas 2013-11-26 11:32:58 Re: Sequence Access Method WIP