Re: [WIP] Better partial index-only scans

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Joshua Yanovski <pythonesque(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Better partial index-only scans
Date: 2014-06-29 18:55:24
Message-ID: 20140629185524.GA62670@hasegeli.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Joshua Yanovski <pythonesque(at)gmail(dot)com>:
> Proof of concept initial patch for enabling index only scans for
> partial indices even when an attribute is not in the target list, as
> long as it is only used in restriction clauses that can be proved by
> the index predicate. This also works for index quals, though they
> still can't be used in the target list. However, this patch may be
> inefficient since it duplicates effort that is currently delayed until
> after the best plan is chosen.

I find the improvement very useful. I use functional and partial
indexes, a lot. I think we even need a dummy index to make more use
of these features.

> The patch works by basically repeating the logic from
> create_indexscan_plan in createplan.c that determines which clauses
> can't be discarded, instead of the current approach, which just
> assumes that any attributes referenced anywhere in a restriction
> clause has to be a column in the relevant index. It should build
> against master and passes make check for me. It also includes a minor
> fix in the same code in createplan.c to make sure we're explicitly
> comparing an empty list to NIL, but I can take that out if that's not
> considered in scope. If this were the final patch I'd probably
> coalesce the code used in both places into a single function, but
> since I'm not certain that the implementation in check_index_only
> won't change substantially I held off on that.
>
> Since the original comment suggested that this was not done due to
> planner performance concerns, I assume the performance of this
> approach is unacceptable (though I did a few benchmarks and wasn't
> able to detect a consistent difference--what would be a good test for
> this?). As such, this is intended as more of a first pass that I can
> build on, but I wanted to get feedback at this stage on where we can
> improve (particularly if there were already ideas on how this might be
> done, as the comment hints). Index only scans cost less than regular
> index scans so I don't think we can get away with waiting until we've
> chosen the best plan before we do the work described above. That
> said, as I see it performance could improve in any combination of five
> ways:
> * Improve the performance of determining which clauses can't be
> discarded (e.g. precompute some information about equivalence classes
> for index predicates, mess around with the order in which we check the
> clauses to make it fail faster, switch to real union-find data
> structures for equivalence classes).
> * Find a cleverer way of figuring out whether a partial index can be
> used than just checking which clauses can't be discarded.
> * Use a simpler heuristic (that doesn't match what use to determine
> which clauses can be discarded, but still matches more than we do
> now).
> * Take advantage of work we do here to speed things up elsewhere (e.g.
> if this does get chosen as the best plan we don't need to recompute
> the same information in create_indexscan_plan).
> * Delay determining whether to use an index scan or index only scan
> until after cost analysis somehow. I'm not sure exactly what this
> would entail.

I do not know much about the internals of the planner. So, I am not
in a position to decide the performance is acceptable or not. If it
is not, I think your first solution would minimize the penalty in
almost all scenarios. Your other options seems harder to implement.

I think, it can also be useful to store predicates implied by the
index clause or the index predicate under the path tree. We may
make use of them in future improvements. Especially it would be
nice to avoid calling expensive functions if they are included in
an index. This approach can also simplify the design. The predicates
can be stored under IndexPath even index only scan is disabled.
They can be used used unconditionally on create_indexscan_plan().

I will update the patch as returned with feedback, but it would
be nice if someone more experienced give an opinion about these
ideas. I would be happy to review further developments when they
are ready. Please let me know if I can help any other way.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Patrick Finnegan 2014-06-29 18:58:56 Re: PostgreSQL for VAX on NetBSD/OpenBSD
Previous Message Robert Haas 2014-06-29 18:52:13 Re: idle_in_transaction_timeout