Re: [WIP] Better partial index-only scans

From: Joshua Yanovski <pythonesque(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [WIP] Better partial index-only scans
Date: 2014-03-17 07:14:49
Message-ID: CABz-M-GQo_vpNzK8MuRK9WHWqvF3vT0CGR8fzhB0fD_zfy6xqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's a SQL script that (1) demonstrates the new index only scan
functionality, and (2) at least on my machine, has a consistently
higher planning time for the version with my change than without it.

On Sun, Mar 16, 2014 at 5:08 AM, Joshua Yanovski <pythonesque(at)gmail(dot)com> wrote:
> Proof of concept initial patch for enabling index only scans for
> partial indices even when an attribute is not in the target list, as
> long as it is only used in restriction clauses that can be proved by
> the index predicate. This also works for index quals, though they
> still can't be used in the target list. However, this patch may be
> inefficient since it duplicates effort that is currently delayed until
> after the best plan is chosen.
>
> The patch works by basically repeating the logic from
> create_indexscan_plan in createplan.c that determines which clauses
> can't be discarded, instead of the current approach, which just
> assumes that any attributes referenced anywhere in a restriction
> clause has to be a column in the relevant index. It should build
> against master and passes make check for me. It also includes a minor
> fix in the same code in createplan.c to make sure we're explicitly
> comparing an empty list to NIL, but I can take that out if that's not
> considered in scope. If this were the final patch I'd probably
> coalesce the code used in both places into a single function, but
> since I'm not certain that the implementation in check_index_only
> won't change substantially I held off on that.
>
> Since the original comment suggested that this was not done due to
> planner performance concerns, I assume the performance of this
> approach is unacceptable (though I did a few benchmarks and wasn't
> able to detect a consistent difference--what would be a good test for
> this?). As such, this is intended as more of a first pass that I can
> build on, but I wanted to get feedback at this stage on where we can
> improve (particularly if there were already ideas on how this might be
> done, as the comment hints). Index only scans cost less than regular
> index scans so I don't think we can get away with waiting until we've
> chosen the best plan before we do the work described above. That
> said, as I see it performance could improve in any combination of five
> ways:
> * Improve the performance of determining which clauses can't be
> discarded (e.g. precompute some information about equivalence classes
> for index predicates, mess around with the order in which we check the
> clauses to make it fail faster, switch to real union-find data
> structures for equivalence classes).
> * Find a cleverer way of figuring out whether a partial index can be
> used than just checking which clauses can't be discarded.
> * Use a simpler heuristic (that doesn't match what use to determine
> which clauses can be discarded, but still matches more than we do
> now).
> * Take advantage of work we do here to speed things up elsewhere (e.g.
> if this does get chosen as the best plan we don't need to recompute
> the same information in create_indexscan_plan).
> * Delay determining whether to use an index scan or index only scan
> until after cost analysis somehow. I'm not sure exactly what this
> would entail.
>
> Since this is my first real work with the codebase, I'd really
> appreciate it if people could help me figure out the best approach
> here (and, more importantly, if one is necessary based on benchmarks).
> And while this should go without saying, if this patch doesn't
> actually work then please let me know, since all the above is based on
> the assumption that what's there is enough :)
>
> Thanks,
> Joshua Yanovski

--
Josh

Attachment Content-Type Size
test_indexscan.sql application/octet-stream 2.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2014-03-17 07:24:55 Re: Re: BUG #9578: Undocumented behaviour for temp tables created inside query language (SQL) functions
Previous Message KONDO Mitsumasa 2014-03-17 05:39:44 Re: gaussian distribution pgbench