Re: Yet another abort-early plan disaster on 9.3

From: Greg Stark <stark(at)mit(dot)edu>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-19 22:24:51
Message-ID: CAM-w4HMfZuhYBzJ_sCr75G1KJLNX5pYr7uogkWQKfbey29p9cw@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Sat, Oct 18, 2014 at 6:01 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> Hmmm. I have 0 experience with handling patents and related issues. Any
> idea how to address that?

Well, there's no real way to address it. But to summarize: 1) We
should not go searching for patents; knowing that something is
patented increases your liability if you end up infringing on it.
2) You should consult your lawyer for advice, which he can give under
attorney-client privilege without exposing you to such liability.
3) The whole patent system is fundamentally broken and serves only to
protect incumbents from innovative competitors :(

I think realistically software patents are so vague, and cover so
many basic algorithms that are obvious to anyone in the field because
they're part of the basic knowledge taught in every algorithms course,
that Postgres is probably infringing on hundreds of patents, all of
which would easily be declared invalid if the owners ever sought to
use them to attack widely used free software.

That doesn't mean we should go seeking out algorithms that are known
to have been published in papers coming out of major proprietary
database vendors, though. That just seems like asking for trouble. We
should be looking for solutions that seem obvious to us, or were
published 17+ years ago, or were published by people who specifically
claim they're not patented.

> I think this will be very tricky, and in fact it may make the estimates
> much worse easily, because all the algorithms assume random sampling.

Well, this is where Josh and I agree on the problem but come to
different conclusions. He wants to use more of each block so he can
take larger samples, in the hope that will produce better statistics.
I want to use more of each block so we can continue to take rigorously
justified sample sizes but without having to read such a large portion
of the table.
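To put some rough numbers on the I/O side of that, here's a
back-of-the-envelope simulation (Python, table dimensions made up; the
30,000-row sample is roughly what the default statistics target of 100
works out to). It just counts how many distinct heap pages a pure row
sample touches versus a block sample that keeps every row on each
sampled page:

    import random

    def pages_touched_by_row_sample(n_pages, rows_per_page, sample_rows, seed=0):
        # Pick sample_rows row positions uniformly at random and count how
        # many distinct heap pages they land on, i.e. how many pages a
        # "pure" row sample would have to read.
        random.seed(seed)
        rows = random.sample(range(n_pages * rows_per_page), sample_rows)
        return len({r // rows_per_page for r in rows})

    # Made-up table: 1,000,000 pages, 100 rows per page, 30,000-row sample.
    n_pages, rows_per_page, sample_rows = 1_000_000, 100, 30_000
    print("row sampling reads ~%d pages"
          % pages_touched_by_row_sample(n_pages, rows_per_page, sample_rows))
    print("block sampling (all %d rows per page) reads %d pages"
          % (rows_per_page, sample_rows // rows_per_page))

That's close to 30,000 page reads versus 300 for the same number of
rows. The open question, of course, is whether the extra rows taken
from within a block can be treated as independent draws, which is
exactly where Josh and I diverge.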

Either way, our current sampling method isn't meeting anyone's needs.
It requires copious amounts of I/O, which makes running ANALYZE after
an upgrade or data load a major contributor to your outage window.

>
> For example the ndistinct estimator

Yes, well, the ndistinct estimator just sucks. It's always going to
suck. Unless it has a chance to see every row of the table there's
absolutely no way it's ever going to not suck. Any attempt to estimate
ndistinct from a sample of any size is going to suck. That's just the
way it is.

Our sample sizes are based on the size needed to build the histogram.
That requires only a fairly small, slowly growing sample, though our
block-based sampling inflates the amount of I/O that leads to.
ndistinct and MCV are a different kind of problem: they need a
proportional sample, one that grows linearly as the table grows. To
get a good estimate of ndistinct requires a large enough proportion to
effectively require reading the whole table. To get decent MCV stats
only requires a smaller proportion, but it's still a proportion that
grows linearly, which is going to be impractical for large data.
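To illustrate just how hopeless sample-based ndistinct is, here's a toy
simulation (Python, table shape invented) plugging a 30,000-row sample
of a mildly skewed 10M-row table into the Haas & Stokes Duj1 formula,
which as far as I recall is the estimator analyze.c uses:

    import random
    from collections import Counter

    def duj1(sample, total_rows):
        # Haas & Stokes "Duj1" estimator: n*d / (n - f1 + f1*n/N), where d
        # is the number of distinct values in the sample and f1 the number
        # of values seen exactly once in it.
        n = len(sample)
        counts = Counter(sample)
        d = len(counts)
        f1 = sum(1 for c in counts.values() if c == 1)
        return n * d / (n - f1 + f1 * n / total_rows)

    random.seed(42)
    total_rows = 10_000_000
    # Invented skew: half the rows use one of 10 hot values, the other half
    # are effectively unique ids, so true ndistinct is about 5,000,010.
    sample = [random.randrange(10) if random.random() < 0.5
              else random.randrange(10, 5_000_010)
              for _ in range(30_000)]
    print("true ndistinct ~5,000,010  estimated: %.0f" % duj1(sample, total_rows))

On those made-up numbers the estimate comes out around 30,000, two
orders of magnitude low, and no cleverer formula can fix that because
the information simply isn't in the sample.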

There are two strategies for dealing with ndistinct that I can see
here. Either a) we decide that to estimate ndistinct we need to scan
the entire table, and just make that a separate type of ANALYZE that
you run less frequently. Basically this is what we did with VACUUM,
and indeed we could even invest in infrastructure like the FSM which
allows you to process only the changed pages, so it can be done
incrementally.
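As a strawman for what the incremental flavour of that could look like
(everything here is invented, and a per-page set obviously isn't the
real representation you'd want; something mergeable like a HyperLogLog
sketch per page would be):

    # Strawman: keep a small per-page summary of the values on each heap
    # page and rebuild only the summaries of pages that changed since the
    # last run, much as the FSM/visibility map lets us skip unchanged
    # pages elsewhere.  The "summary" here is just a Python set; a real
    # version would need a bounded, mergeable sketch.

    page_summaries = {}   # page number -> set of values seen on that page

    def analyze_ndistinct(read_page, changed_pages):
        # read_page(pageno) -> list of column values on that page (hypothetical)
        for pageno in changed_pages:
            page_summaries[pageno] = set(read_page(pageno))
        merged = set()
        for s in page_summaries.values():
            merged |= s           # merge summaries instead of rescanning
        return len(merged)

The point being that the expensive part, reading every page, only has
to happen once; after that you only pay for the pages that changed.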

Or b) we decide gathering periodic statistics isn't the way to handle
ndistinct, and instead watch the incoming inserts and outgoing deletes
and keep the statistic up to date incrementally. That can be done,
though obviously it's a bit tricky. But there's tons of published
research on updating database statistics incrementally -- which brings
us back to patents...
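The baseline version of that, before any clever sketches enter the
picture, is just a counter per value maintained from the insert/delete
stream; something like this toy (all names invented):

    from collections import defaultdict

    class IncrementalNDistinct:
        # Exact ndistinct maintained from the insert/delete stream.  Memory
        # is O(ndistinct), which is precisely what the fancier (and
        # possibly patented) sketch-based schemes try to avoid.
        def __init__(self):
            self.counts = defaultdict(int)

        def insert(self, value):
            self.counts[value] += 1

        def delete(self, value):
            self.counts[value] -= 1
            if self.counts[value] <= 0:
                del self.counts[value]

        def ndistinct(self):
            return len(self.counts)

The published research is mostly about getting the same answer in
sublinear memory, which is where counting Bloom filters and friends
come in.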

> I think the 'minimal' stats (when we have just '=' for the type) behaves
> like this, but fixing it by switching to a two-pass approach should not
> be that difficult (but would cost a few more CPU cycles).
>
> Or do you suggest that even the scalar MCV algorithm behaves has this
> bias issue? I doubt that, because the MCV works with an array sorted by
> number of occurences, so the position within the table is irrelevant.

Hum. I'll have to reconstruct the tests I was doing back then and see
what was going on. I never really understood what was causing it.

What I observed was that as I increased the number of rows read per
sampled block, the MCV list became more and more dominated by rows
found early in the table. This was in a column where the values were
actually uniformly distributed, so the only MCVs should have been
values that happened to be sampled substantially more often than the
mean frequency. They were integers though, so they should have been
covered by the sorting approach.

> I don't see why the counting bloom filter would be necessary, in a two
> pass approach?

I guess I was imagining it wasn't keeping all the values in memory at
the same time. Isn't that the whole point of the lossy '=' algorithm?
But now that I think of it, I don't understand why that algorithm is
needed at all.
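For reference, my understanding is that the lossy algorithm in
question is essentially Manku & Motwani's lossy counting, which finds
the frequent values in one pass with bounded memory. A rough sketch of
the classic version (parameters invented, not a claim about what
analyze.c does exactly):

    import math

    def lossy_count(stream, epsilon=0.001):
        # Manku & Motwani lossy counting (VLDB 2002): one pass, finds every
        # value whose true frequency exceeds epsilon*N while keeping only
        # about (1/epsilon)*log(epsilon*N) counters, i.e. without holding
        # all distinct values in memory at once.
        width = math.ceil(1 / epsilon)    # bucket width
        counts, deltas = {}, {}           # value -> count, value -> max undercount
        n = 0
        for value in stream:
            n += 1
            bucket = (n - 1) // width + 1
            if value in counts:
                counts[value] += 1
            else:
                counts[value] = 1
                deltas[value] = bucket - 1
            if n % width == 0:            # bucket boundary: prune rare values
                for v in [v for v in counts if counts[v] + deltas[v] <= bucket]:
                    del counts[v], deltas[v]
        return counts, n

    # Values with counts[v] >= (s - epsilon) * n are guaranteed to include
    # every value whose true frequency is at least s * n.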

--
greg
