Re: WIP: collect frequency statistics for arrays

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: collect frequency statistics for arrays
Date: 2011-06-13 04:16:43
Message-ID: BANLkTikX+iWCjKMUbkSjGv=K-C+cKSj5gw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 12, 2011 at 12:17 PM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:
> I'm worried about the performance of "column <@ const" estimation. It
> takes O(m*(n+m)) time, where m is the const length and n is the
> statistics target. It could probably be too slow in some cases.

Hmm, that doesn't sound terribly promising. The best thing to do here
is probably to construct a pessimal case and test it. So, say, look
for a column <@ a 100-element array with a statistics target of 400.
Once you know how bad it is, we can make intelligent decisions about
where to go with it.
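
For scale (my arithmetic, just plugging into the complexity quoted
above): with m = 100 const elements and n = 400 MCV entries, m*(n+m) =
100 * (400 + 100) = 50,000 comparisons, paid at plan time by every
query containing such a clause.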

If the data type is hashable, you could consider building a hash table
on the MCVs and then do a probe for each element in the array. I
think that's better than the other way around because there can't be
more than 10k MCVs, whereas the input constant could be arbitrarily
long. I'm not entirely sure whether this case is important enough to
be worth spending a lot of code on, but then again it might not be
that much code.
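
Something along these lines, as a self-contained toy (my sketch, not
the patch: int4 elements, a simple open-addressing table instead of
PostgreSQL's dynahash, and made-up frequencies standing in for whatever
per-element statistics the patch stores):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    int32_t key;        /* MCV value */
    double  freq;       /* its frequency from the stats */
    bool    used;
} Slot;

static uint32_t
hash_int32(int32_t k)
{
    uint32_t h = (uint32_t) k;

    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    return h;
}

/* Insert one MCV into an open-addressing table; size is a power of two. */
static void
mcv_insert(Slot *table, uint32_t mask, int32_t key, double freq)
{
    uint32_t i = hash_int32(key) & mask;

    while (table[i].used)
        i = (i + 1) & mask;
    table[i].key = key;
    table[i].freq = freq;
    table[i].used = true;
}

/* Return the frequency for key, or -1 if it is not among the MCVs. */
static double
mcv_lookup(const Slot *table, uint32_t mask, int32_t key)
{
    uint32_t i = hash_int32(key) & mask;

    while (table[i].used)
    {
        if (table[i].key == key)
            return table[i].freq;
        i = (i + 1) & mask;
    }
    return -1.0;
}

int
main(void)
{
    /* Made-up MCVs; the real code would read them from pg_statistic. */
    int32_t  mcv_vals[]  = {1, 2, 3, 5, 8};
    double   mcv_freqs[] = {0.30, 0.20, 0.10, 0.05, 0.02};
    int32_t  const_elems[] = {2, 4, 8};  /* elements of the <@ constant */
    uint32_t mask = 15;                  /* 16 slots, > 2x the MCV count */
    Slot    *table = calloc(mask + 1, sizeof(Slot));

    for (int i = 0; i < 5; i++)          /* build once: O(n) */
        mcv_insert(table, mask, mcv_vals[i], mcv_freqs[i]);

    for (int i = 0; i < 3; i++)          /* then probe: O(1) per element */
    {
        double f = mcv_lookup(table, mask, const_elems[i]);

        printf("element %d: %s\n", (int) const_elems[i],
               f >= 0.0 ? "MCV hit" : "not an MCV");
    }
    free(table);
    return 0;
}

Building the table once is O(n) and each probe is O(1), so the whole
thing is O(n + m) rather than O(n * m).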

Another option is to bound the number of operations you're willing to
perform to some reasonable limit, say, 10 * default_statistics_target.
Work out ceil((10 * default_statistics_target) /
number-of-elements-in-const) and consider at most that many MCVs.
When this limit kicks in you'll get a less-accurate selectivity
estimate, but that's a reasonable price to pay for not blowing out
planning time.
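
In rough code (a sketch under my own naming, not the patch's):

#include <stdio.h>

/* Cap the number of MCV entries scanned so the total comparison count
 * stays near 10 * default_statistics_target. */
static int
mcvs_to_consider(int n_mcvs, int n_const_elems, int stats_target)
{
    int budget = 10 * stats_target;
    int limit;

    if (n_const_elems < 1)
        n_const_elems = 1;
    limit = (budget + n_const_elems - 1) / n_const_elems;   /* ceil */
    return (limit < n_mcvs) ? limit : n_mcvs;
}

int
main(void)
{
    /* 100 MCVs, a 100-element const, statistics target 100 -> scan 10. */
    printf("%d\n", mcvs_to_consider(100, 100, 100));
    return 0;
}

So with the default statistics target of 100 and a 100-element
constant, only the 10 most common values get scanned; the estimate
degrades, but planning cost stays flat.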

>> At the moment I see no tests. If this code will be exercised by
>> existing tests then you should put some notes with the patch to
>> explain that, or at least provide some pointers as to how I might test
>> this.
>
> I didn't find any existing tests that check selectivity estimation
> accuracy, and I found it difficult to create them, because regression
> tests give a binary result while estimation accuracy is a quantitative
> value. The existing regression tests cover the case where typanalyze
> or the selectivity estimation function falls over. I've added an
> "ANALYZE array_op_test;" line to the array test so that these tests
> cover that failure case for this patch's functions too.
> It seems that selectivity estimation accuracy should be tested
> manually on various distributions. I've done a very small amount of
> such testing. Unfortunately, a few months passed before I got the idea
> about the "column <@ const" case, and now I don't have sufficient time
> for it due to my GSoC project. It would be great if you could help me
> with these tests.

AFAIK, we don't have regression tests for any other selectivity
estimator, except perhaps to the extent that we verify they don't
crash.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
