Re: WIP: collect frequency statistics for arrays

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: collect frequency statistics for arrays
Date: 2011-06-12 16:17:25
Message-ID: BANLkTikpkO1kkqDscmR_bWPqBrawhnmTAw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 10, 2011 at 9:03 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> Initial comments are that the code is well structured and I doubt
> there will be problems at the code level. Looks like a good patch.
>
I'm worried about the performance of "column <@ const" estimation. It takes
O(m*(n+m)) time, where m is the length of the constant and n is the
statistics target. It may be too slow in some cases.
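
To give a feel for how the O(m*(n+m)) bound grows, here is a rough sketch.
It ignores constant factors, and the function name and the value n = 100 are
assumptions chosen only for illustration, not taken from the patch:

```python
def estimation_steps(m, n):
    """Rough step count for an O(m * (n + m)) algorithm, constants ignored.

    m: length of the constant array (hypothetical input)
    n: statistics target (hypothetical input)
    """
    return m * (n + m)


# Step counts for a few constant lengths with an assumed n = 100
for m in (10, 100, 1000):
    print(m, estimation_steps(m, n=100))
```

Because of the m*m term, a tenfold increase in the constant length can cost
much more than ten times the work once m is larger than n.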

> At the moment I see no tests. If this code will be exercised by
> existing tests then you should put some notes with the patch to
> explain that, or at least provide some pointers as to how I might test
> this.
>
I didn't find any existing tests that check selectivity estimation accuracy,
and I found it difficult to create them because regression tests give a
binary result while estimation accuracy is a quantitative value. Existing
regression tests cover the case where a typanalyze or selectivity estimation
function crashes. I've added an "ANALYZE array_op_test;" line to the array
test so that these tests cover the crash case for this patch's functions too.
It seems that selectivity estimation accuracy should be tested manually on
various distributions. I've done only a small number of such tests.
Unfortunately, a few months passed before I got the idea for the
"column <@ const" case, and now I don't have sufficient time for it due to
my GSoC project. It would be great if you could help me with these tests.
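
One possible way to bridge the gap between a quantitative accuracy value and
a binary test result is to compare the estimated and actual row counts and
accept the estimate when their ratio stays under a threshold. The sketch
below is only an assumption about how such a check could look: the function
names and the default threshold of 2.0 are hypothetical, and the row counts
would have to be collected separately (for example from EXPLAIN ANALYZE
output).

```python
def ratio_error(estimated_rows, actual_rows):
    """Symmetric ratio error: 1.0 is a perfect estimate, larger is worse."""
    # Clamp both counts to at least one row to avoid division by zero.
    est = max(float(estimated_rows), 1.0)
    act = max(float(actual_rows), 1.0)
    return max(est / act, act / est)


def estimate_is_acceptable(estimated_rows, actual_rows, max_ratio=2.0):
    """Collapse the quantitative error into a pass/fail answer.

    max_ratio is a hypothetical tolerance, not a value from the patch.
    """
    return ratio_error(estimated_rows, actual_rows) <= max_ratio
```

Over- and underestimation are treated the same way here, so an estimate of
50 rows against 100 actual rows is judged like 200 against 100.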

> Also, I'd like to see some more explanation. Either in comments, or
> just as a post to hackers. That saves me time, but we need to be clear
> about what this does and does not do, what it might do in the future
> etc.. 3+ years from now we need to be able to remember what the code
> was supposed to do. You will forget yourself in time, if you write
> enough patches. Based on this, I think you'll be writing quite a few
> more.
>
I've added some more comments. I'm afraid they will need to be completely
rewritten before committing because of my English. If any particular points
need further clarification, please point them out.

> And of course, a few lines for the docs also.
>
I found that the statistics patch for tsvector only updated the
documentation of the pg_stats view. I've updated that documentation a
little as well.

------
With best regards,
Alexander Korotkov.

Attachment: arrayanalyze-0.3.patch.gz (application/x-gzip, 17.3 KB)
