Re: Collect frequency statistics for arrays

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Nathan Boley <npboley(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collect frequency statistics for arrays
Date: 2012-03-02 20:36:41
Message-ID: 11732.1330720601@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> ... So my preference is to align the two
> definitions of STATISTIC_KIND_MCELEM by adding a null-element frequency
> to tsvector's usage (where it'll always be zero) and getting rid of the
> average distinct element count here.

Actually, there's a way we can do this without code changes in the
tsvector stuff. Since the number of MCELEM stanumbers entries that hold
frequencies of stavalues entries is necessarily equal to the length of
stavalues, we could define stanumbers as containing those matching
entries, then two min/max entries, then an *optional* entry for the
frequency of null elements (with the frequency presumed to be zero if
omitted). This would be unambiguous given access to stavalues. I'm not
sure, though, whether making the null frequency optional would introduce
complexity elsewhere that outweighs not having to touch the tsvector
code.
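
To illustrate why the layout stays unambiguous, here is a minimal sketch
of the decoding rule in Python (not backend C, and not code from any
patch). The function name split_mcelem_stanumbers and the McelemNumbers
type are hypothetical, invented for this example; only the ordering of
the entries comes from the proposal above.

```python
from typing import NamedTuple, Sequence


class McelemNumbers(NamedTuple):
    """Decoded view of an MCELEM stanumbers array (hypothetical type)."""
    frequencies: list      # one frequency per stavalues entry
    min_freq: float        # first of the two min/max entries
    max_freq: float        # second of the two min/max entries
    null_elem_freq: float  # optional trailing entry, 0.0 if omitted


def split_mcelem_stanumbers(stavalues: Sequence, stanumbers: Sequence[float]) -> McelemNumbers:
    """Split stanumbers using len(stavalues) to locate the trailing entries.

    Assumed layout, per the proposal: len(stavalues) frequencies, then the
    two min/max entries, then an optional null-element frequency.
    """
    n = len(stavalues)
    extra = len(stanumbers) - n
    if extra not in (2, 3):
        raise ValueError(
            "expected %d or %d stanumbers entries, got %d"
            % (n + 2, n + 3, len(stanumbers))
        )
    # The null-element frequency is present only when there is a third
    # trailing entry; otherwise it is presumed to be zero.
    null_elem_freq = stanumbers[n + 2] if extra == 3 else 0.0
    return McelemNumbers(
        frequencies=list(stanumbers[:n]),
        min_freq=stanumbers[n],
        max_freq=stanumbers[n + 1],
        null_elem_freq=null_elem_freq,
    )
```

A tsvector-style array with no trailing null frequency and an array-style
one with it both decode through the same function. The cost is that every
reader of stanumbers has to carry the length check, which is the sort of
complexity elsewhere that the paragraph above worries about.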

regards, tom lane
