Re: Proposal: collect frequency statistics for arrays

Lists: pgsql-hackers
From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal: collect frequency statistics for arrays
Date: 2011-02-18 18:51:08
Message-ID: AANLkTikiruxOYh6jE_ETK7u4sEW6VXXqFK3Xn0-zVxoe@mail.gmail.com

Hackers,

I have the following proposal. Currently, the ts_typanalyze function accumulates
frequency statistics for tsvector using the lossy counting technique, but no
frequency statistics are collected for arrays. I'm going to generalize
ts_typanalyze so that it collects statistics for arrays too. Internally,
ts_typanalyze uses lexeme comparison and hashing; for arrays, I'm going to use
the functions from the default btree and hash opclasses of the array element
type instead. The collected frequency statistics for arrays can then be used
to estimate the selectivity of the && and @> operators.
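To make the idea concrete, here is a minimal Python sketch of lossy counting (Manku and Motwani) applied to array elements, followed by naive selectivity estimates built on the resulting frequencies. It is not the PostgreSQL implementation. It assumes that Python's `hash` and `==` stand in for the element type's default hash and btree opclass functions, and that each distinct element is counted at most once per array (tsvector lexemes are already unique, array elements are not). The independence-based formulas for `@>` and `&&` are illustrative assumptions, not the planner's actual estimators, and all function names are hypothetical.

```python
import math
from collections.abc import Hashable, Iterable, Sequence


def lossy_count_array_elements(
    arrays: Iterable[Sequence[Hashable]], epsilon: float
) -> tuple[dict[Hashable, int], int]:
    """Hypothetical helper: lossy counting over the elements of many arrays.

    Each distinct element is counted at most once per array, so a count
    approximates "number of arrays containing this element".
    Returns (approximate counts, number of arrays seen).
    """
    width = math.ceil(1 / epsilon)          # bucket width w = ceil(1/epsilon)
    entries: dict[Hashable, list[int]] = {} # element -> [count, max_error]
    n_items = 0                             # elements processed so far
    n_arrays = 0
    for arr in arrays:
        n_arrays += 1
        for elem in set(arr):               # deduplicate within one array
            n_items += 1
            bucket = math.ceil(n_items / width)
            if elem in entries:
                entries[elem][0] += 1
            else:
                # A new entry may have been missed in up to bucket-1 buckets.
                entries[elem] = [1, bucket - 1]
            if n_items % width == 0:        # bucket boundary: prune rare entries
                doomed = [k for k, (c, d) in entries.items() if c + d <= bucket]
                for key in doomed:
                    del entries[key]
    return {k: c for k, (c, _) in entries.items()}, n_arrays


def _freq(elem: Hashable, counts: dict[Hashable, int], n: int, default: float) -> float:
    # Fraction of arrays containing elem; fall back to a default for untracked elements.
    return counts[elem] / n if elem in counts else default


def containment_selectivity(query, counts, n_arrays, default_freq):
    """Naive estimate for `col @> query`: all query elements must be present,
    assuming elements occur independently of each other."""
    sel = 1.0
    for elem in set(query):
        sel *= _freq(elem, counts, n_arrays, default_freq)
    return sel


def overlap_selectivity(query, counts, n_arrays, default_freq):
    """Naive estimate for `col && query`: at least one query element present,
    under the same independence assumption."""
    none_present = 1.0
    for elem in set(query):
        none_present *= 1.0 - _freq(elem, counts, n_arrays, default_freq)
    return 1.0 - none_present


counts, n = lossy_count_array_elements([[1, 2], [2, 3], [2, 2, 4], [5]], epsilon=0.01)
print(counts, n)
print(containment_selectivity([2, 3], counts, n, default_freq=0.01))
print(overlap_selectivity([1, 5], counts, n, default_freq=0.01))
```

Pruning only happens at bucket boundaries, so by the standard lossy-counting guarantee every element whose true count exceeds epsilon times the number of processed elements is retained, and its reported count is low by at most that amount. A real analyze function would then keep only the most common entries as the element statistics.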

------
With best regards,
Alexander Korotkov.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: collect frequency statistics for arrays
Date: 2011-02-18 18:57:55
Message-ID: 12406.1298055475@sss.pgh.pa.us

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> I have the following proposal. Currently, the ts_typanalyze function accumulates
> frequency statistics for tsvector using the lossy counting technique, but no
> frequency statistics are collected for arrays. I'm going to generalize
> ts_typanalyze so that it collects statistics for arrays too. Internally,
> ts_typanalyze uses lexeme comparison and hashing; for arrays, I'm going to use
> the functions from the default btree and hash opclasses of the array element
> type instead. The collected frequency statistics for arrays can then be used
> to estimate the selectivity of the && and @> operators.

It'd be better to just make a separate function for arrays, instead of
trying to kluge ts_typanalyze to the point where it'd cover both cases.

regards, tom lane


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: collect frequency statistics for arrays
Date: 2011-02-19 20:43:22
Message-ID: AANLkTik4FMnhbaRxSY_k7y_j2Xbjzf+uDsaU=g--eSaB@mail.gmail.com

Thanks for the feedback on my proposal.
OK, I'll write it as a separate function. After that, I'm going to look for
a way to unify the two without a kluge. If I don't find one, I'll propose a
patch with the separate function.

------
With best regards,
Alexander Korotkov.