From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Erik Rijkers <er(at)xs4all(dot)nl> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org, pavel(dot)stehule(at)gmail(dot)com |
Subject: | Re: WIP: index support for regexp search |
Date: | 2012-12-18 09:10:00 |
Message-ID: | CAPpHfdswF+FHrNtBnCgbwf-hxLUcMssf=L7cX_CYAq5ncUsrPA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Dec 18, 2012 at 12:51 PM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
> On Tue, December 18, 2012 09:45, Alexander Korotkov wrote:
> >
> > You should use {0,n} to express from 0 to n occurrences.
> >
>
>
> Thanks, but I know that of course. It's a testing program; and in the end
> robustness with unexpected or even wrong input is as important as
> performance. (to put it bluntly, I am also trying to get your patch to
> fall over ;-))
>
I found that most of the regressions in version 0.9 are in {,n} cases. The
new version of the patch uses more trigrams than previous versions.
For example, take the regex 'x[aeiou]{,2}q'.
In version 0.7 we use the trigrams '__2', '_2_' and '__q'.
In version 0.9 we use the trigrams 'xa_', 'xe_', 'xi_', 'xo_', 'xu_', '__2',
'_2_' and '__q'.
But the trigrams '__2' and '_2_' actually never occur. It is enough to have
one of them; all the others just cause a slowdown. At the same time, we can't
reasonably decide which trigrams to use without knowing their frequencies.
For example, if the trigrams 'xa_', 'xe_', 'xi_', 'xo_', 'xu_' were altogether
rarer than '__2', the newer version of the patch would be faster.
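
To make the trade-off concrete, here is a toy cost model in Python. It is
only a sketch, not how the patch or the index actually works: it assumes that
looking up a trigram costs as much as its posting list is long, and the
frequencies in HYPOTHETICAL_FREQ are invented for illustration. The names
scan_cost and cheapest_group are hypothetical as well.

```python
# Toy cost model: one trigram lookup costs as much as its posting list is long.
# All frequencies are made-up numbers, chosen only to illustrate the point.
HYPOTHETICAL_FREQ = {
    "xa_": 120, "xe_": 90, "xi_": 60, "xo_": 75, "xu_": 30,
    "__2": 0, "_2_": 0, "__q": 400,
}

# A query is an AND of groups; each group is an OR of trigrams.
QUERY_V07 = [["__2"], ["_2_"], ["__q"]]
QUERY_V09 = [["xa_", "xe_", "xi_", "xo_", "xu_"], ["__2"], ["_2_"], ["__q"]]


def scan_cost(query, freq):
    """Cost of looking up every trigram mentioned in the query."""
    return sum(freq[t] for group in query for t in group)


def cheapest_group(query, freq):
    """The single AND-ed group that is cheapest to look up on its own."""
    return min(query, key=lambda group: sum(freq[t] for t in group))


if __name__ == "__main__":
    print(scan_cost(QUERY_V07, HYPOTHETICAL_FREQ))       # 400
    print(scan_cost(QUERY_V09, HYPOTHETICAL_FREQ))       # 775
    print(cheapest_group(QUERY_V09, HYPOTHETICAL_FREQ))  # ['__2']
```

With these invented numbers the 0.9 trigram set costs more to look up than
the 0.7 one, while a single never-occurring trigram already rules out every
row. With a different frequency table the result could go the other way,
which is why the choice needs real frequencies.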
------
With best regards,
Alexander Korotkov.