Quick Links

Re: WIP: index support for regexp search

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Erik Rijkers <er(at)xs4all(dot)nl>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stěhule <pavel(dot)stehule(at)gmail(dot)com>
Subject:	Re: WIP: index support for regexp search
Date:	2013-04-01 21:15:27
Message-ID:	CAPpHfduK1u+ykcwKmjOTe30e6JmStMg6LUCmNttGinvYSQQ0RQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Mar 25, 2013 at 2:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > On Mon, Mar 25, 2013 at 1:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Similarly, pushing PG-specific declarations like RE_compile_and_cache()
> >> into regex/regex.h is completely not the right thing for preserving a
> >> clear library boundary (even positing that we want to expose that
> >> function outside adt/regexp.c, which I'd rather we didn't).
> >>
> >> Perhaps we could avoid these issues by defining a library API that
> >> provides accessors like these for the opaque regex_t struct:
> >>
> >> * get the number of states in the CNFA
> >>
> >> * get the numbers of the initial and final states
> >>
> >> * get the number of out-arcs for the N'th state
> >>
> >> * get the out-arcs for the N'th state into a caller-provided array
> >> (sized using the previous function), where each out-arc is represented
> >> by a color and an end-state
> >>
> >> * get the number of character codes represented by color C
> >>
> >> * get the wchar codes for color C into a caller-provided array
> >>
> >> (The reason for letting the caller allocate the result arrays is so we
> >> can use palloc for that; if we allocate it in backend/regex/ we must
> >> use malloc, which will greatly increase the risk of leakages. Also,
> >> as far as the color API goes, the above lets the caller decide how
> >> many characters is "too many" to bother with.)
>
> > I like the this idea. Seems like clear and not over-engineered API. I can
> > implement it. Could you propose something particular to do with
> > RE_compile_and_cache in this patch? With this API we still need a way to
> > get regex_t or equivalent from string.
>
> Well, the brute force answer is for pg_trgm to not go through
> RE_compile_and_cache at all, but just call the regex library for itself.
> That would lose the ability to cache regex compilations, but I'm not
> sure we care. The motivation for RE_compile_and_cache is mainly to
> prevent having to compile the regex again for *every row*, which is what
> would otherwise happen in a simple SELECT using a regex function.
> Unless I'm misunderstanding something, pg_trgm would only need to
> compile the regex once per indexscan, which is probably tolerable.
>

For me it makes sense.
Patch with proposed API implementation is attached.

------
With best regards,
Alexander Korotkov.

Attachment	Content-Type	Size
trgm-regexp-0.14.patch.gz	application/x-gzip	23.1 KB

In response to

Re: WIP: index support for regexp search at 2013-03-24 22:38:47 from Tom Lane

Responses

Re: WIP: index support for regexp search at 2013-04-02 20:36:22 from Erikjan Rijkers

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Merlin Moncure	2013-04-01 21:55:23	Re: Page replacement algorithm in buffer cache
Previous Message	Andres Freund	2013-04-01 21:09:20	Re: Page replacement algorithm in buffer cache