From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org, pavel(dot)stehule(at)gmail(dot)com |
Subject: | Re: WIP: index support for regexp search |
Date: | 2013-03-06 09:06:26 |
Message-ID: | CAPpHfdtkPgtDANjAXMnyAycpsahgGedQZ7VU+KfW6Y_5Jx1O=g@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jan 23, 2013 at 7:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> > On 23.01.2013 09:36, Alexander Korotkov wrote:
> >> On Wed, Jan 23, 2013 at 6:08 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> The biggest problem is that I really don't care for the idea of
> >>> contrib/pg_trgm being this cozy with the innards of regex_t.
>
> >> The only option I see now is to provide a method like "export_cnfa" which
> >> would export corresponding CNFA in fixed format.
>
> > Yeah, I think that makes sense. The transformation code in trgm_regexp.c
> > would probably be more readable too, if it didn't have to deal with the
> > regex guts representation of the CNFA. Also, once you have intermediate
> > representation of the original CNFA, you could do some of the
> > transformation work on that representation, before building the
> > "transformed graph" containing trigrams. You could eliminate any
> > non-alphanumeric characters, joining states connected by arcs with
> > non-alphanumeric characters, for example.
>
> It's not just the CNFA though; the other big API problem is with mapping
> colors back to characters. Right now, that not only knows way too much
> about a part of the regex internals we have ambitions to change soon,
> but it also requires pg_wchar2mb_with_len() and lowerstr(), neither of
> which should be known to the regex library IMO. So I'm not sure how we
> divvy that up sanely. To be clear: I'm not going to insist that we have
> to have a clean API factorization before we commit this at all. But it
> worries me if we don't even know how we could get to that, because we
> are going to need it eventually.
>
We probably don't have enough time before 9.3 to solve the API
problem :(. It's likely we have to choose between committing to 9.3
without a clean API factorization and postponing it to 9.4.
------
With best regards,
Alexander Korotkov.