Re: WIP: index support for regexp search

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Erikjan Rijkers <er(at)xs4all(dot)nl>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stìhule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: WIP: index support for regexp search
Date: 2013-04-09 08:58:21
Message-ID: CAPpHfdtWpbVoXGU+damLwb+_-1wcRRwuzm2XSuy8Qr1aDdnDzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 9, 2013 at 9:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > On Mon, Apr 8, 2013 at 9:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I spent the weekend hacking on this, making a number of bug fixes and a
> >> whole lot of cosmetic changes. I think there are large parts of this
> >> that are in committable shape now, but I still find the actual graph
> >> transformation logic to be mostly unintelligible. I think what's most
> >> obscure is the distinction between the arcs list and the keys list of
> >> each state in the expanded graph. I get the impression that the
> >> general idea is for the arcs to represent exactly-known transitions
> >> while the keys represent imprecisely-known transitions ... but there
> >> seems to be at least some leakage between those categories. Could
> >> you write down a specification for what's supposed to be happening
> >> there?
>
> > Here is my try to specify it.
>
> Thanks. I hacked on this some more and committed it. I found a number
> of bugs along the way with respect to handling of word boundaries
> (partially-blank transition trigrams) and EOL-color ($) handling.
> I think it's all fixed now but it could definitely use some more
> study and testing.
>

Great, thanks! I also will do some more testing.

One issue that bothered me is that the regression tests really don't
> provide much visibility into what the code is doing. Some of the bugs
> had to do with failing to generate expected trigrams, for instance
> col ~ 'foo bar' only generating trigram "foo" and not "bar". This still
> led to getting the right answer, so the error was invisible as far as the
> tests were concerned. Is it worth thinking of a way to expose what the
> extract function did at SQL level, so we could test more carefully?
>

Yes, I also had similar idea. But, I think we need some relatively stable
representation of resulting graph in order to expose it. There could be a
lot of equivalent graphs. Some changes in implementation could lead to
change from one equivalent graph to another. It would be better to not
rewrite tests in this case. Ideally, we should expose some representation
which is the same for all equivalent graphs. However, it doesn't seem to be
realistic. But, I think we could at least make it stable to order sequence
of states and color trigrams. Another option I see is to expose just set of
trigrams. It doesn't have completeness of information, but it is quite
stable.

------
With best regards,
Alexander Korotkov.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2013-04-09 11:59:25 Re: [HACKERS] Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
Previous Message Thom Brown 2013-04-09 08:30:28 Re: Call for Google Summer of Code mentors, admins