From: | Marko Kreen <markokr(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Stephen Frost <sfrost(at)snowman(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Future of our regular expression code |
Date: | 2012-02-19 00:24:50 |
Message-ID: | CACMqXCLt1+kfpOzjaQH5ZGjEVGiPeA4NAfZy1u2fYDvOG8RZzg@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Feb 19, 2012 at 1:55 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> Yeah ... if you *don't* know the difference between a DFA and an NFA,
>>> you're likely to find yourself in over your head. Having said that,
>
>> So, here's a paper I found very nice to get started into this subject:
>> http://swtch.com/~rsc/regexp/regexp1.html
>
> Yeah, I just found that this afternoon myself; it's a great intro.
>
> If you follow the whole sequence of papers (there are 4) you'll find out
> that this guy built a new regexp engine for Google, and these papers are
> basically introducing/defending its design. It turns out they've
> released it under a BSD-ish license, so for about half a minute I was
> thinking there might be a new contender for something we could adopt.
> But there turn out to be at least two killer reasons why we won't:
> * it's in C++ not C
> * it doesn't support backrefs, as well as a few other features that
> maybe aren't as interesting but still would represent compatibility
> gotchas if they went away.
Another interesting library, technology-wise, is libtre:
http://laurikari.net/tre/about/
http://laurikari.net/tre/documentation/
NetBSD plans to replace the libc regex with it:
http://netbsd-soc.sourceforge.net/projects/widechar-regex/
http://groups.google.com/group/muc.lists.netbsd.current-users/browse_thread/thread/db5628e2e8f810e5/a99c368a6d22b6f8?lnk=gst&q=libtre#a99c368a6d22b6f8
Another useful project - AT&T regex tests:
http://www2.research.att.com/~gsf/testregex/
About our Spencer code - if we don't have the resources (not called Tom)
to clean it up and make it available as a library (in the short term - at
least to the Tcl folks), we should drop it, because that means it's a
dead end, however good it is.
--
marko