Re: Future of our regular expression code

From: Billy Earney <billy(dot)earney(at)gmail(dot)com>
To: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Future of our regular expression code
Date: 2012-02-20 21:25:28
Message-ID: CAB1ii-f83hQvC7mpbQQa5UuuvYdgCSpw6E1+wXghXEzWf=_YZg@mail.gmail.com
Lists: pgsql-hackers

Jay,

Good links. I've also looked at a few other benchmarks. I believe most of
them were run before PCRE implemented JIT, and I haven't found one with JIT
enabled, so I'm not sure how much difference it makes. I'm also not sure
how accurately these benchmarks predict performance in an RDBMS
environment. The optimizer is probably a very important variable in many
complex queries. I'm leaning towards trying to implement both RE2 and PCRE
and running some benchmarks to see which performs best.

Also, would it be possible to add a session variable (let's say
PGREGEXTYPE) that can be set to ARE (the current algorithm), RE2, or PCRE,
so that users could choose which implementation they want (unless we find a
single implementation that beats the others in almost all categories)? Or
is this a bad idea?
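
To make the idea concrete, here is a rough sketch of the dispatch I have in
mind, written in Python only for brevity. Everything in it is an
assumption for illustration: PGREGEXTYPE is just the name suggested above
and not an existing setting, `ENGINES`, `register_engine` and `regex_match`
are made-up names, and Python's built-in `re` module is merely a stand-in
for the current ARE engine. RE2 and PCRE are deliberately not wired up.

```python
import re
from typing import Callable, Dict, Mapping

# A matcher takes (pattern, text) and reports whether the pattern matches.
Matcher = Callable[[str, str], bool]


def _stdlib_match(pattern: str, text: str) -> bool:
    # Stand-in backend: Python's built-in `re`, NOT PostgreSQL's ARE engine.
    return re.search(pattern, text) is not None


# Registry of available backends, keyed by the value of the setting.
# Only "ARE" is populated; "RE2" and "PCRE" would be registered once
# bindings for them exist.
ENGINES: Dict[str, Matcher] = {"ARE": _stdlib_match}


def register_engine(name: str, matcher: Matcher) -> None:
    """Make another backend selectable under the given name."""
    ENGINES[name.upper()] = matcher


def regex_match(pattern: str, text: str, session: Mapping[str, str]) -> bool:
    """Match using the backend named by the hypothetical PGREGEXTYPE setting."""
    name = session.get("PGREGEXTYPE", "ARE").upper()
    try:
        matcher = ENGINES[name]
    except KeyError:
        raise ValueError(f"unknown regex engine: {name}") from None
    return matcher(pattern, text)
```

With this shape, a session that never touches the setting keeps the current
behaviour, and asking for an engine that was not built in fails loudly
instead of silently falling back.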

Just a thought.

On Mon, Feb 20, 2012 at 12:09 AM, Jay Levitt <jay(dot)levitt(at)gmail(dot)com> wrote:

> Stephen Frost wrote:
>
>> Alright, I'll bite.. Which existing regexp implementation that's well
>> written, well maintained, and which is well protected against malicious
>> regexes should we be considering then?
>>
>
> FWIW, there's a benchmark here that compares a number of regexp engines,
> including PCRE, TRE and Russ Cox's RE2:
>
> http://lh3lh3.users.sourceforge.net/reb.shtml
>
> The fastest backtracking-style engine seems to be oniguruma, which is
> native to Ruby 1.9 and thus not only supports Unicode but I'd bet performs
> pretty well on it, on account of it's developed in Japan. But it goes
> pathological on regexen containing '|'; the only safe choice among
> PCRE-style engines is RE2, but of course that doesn't support
> backreferences.
>
> Russ's page on re2 (http://code.google.com/p/re2/) says:
>
> "If you absolutely need backreferences and generalized assertions, then
> RE2 is not for you, but you might be interested in irregexp, Google
> Chrome's regular expression engine."
>
> That's here:
>
> http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html
>
> Sadly, it's in Javascript. Seems like if you need a safe, performant
> regexp implementation, your choice is (a) finish PLv8 and support it on all
> platforms, or (b) add backreferences to RE2 and precompile it to C with
> Comeau (if that's still around), or...
>
> Jay
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
