Re: Future of our regular expression code

From: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Future of our regular expression code
Date: 2012-02-20 06:09:31
Message-ID: 4F41E39B.8010502@gmail.com
Lists: pgsql-hackers

Stephen Frost wrote:
> Alright, I'll bite.. Which existing regexp implementation that's well
> written, well maintained, and which is well protected against malicious
> regexes should we be considering then?

FWIW, there's a benchmark here that compares a number of regexp engines,
including PCRE, TRE and Russ Cox's RE2:

http://lh3lh3.users.sourceforge.net/reb.shtml

The fastest backtracking-style engine seems to be oniguruma, which is native
to Ruby 1.9, so it not only supports Unicode but, I'd bet, performs pretty
well on it, given that it's developed in Japan. But it goes pathological
on regexen containing '|'; the only safe choice among PCRE-style engines is
RE2, but of course that doesn't support backreferences.
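To illustrate why a backtracking engine can go pathological on alternation, here is a minimal sketch (not a model of oniguruma's or PCRE's actual internals) of a naive backtracking matcher for the example pattern ^(a|aa)*$, next to a state-set simulation of the same pattern's NFA, which is the automaton-style approach RE2 is built on. The pattern and function names are made up for illustration.

```python
def backtrack_match(text: str) -> tuple[bool, int]:
    """Naive backtracking match of ^(a|aa)*$; returns (matched, calls)."""
    calls = 0

    def star(i: int) -> bool:
        nonlocal calls
        calls += 1
        # Greedy: try another iteration, first alternative 'a', then 'aa'.
        if text.startswith("a", i) and star(i + 1):
            return True
        if text.startswith("aa", i) and star(i + 2):
            return True
        # Otherwise stop iterating; the pattern is anchored at the end.
        return i == len(text)

    return star(0), calls


def state_set_match(text: str) -> tuple[bool, int]:
    """Simulate the NFA for ^(a|aa)*$ with a set of states; returns (matched, steps).

    State 0: at the top of the loop (accepting).
    State 1: consumed the first 'a' of the 'aa' alternative.
    """
    states = {0}
    steps = 0
    for ch in text:
        nxt = set()
        for s in states:
            steps += 1
            if ch != "a":
                continue  # no transition on anything but 'a'
            if s == 0:
                nxt.add(0)  # 'a' alternative, back to the loop top
                nxt.add(1)  # first half of 'aa'
            else:
                nxt.add(0)  # second half of 'aa'
        states = nxt
    return 0 in states, steps


if __name__ == "__main__":
    for n in (10, 20, 25):
        text = "a" * n + "b"  # can never match, forcing full backtracking
        print(n, backtrack_match(text), state_set_match(text))
```

On the failing input "a" * n + "b", the backtracking call count grows like the Fibonacci numbers in n, because every way of splitting the run of a's into 'a' and 'aa' pieces gets tried; the state-set version does at most two steps per character no matter how the alternation is arranged.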

Russ's page on re2 (http://code.google.com/p/re2/) says:

"If you absolutely need backreferences and generalized assertions, then RE2
is not for you, but you might be interested in irregexp, Google Chrome's
regular expression engine."
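Backreferences are the sticking point because something like \1 must match the exact text an earlier group captured. That lets a pattern recognize languages such as "the same word twice", which no finite automaton can, so an automaton-based engine like RE2 can't offer them without giving up its linear-time guarantee. A minimal illustration using Python's backtracking re module (the pattern and function name are just examples):

```python
import re

# (\w+) captures a word; \1 then requires the very same text to appear again.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")


def find_doubled_words(text: str) -> list[str]:
    """Return each word that is immediately repeated in text."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]


if __name__ == "__main__":
    print(find_doubled_words("this is is a test test"))  # ['is', 'test']
```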

That's here:

http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html

Sadly, it's in JavaScript. It seems that if you need a safe, performant regexp
implementation, your choice is (a) finish PLv8 and support it on all
platforms, or (b) add backreferences to RE2 and precompile it to C with
Comeau (if that's still around), or...

Jay
