Re: Future of our regular expression code

From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Future of our regular expression code
Date: 2012-02-20 03:28:16
Message-ID: CAM-w4HN1abmWjaPD7i0jqBYC2FOiq--W=f=QdCkggfttWGnH3g@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>  A larger point is that it'd be a real shame
> for the Spencer regex engine to die off, because it is in fact one of
> the best pieces of regex technology on the planet.
...
> Another possible long-term answer is to finish the work Henry never did,
> that is make the code into a standalone library.  That would make it
> available to more projects and perhaps attract other people to help
> maintain it.  However, that looks like a lot of work too, with distant
> and uncertain payoff.

I can't see how your first claim, that the Spencer code is worth
keeping around because it's simply a superior regex implementation,
has much force unless we can accomplish the latter. If the code can be
split off into a standalone library then it might have some longevity.
But if we're the only ones maintaining it then we're just postponing
the inevitable. I can't see Postgres having its own special brand of
regexes that nobody else uses being an acceptable situation forever.

One thing that concerns me more and more is that most sufficiently
powerful regex implementations are susceptible to DoS attacks. A
database application is quite likely to allow users to decide, directly
or indirectly, which regexes to apply, and it can be hard to predict
which regexes will cause which implementations to blow up in CPU or
memory use. We need a library that can be used to defend
against malicious regexes, and I suspect neither Perl's nor Python's
library will suffice for this.
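
To make the concern concrete, here is a minimal sketch of the kind of
blowup I mean and of one possible defense. It is not Spencer's code or
anything in Postgres; it is a hand-written backtracking matcher for the
single pattern (a+)+b, with a step budget that aborts the match. The
names match_nested_plus and StepBudgetExceeded are made up for this
illustration.

```python
class StepBudgetExceeded(Exception):
    """Raised when a match attempt uses more steps than allowed (hypothetical name)."""


def match_nested_plus(text, budget):
    """Naive backtracking match of (a+)+b, anchored at the start of text.

    Returns (matched, steps). Raises StepBudgetExceeded once the number
    of backtracking steps goes past `budget`.
    """
    steps = 0

    def group(pos):
        # Match one iteration of (a+) starting at pos, then require either
        # a 'b' or another iteration of the group.
        nonlocal steps
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Greedy: try the longest run of 'a' first, then back off one at a time.
        for stop in range(end, pos, -1):
            steps += 1
            if steps > budget:
                raise StepBudgetExceeded(steps)
            if stop < len(text) and text[stop] == "b":
                return True
            if group(stop):
                return True
        return False

    return group(0), steps


if __name__ == "__main__":
    print(match_nested_plus("aaab", 10_000))
    for n in (5, 10, 13):
        # No trailing 'b', so every way of splitting the run of 'a' is tried.
        print(n, match_nested_plus("a" * n, 10_000))
    try:
        match_nested_plus("a" * 30, 10_000)
    except StepBudgetExceeded:
        print("gave up on 30 characters after 10000 steps")
```

On an input of n 'a' characters with no 'b', this matcher takes
2^n - 1 steps before failing, so each extra character doubles the
work. A step or time budget like the one above is only one possible
mitigation; an engine that avoids backtracking for such patterns would
not need it.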

--
greg
