Re: Our regex vs. POSIX on "longest match"

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, depesz(at)depesz(dot)com, Brendan Jurd <direvus(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Our regex vs. POSIX on "longest match"
Date: 2012-03-05 20:06:11
Message-ID: 20120305200611.GA6569@svana.org
Lists: pgsql-hackers

On Mon, Mar 05, 2012 at 11:28:24AM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I think the right way to imagine this is as though the regular
> > expression were being matched to the source text in left-to-right
> > fashion.
>
> No, it isn't. You are headed down the garden path that leads to a
> Perl-style definition-by-implementation, and in particular you are going
> to end up with an implementation that fails to satisfy the POSIX
> standard. POSIX requires an *overall longest* match (at least for cases
> where all quantifiers are greedy), and that sometimes means that the
> quantifiers can't be processed strictly left-to-right greedy. An
> example of this is

On the other hand, I think requiring an "overall longest match" makes
your implementation non-polynomial in complexity. The simplest example I
can think of is the knapsack problem: given weights x_1..x_n and a
total W, it can be converted to a regex problem by matching a string of
W a's against the regex:

(a{x_1})?(a{x_2})?(a{x_3})? etc...
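
To make the conversion concrete, here is a minimal sketch in Python. It
assumes the intended reading is that each term is an optional group,
written explicitly as (?:a{x_i})?, and it checks only the exact-fill
variant of the problem (strictly speaking, subset sum). The function
names are made up for illustration, and Python's re module is a
backtracking engine, so this shows the reduction rather than how a POSIX
engine would behave.

```python
import re


def subset_sum_regex(weights):
    """Build a regex with one optional group of 'a's per weight.

    Each weight w becomes (?:a{w})?, so a match either consumes exactly
    w characters for that item or skips the item entirely.
    """
    return "".join("(?:a{%d})?" % w for w in weights)


def has_subset_with_sum(weights, total):
    """Return True if some subset of weights sums exactly to total.

    The question is answered by matching a string of `total` a's
    against the generated regex, anchored at both ends.
    """
    pattern = subset_sum_regex(weights)
    return re.fullmatch(pattern, "a" * total) is not None


if __name__ == "__main__":
    weights = [3, 5, 7]
    print(subset_sum_regex(weights))
    for total in (8, 11, 12):
        print(total, has_subset_with_sum(weights, total))
```

A full match exists exactly when some choice of "take" or "skip" per
group consumes all W characters, which is why the matcher ends up
searching over subsets of the weights.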

Yes, Perl (and others) don't guarantee an overall longest match. I
think they want you to consider regular expressions as a specialised
parsing language where you can configure a state machine to process
your strings. Not ideal, but predictable.
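
A small illustration of the difference, as a sketch only: Python's re
module follows the Perl-style rule that alternatives are tried left to
right and the first one that succeeds wins. The second function is a
hypothetical stand-in for the "overall longest" rule, restricted to
literal alternatives; it is not how any real POSIX engine is
implemented.

```python
import re


def first_alternative_match(alternatives, text):
    """Perl-style: join literal alternatives with | and let the engine
    take the first alternative that matches at the start of text."""
    pattern = "|".join(re.escape(a) for a in alternatives)
    m = re.match(pattern, text)
    return m.group(0) if m else None


def longest_alternative_match(alternatives, text):
    """Longest-match stand-in: among the literal alternatives that match
    at the start of text, return the longest one."""
    candidates = [a for a in alternatives if text.startswith(a)]
    return max(candidates, key=len) if candidates else None


if __name__ == "__main__":
    alts = ["a", "ab"]
    print(first_alternative_match(alts, "ab"))
    print(longest_alternative_match(alts, "ab"))
```

With the alternatives "a" and "ab" against the string "ab", the first
function returns "a" while the second returns "ab". Reordering the
alternatives changes the Perl-style result but not the longest-match
one, which is the predictable-but-order-dependent behaviour described
above.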

The question is, what are users expecting of the PostgreSQL regex
implementation?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer
