Re: Our regex vs. POSIX on "longest match"

From: Brendan Jurd <direvus(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, depesz(at)depesz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Our regex vs. POSIX on "longest match"
Date: 2012-03-05 09:22:43
Message-ID: CADxJZo1fbE9FA+pW89dNqqiPpLstSxYKug9TLQcS_q+J7wF+_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5 March 2012 17:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> This is different from what Perl does, but I think Perl's behavior
> here is batty: given a+|a+b+ and the string aaabbb, it picks the first
> branch and matches only aaa.

Yeah, this is sometimes referred to as "ordered alternation",
basically that the branches of the alternation are prioritised in the
same order in which they are described. It is fairly commonplace
among regex implementations.

> apparently, it selects the syntactically first
> branch that can match, regardless of the length of the match, which
> strikes me as nearly pure evil.

As long as it's documented that alternation prioritises in this way, I
don't feel upset about it. At least it still provides you with a
sensible way to get whatever you want from your RE; if you want a
shorter alternative to be preferred, put it up the front. Ordered
alternation also gives you a way to specify which of several
same-length alternatives you would prefer to be matched, which can
come in handy. It also means you can specify less-complex
alternatives before more-complex ones, which can have performance
advantages.

I do agree with you that if you *don't* do ordered alternation, then
it is right to treat alternation as greedy by default.

Cheers,
BJ

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregg Jaskiewicz 2012-03-05 10:06:48 Re: autovacuum locks
Previous Message Shigeru Hanada 2012-03-05 09:21:19 Re: pgsql_fdw, FDW for PostgreSQL server