Re: Adding a distinct "pattern" type to resolve the "~" commutator stalemate

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Adding a distinct "pattern" type to resolve the "~" commutator stalemate
Date: 2011-06-19 23:56:45
Message-ID: CA02028E-58B6-4764-95C1-1551891995AD@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jun20, 2011, at 00:56 , Andrew Dunstan wrote:
> On 06/19/2011 05:02 PM, Florian Pflug wrote:
>> The only argument against that I can see is that it poses
>> a compatibility problem if "~" remains the pattern matching
>> operator. I do believe, however, that the chance of
>> unknown ~ unknown
>> appearing in actual applications is rather small - that'd only
>> happen if people used postgresql's regexp engine together with
>> purely external data.
>
> People can store regular expressions in text fields now, and do,
> let me assure you. So the chances you'll encounter text ~ unknown
> or unknown ~ text or text ~ text are 100%

Hm, it seems we either all have different idea about how such
a pattern type would be be defined, or have grown so accustomed to
pg's type system that we've forgotten how powerful it really
is ;-) (For me, the latter is surely true...).

I've now created a primitive prototype that uses a composite
type for "pattern". That changes the input syntax for patterns
(you need to enclose them in brackets), but should model all
the implicit and explicit casting rules and operator selection
correctly. It also uses "~~~" in place of "~", for obvious
reasons and again without changing the casting and overloading
rules.

The prototype defines
text ~~~ text
text ~~~ pattern
pattern ~~~ text
and can be found at end of this mail.

With that prototype, *all* the cases (even unknown ~~~ unknown)
work as today, i.e. all of the statements below return true

postgres=# select 'abc' ~~~ '^ab+c$';
postgres=# select 'abc'::text ~~~ '^ab+c$';
postgres=# select 'abc' ~~~ '^ab+c$'::text;
postgres=# select 'abc' ~~~ '(^ab+c$)'::pattern;
postgres=# select '(^ab+c$)'::pattern ~~~ 'abc';

(The same happens with and without setting pattern's typcategory
to 'S'. Not really sure if the category has any effect here
at all).

That's not exactly what I had in mind - I'd have preferred
unknown ~~~ unknown
to return an error but
text ~~~ unknown
and
unknown ~~~ text
to work, but it looks that that's not easily done.

Still, I believe the behaviour of the prototype is acceptable.

BTW, The reason that 'unknown ~~~ unknown' works is, I believe
the following comment func_select_candidate, together with the
fact that 'text' is the preferred type in the string category.

If any candidate has an input datatype of STRING category,
use STRING category (this bias towards STRING is appropriate
since unknown-type literals look like strings).

best regards,
Florian Pflug

create type pattern as (p text);

create function match_right(l text, r text) returns boolean as $$
select $1 ~ $2
$$ language sql strict immutable;

create operator ~~~ (
procedure = match_right,
leftarg = text, rightarg = text
);

create function match_right(l text, r pattern) returns boolean as $$
select $1 ~ $2.p
$$ language sql strict immutable;

create operator ~~~ (
procedure = match_right, commutator = '~~~',
leftarg = text, rightarg = pattern
);

create function match_left(l pattern, r text) returns boolean as $$
select $2 ~ $1.p
$$ language sql strict immutable;

create operator ~~~ (
procedure = match_left, commutator = '~~~',
leftarg = pattern, rightarg = text
);

update pg_type set typcategory = 'S' where oid = 'pattern'::regtype;

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2011-06-19 23:59:44 Re: the big picture for index-only scans
Previous Message Kevin Grittner 2011-06-19 23:17:49 Re: Small SSI issues