Re: Notes about fixing regexes and UTF-8 (yet again)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: NISHIYAMA Tomoaki <tomoakin(at)staff(dot)kanazawa-u(dot)ac(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Notes about fixing regexes and UTF-8 (yet again)
Date: 2012-02-18 23:45:10
Message-ID: 7392.1329608710@sss.pgh.pa.us
Lists: pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Yeah, it's conceivable that we could implement something whereby
>> characters with codes above some cutoff point are handled via runtime
>> calls to iswalpha() and friends, rather than being included in the
>> statically-constructed DFA maps. The cutoff point could likely be a lot
>> less than U+FFFF, too, thereby saving storage and map build time all
>> round.

> It's been proposed to build a regexp type in PostgreSQL which would
> store the DFA directly and provide some way to run that DFA out of its
> storage without recompiling.

> Would such a mechanism be useful here?

No, this is about what goes into the DFA representation in the first
place, not about how we store it and reuse it.

regards, tom lane
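
For readers following the thread, the cutoff idea quoted above can be
sketched roughly as follows. This is only an illustration and not
PostgreSQL's regex code: the names CLASS_CUTOFF, alpha_map,
build_alpha_map and is_alpha_hybrid are hypothetical, and the cutoff
value 0x80 is an arbitrary assumption. Code points below the cutoff are
classified once into a static table, and everything at or above it is
classified at match time by calling iswalpha().

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical cutoff: code points below this go into the static map. */
#define CLASS_CUTOFF 0x80UL

/* Hypothetical precomputed table, standing in for the static DFA maps. */
bool alpha_map[CLASS_CUTOFF];
bool alpha_map_built = false;

/* Fill the table once; cost is proportional to the cutoff, not to U+FFFF. */
void
build_alpha_map(void)
{
    unsigned long c;

    for (c = 0; c < CLASS_CUTOFF; c++)
        alpha_map[c] = (iswalpha((wint_t) c) != 0);
    alpha_map_built = true;
}

/* Low code points: table lookup.  High code points: runtime library call. */
bool
is_alpha_hybrid(unsigned long c)
{
    if (!alpha_map_built)
        build_alpha_map();
    if (c < CLASS_CUTOFF)
        return alpha_map[c];
    return iswalpha((wint_t) c) != 0;
}
```

With a scheme like this, the storage and map build time depend only on
the cutoff, while the result of iswalpha() for higher code points still
depends on the locale in effect when the match runs.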
