Re: [HACKERS] regexp_replace

Lists: pgsql-hackerspgsql-patches
From: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>
To: <pgsql-patches(at)postgresql(dot)org>
Subject: regexp_replace
Date: 2004-07-19 13:24:42
Message-ID: PIEMIKOOMKNIJLLLBCBBMEPLCAAA.a_ogawa00@yahoo.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


This patch provides a new function regexp_replace.
regexp_replace extends a replace function and enables text search
by the regular expression. And, a back reference can be used within
a replace string.
(This patch for PostgreSQL 7.4.3)

Function: regexp_replace(str, pattern, replace_str)
Retuen Type: text
Description: Replace all matched string in str.
pattern is regular expression pattern.
replace_str is replace string that can use '\1' - '\9', and
'\&'.
'\1' - '\9' is back reference to the n'th subexpression.
'\&' is matched string.

(example1)
select regexp_replace('ABC-DEF', '(\\w+)-(\\w+)', '\\2-\\1')
result: DEF-ABC

(example2)
update tab1 set col1 = regexp_replace(col1, '[A-Z]', '');

---
Atsushi Ogawa
a_ogawa(at)hi-ho(dot)ne(dot)jp

--- cut here ---

*** ./src/backend/regex/regexec.c.orig Tue Jul 20 08:45:39 2004
--- ./src/backend/regex/regexec.c Tue Jul 20 08:49:36 2004
***************
*** 110,115 ****
--- 110,116 ----
regmatch_t *pmatch;
rm_detail_t *details;
chr *start; /* start of string */
+ chr *search_start; /* search start of string */
chr *stop; /* just past end of string */
int err; /* error code if any (0 none) */
regoff_t *mem; /* memory vector for backtracking */
***************
*** 168,173 ****
--- 169,175 ----
pg_regexec(regex_t *re,
const chr *string,
size_t len,
+ size_t search_start,
rm_detail_t *details,
size_t nmatch,
regmatch_t pmatch[],
***************
*** 219,224 ****
--- 221,227 ----
v->pmatch = pmatch;
v->details = details;
v->start = (chr *) string;
+ v->search_start = (chr *) string + search_start;
v->stop = (chr *) string + len;
v->err = 0;
if (backref)
***************
*** 288,294 ****
NOERR();
MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
cold = NULL;
! close = shortest(v, s, v->start, v->start, v->stop, &cold, (int *)
NULL);
freedfa(s);
NOERR();
if (v->g->cflags & REG_EXPECT)
--- 291,298 ----
NOERR();
MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
cold = NULL;
! close = shortest(v, s, v->search_start, v->search_start, v->stop,
! &cold, (int *) NULL);
freedfa(s);
NOERR();
if (v->g->cflags & REG_EXPECT)
***************
*** 415,421 ****

assert(d != NULL && s != NULL);
cold = NULL;
! close = v->start;
do
{
MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
--- 419,425 ----

assert(d != NULL && s != NULL);
cold = NULL;
! close = v->search_start;
do
{
MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
*** ./src/backend/utils/adt/regexp.c.orig Tue Jul 20 08:50:08 2004
--- ./src/backend/utils/adt/regexp.c Tue Jul 20 09:00:05 2004
***************
*** 80,116 ****

/*
! * RE_compile_and_execute - compile and execute a RE, caching if possible
*
! * Returns TRUE on match, FALSE on no match
*
! * text_re --- the pattern, expressed as an *untoasted* TEXT object
! * dat --- the data to match against (need not be null-terminated)
! * dat_len --- the length of the data string
! * cflags --- compile options for the pattern
! * nmatch, pmatch --- optional return area for match details
*
! * Both pattern and data are given in the database encoding. We
internally
! * convert to array of pg_wchar which is what Spencer's regex package
wants.
*/
! static bool
! RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
! int cflags, int nmatch, regmatch_t *pmatch)
{
int text_re_len = VARSIZE(text_re);
- pg_wchar *data;
- size_t data_len;
pg_wchar *pattern;
size_t pattern_len;
int i;
int regcomp_result;
- int regexec_result;
cached_re_str re_temp;

- /* Convert data string to wide characters */
- data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
- data_len = pg_mb2wchar_with_len(dat, data, dat_len);
-
/*
* Look for a match among previously compiled REs. Since the data
* structure is self-organizing with most-used entries at the front,
--- 80,105 ----

/*
! * RE_compile_and_cache - compile a RE, caching if possible
*
! * Returns regex_t
*
! * text_re --- the pattern, expressed as an *untoasted* TEXT object
! * cflags --- compile options for the pattern
*
! * Pattern is given in the database encoding. We internally convert to
! * array of pg_wchar which is what Spencer's regex package wants.
*/
! static regex_t
! RE_compile_and_cache(text *text_re, int cflags)
{
int text_re_len = VARSIZE(text_re);
pg_wchar *pattern;
size_t pattern_len;
int i;
int regcomp_result;
cached_re_str re_temp;

/*
* Look for a match among previously compiled REs. Since the data
* structure is self-organizing with most-used entries at the front,
***************
*** 132,149 ****
re_array[0] = re_temp;
}

! /* Perform RE match and return result */
! regexec_result = pg_regexec(&re_array[0].cre_re,
! data,
! data_len,
! NULL, /* no details */
! nmatch,
! pmatch,
! 0);
!
! pfree(data);
!
! return (regexec_result == 0);
}
}

--- 121,127 ----
re_array[0] = re_temp;
}

! return re_array[0].cre_re;
}
}

***************
*** 210,219 ****
--- 188,231 ----
re_array[0] = re_temp;
num_res++;

+ return re_array[0].cre_re;
+ }
+
+ /*
+ * RE_compile_and_execute - compile and execute a RE, caching if possible
+ *
+ * Returns TRUE on match, FALSE on no match
+ *
+ * text_re --- the pattern, expressed as an *untoasted* TEXT object
+ * dat --- the data to match against (need not be null-terminated)
+ * dat_len --- the length of the data string
+ * cflags --- compile options for the pattern
+ * nmatch, pmatch --- optional return area for match details
+ *
+ * Both pattern and data are given in the database encoding. We
internally
+ * convert to array of pg_wchar which is what Spencer's regex package
wants.
+ */
+ static bool
+ RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
+ int cflags, int nmatch, regmatch_t *pmatch)
+ {
+ pg_wchar *data;
+ size_t data_len;
+ int regexec_result;
+ regex_t re;
+
+ /* Convert data string to wide characters */
+ data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
+ data_len = pg_mb2wchar_with_len(dat, data, dat_len);
+
+ /* Compile RE */
+ re = RE_compile_and_cache(text_re, cflags);
+
/* Perform RE match and return result */
regexec_result = pg_regexec(&re_array[0].cre_re,
data,
data_len,
+ 0,
NULL, /* no details */
nmatch,
pmatch,
***************
*** 415,420 ****
--- 427,452 ----
}

PG_RETURN_NULL();
+ }
+
+ /*
+ * textregexreplace()
+ * Return a replace string matched by a regular expression.
+ */
+ Datum
+ textregexreplace(PG_FUNCTION_ARGS)
+ {
+ text *s = PG_GETARG_TEXT_P(0);
+ text *p = PG_GETARG_TEXT_P(1);
+ text *r = PG_GETARG_TEXT_P(2);
+ regex_t re;
+
+ re = RE_compile_and_cache(p, regex_flavor);
+
+ return (DirectFunctionCall3(replace_text_regexp,
+ PointerGetDatum(s),
+ PointerGetDatum(&re),
+ PointerGetDatum(r)));
}

/* similar_escape()
*** ./src/backend/utils/adt/varlena.c.orig Tue Jul 20 09:00:17 2004
--- ./src/backend/utils/adt/varlena.c Tue Jul 20 10:23:32 2004
***************
*** 28,33 ****
--- 28,34 ----
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/pg_locale.h"
+ #include "regex/regex.h"

typedef struct varlena unknown;
***************
*** 1971,1976 ****
--- 1972,2122 ----
ret_text = PG_STR_GET_TEXT(str->data);
pfree(str->data);
pfree(str);
+
+ PG_RETURN_TEXT_P(ret_text);
+ }
+
+ /*
+ * have_escape_in_regexp_replace_str
+ * check replace string have escape char
+ */
+ static bool
+ have_escape_in_regexp_replace_str(const char *replace_str)
+ {
+ return (strchr(replace_str, '\\') != NULL);
+ }
+
+ #define REGEXP_REPLACE_BACKREF_CNT 10
+ /*
+ * appendStringInfoRegexpSubstr
+ * append string for regexp back references.
+ */
+ static void
+ appendStringInfoRegexpSubstr(StringInfo str, text *replace_text,
+ regmatch_t *pmatch, text *buf_text, int search_start)
+ {
+ const char *pstart = PG_TEXT_GET_STR(replace_text);
+ const char *p = pstart;
+ const char *pnext;
+
+ text *add_text;
+ int so;
+ int eo;
+
+ for(;;) {
+ pnext = strchr(p, '\\');
+ if(pnext == NULL) break;
+
+ add_text = text_substring(PointerGetDatum(replace_text),
+ p - pstart + 1, pnext - p, false);
+ appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
+ pfree(add_text);
+
+ p = pnext + 1;
+ so = eo = -1;
+
+ if(*p >= '1' && *p <= '9') {
+ int idx = *p - '0';
+ so = pmatch[idx].rm_so - search_start;
+ eo = pmatch[idx].rm_eo - search_start;
+ p++;
+ } else {
+ switch(*p) {
+ case '&':
+ so = pmatch[0].rm_so - search_start;
+ eo = pmatch[0].rm_eo - search_start;
+ p++;
+ break;
+ }
+ }
+
+ if(so != -1 && eo != -1) {
+ add_text = text_substring(PointerGetDatum(buf_text),
+ so + 1, (eo - so), false);
+ appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
+ pfree(add_text);
+ }
+ }
+
+ add_text = text_substring(PointerGetDatum(replace_text),
+ p - pstart + 1, -1, true);
+ appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
+ pfree(add_text);
+ }
+
+ /*
+ * replace_text_regexp
+ * replace text using regexp
+ */
+ Datum
+ replace_text_regexp(PG_FUNCTION_ARGS)
+ {
+ text *left_text;
+ text *right_text;
+ text *buf_text;
+ text *ret_text;
+ text *src_text = PG_GETARG_TEXT_P(0);
+ char *src_text_str = PG_TEXT_GET_STR(src_text);
+ int src_text_len = TEXTLEN(src_text);
+ regex_t *re = (regex_t *)PG_GETARG_POINTER(1);
+ text *replace_text = PG_GETARG_TEXT_P(2);
+ char *replace_str = PG_TEXT_GET_STR(replace_text);
+ StringInfo str = makeStringInfo();
+ int regexec_result;
+ regmatch_t pmatch[REGEXP_REPLACE_BACKREF_CNT];
+ pg_wchar *data;
+ size_t data_len;
+ int search_start;
+ bool have_escape;
+
+ buf_text = TEXTDUP(src_text);
+
+ /* Convert data string to wide characters */
+ data = (pg_wchar *) palloc((src_text_len + 1) * sizeof(pg_wchar));
+ data_len = pg_mb2wchar_with_len(src_text_str, data,
strlen(src_text_str));
+
+ have_escape = have_escape_in_regexp_replace_str(replace_str);
+
+ for(search_start = 0; search_start <= data_len;) {
+ regexec_result = pg_regexec(re,
+ data,
+ data_len,
+ search_start,
+ NULL, /* no details */
+ REGEXP_REPLACE_BACKREF_CNT,
+ pmatch,
+ 0);
+ if(regexec_result != 0) break;
+
+ left_text = text_substring(PointerGetDatum(buf_text),
+ 1, pmatch[0].rm_so - search_start, false);
+ right_text = text_substring(PointerGetDatum(buf_text),
+ pmatch[0].rm_eo - search_start + 1,
+ -1, true);
+
+ appendStringInfoString(str, PG_TEXT_GET_STR(left_text));
+ if(have_escape) {
+ appendStringInfoRegexpSubstr(str, replace_text, pmatch,
+ buf_text, search_start);
+ } else {
+ appendStringInfoString(str, replace_str);
+ }
+
+ pfree(buf_text);
+ pfree(left_text);
+ buf_text = right_text;
+
+ search_start = pmatch[0].rm_eo;
+ if(pmatch[0].rm_so == pmatch[0].rm_eo) search_start++;
+ }
+
+ appendStringInfoString(str, PG_TEXT_GET_STR(buf_text));
+ pfree(buf_text);
+
+ ret_text = PG_STR_GET_TEXT(str->data);
+ pfree(str->data);
+ pfree(str);
+ pfree(data);

PG_RETURN_TEXT_P(ret_text);
}
*** ./src/include/catalog/pg_proc.h.orig Tue Jul 20 09:24:11 2004
--- ./src/include/catalog/pg_proc.h Tue Jul 20 09:26:11 2004
***************
*** 2186,2191 ****
--- 2186,2193 ----
DESCR("return portion of string");
DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25
25" replace_text - _null_ ));
DESCR("replace all occurrences of old_substr with new_substr in string");
+ DATA(insert OID = 2167 ( regexp_replace PGNSP PGUID 12 f f t f i 3 25
"25 25 25" textregexreplace - _null_ ));
+ DESCR("replace text using regexp");
DATA(insert OID = 2088 ( split_part PGNSP PGUID 12 f f t f i 3 25 "25
25 23" split_text - _null_ ));
DESCR("split string by field_sep and return field_num");
DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23"
to_hex32 - _null_ ));
*** ./src/include/regex/regex.h.orig Tue Jul 20 08:51:06 2004
--- ./src/include/regex/regex.h Tue Jul 20 08:51:16 2004
***************
*** 163,169 ****
* the prototypes for exported functions
*/
extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, rm_detail_t *,
size_t, regmatch_t[], int);
extern void pg_regfree(regex_t *);
extern size_t pg_regerror(int, const regex_t *, char *, size_t);

--- 163,169 ----
* the prototypes for exported functions
*/
extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, size_t,
rm_detail_t *, size_t, regmatch_t[], int);
extern void pg_regfree(regex_t *);
extern size_t pg_regerror(int, const regex_t *, char *, size_t);

*** ./src/include/utils/builtins.h.orig Tue Jul 20 09:11:19 2004
--- ./src/include/utils/builtins.h Tue Jul 20 09:11:46 2004
***************
*** 408,413 ****
--- 408,414 ----
extern Datum texticregexeq(PG_FUNCTION_ARGS);
extern Datum texticregexne(PG_FUNCTION_ARGS);
extern Datum textregexsubstr(PG_FUNCTION_ARGS);
+ extern Datum textregexreplace(PG_FUNCTION_ARGS);
extern Datum similar_escape(PG_FUNCTION_ARGS);
extern const char *assign_regex_flavor(const char *value,
bool doit, bool interactive);
***************
*** 537,542 ****
--- 538,544 ----
extern bool SplitIdentifierString(char *rawstring, char separator,
List **namelist);
extern Datum replace_text(PG_FUNCTION_ARGS);
+ extern Datum replace_text_regexp(PG_FUNCTION_ARGS);
extern Datum split_text(PG_FUNCTION_ARGS);
extern Datum text_to_array(PG_FUNCTION_ARGS);
extern Datum array_to_text(PG_FUNCTION_ARGS);

__________________________________________________
Do You Yahoo!?
http://bb.yahoo.co.jp/


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: regexp_replace
Date: 2004-08-07 01:04:34
Message-ID: 200408070104.i7714Yp00734@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.1 release:

http:/momjian.postgresql.org/cgi-bin/pgpatches2

---------------------------------------------------------------------------

a_ogawa00 wrote:
>
> This patch provides a new function regexp_replace.
> regexp_replace extends a replace function and enables text search
> by the regular expression. And, a back reference can be used within
> a replace string.
> (This patch for PostgreSQL 7.4.3)
>
> Function: regexp_replace(str, pattern, replace_str)
> Retuen Type: text
> Description: Replace all matched string in str.
> pattern is regular expression pattern.
> replace_str is replace string that can use '\1' - '\9', and
> '\&'.
> '\1' - '\9' is back reference to the n'th subexpression.
> '\&' is matched string.
>
> (example1)
> select regexp_replace('ABC-DEF', '(\\w+)-(\\w+)', '\\2-\\1')
> result: DEF-ABC
>
> (example2)
> update tab1 set col1 = regexp_replace(col1, '[A-Z]', '');
>
> ---
> Atsushi Ogawa
> a_ogawa(at)hi-ho(dot)ne(dot)jp
>
> --- cut here ---
>
> *** ./src/backend/regex/regexec.c.orig Tue Jul 20 08:45:39 2004
> --- ./src/backend/regex/regexec.c Tue Jul 20 08:49:36 2004
> ***************
> *** 110,115 ****
> --- 110,116 ----
> regmatch_t *pmatch;
> rm_detail_t *details;
> chr *start; /* start of string */
> + chr *search_start; /* search start of string */
> chr *stop; /* just past end of string */
> int err; /* error code if any (0 none) */
> regoff_t *mem; /* memory vector for backtracking */
> ***************
> *** 168,173 ****
> --- 169,175 ----
> pg_regexec(regex_t *re,
> const chr *string,
> size_t len,
> + size_t search_start,
> rm_detail_t *details,
> size_t nmatch,
> regmatch_t pmatch[],
> ***************
> *** 219,224 ****
> --- 221,227 ----
> v->pmatch = pmatch;
> v->details = details;
> v->start = (chr *) string;
> + v->search_start = (chr *) string + search_start;
> v->stop = (chr *) string + len;
> v->err = 0;
> if (backref)
> ***************
> *** 288,294 ****
> NOERR();
> MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
> cold = NULL;
> ! close = shortest(v, s, v->start, v->start, v->stop, &cold, (int *)
> NULL);
> freedfa(s);
> NOERR();
> if (v->g->cflags & REG_EXPECT)
> --- 291,298 ----
> NOERR();
> MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
> cold = NULL;
> ! close = shortest(v, s, v->search_start, v->search_start, v->stop,
> ! &cold, (int *) NULL);
> freedfa(s);
> NOERR();
> if (v->g->cflags & REG_EXPECT)
> ***************
> *** 415,421 ****
>
> assert(d != NULL && s != NULL);
> cold = NULL;
> ! close = v->start;
> do
> {
> MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
> --- 419,425 ----
>
> assert(d != NULL && s != NULL);
> cold = NULL;
> ! close = v->search_start;
> do
> {
> MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
> *** ./src/backend/utils/adt/regexp.c.orig Tue Jul 20 08:50:08 2004
> --- ./src/backend/utils/adt/regexp.c Tue Jul 20 09:00:05 2004
> ***************
> *** 80,116 ****
>
>
> /*
> ! * RE_compile_and_execute - compile and execute a RE, caching if possible
> *
> ! * Returns TRUE on match, FALSE on no match
> *
> ! * text_re --- the pattern, expressed as an *untoasted* TEXT object
> ! * dat --- the data to match against (need not be null-terminated)
> ! * dat_len --- the length of the data string
> ! * cflags --- compile options for the pattern
> ! * nmatch, pmatch --- optional return area for match details
> *
> ! * Both pattern and data are given in the database encoding. We
> internally
> ! * convert to array of pg_wchar which is what Spencer's regex package
> wants.
> */
> ! static bool
> ! RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
> ! int cflags, int nmatch, regmatch_t *pmatch)
> {
> int text_re_len = VARSIZE(text_re);
> - pg_wchar *data;
> - size_t data_len;
> pg_wchar *pattern;
> size_t pattern_len;
> int i;
> int regcomp_result;
> - int regexec_result;
> cached_re_str re_temp;
>
> - /* Convert data string to wide characters */
> - data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
> - data_len = pg_mb2wchar_with_len(dat, data, dat_len);
> -
> /*
> * Look for a match among previously compiled REs. Since the data
> * structure is self-organizing with most-used entries at the front,
> --- 80,105 ----
>
>
> /*
> ! * RE_compile_and_cache - compile a RE, caching if possible
> *
> ! * Returns regex_t
> *
> ! * text_re --- the pattern, expressed as an *untoasted* TEXT object
> ! * cflags --- compile options for the pattern
> *
> ! * Pattern is given in the database encoding. We internally convert to
> ! * array of pg_wchar which is what Spencer's regex package wants.
> */
> ! static regex_t
> ! RE_compile_and_cache(text *text_re, int cflags)
> {
> int text_re_len = VARSIZE(text_re);
> pg_wchar *pattern;
> size_t pattern_len;
> int i;
> int regcomp_result;
> cached_re_str re_temp;
>
> /*
> * Look for a match among previously compiled REs. Since the data
> * structure is self-organizing with most-used entries at the front,
> ***************
> *** 132,149 ****
> re_array[0] = re_temp;
> }
>
> ! /* Perform RE match and return result */
> ! regexec_result = pg_regexec(&re_array[0].cre_re,
> ! data,
> ! data_len,
> ! NULL, /* no details */
> ! nmatch,
> ! pmatch,
> ! 0);
> !
> ! pfree(data);
> !
> ! return (regexec_result == 0);
> }
> }
>
> --- 121,127 ----
> re_array[0] = re_temp;
> }
>
> ! return re_array[0].cre_re;
> }
> }
>
> ***************
> *** 210,219 ****
> --- 188,231 ----
> re_array[0] = re_temp;
> num_res++;
>
> + return re_array[0].cre_re;
> + }
> +
> + /*
> + * RE_compile_and_execute - compile and execute a RE, caching if possible
> + *
> + * Returns TRUE on match, FALSE on no match
> + *
> + * text_re --- the pattern, expressed as an *untoasted* TEXT object
> + * dat --- the data to match against (need not be null-terminated)
> + * dat_len --- the length of the data string
> + * cflags --- compile options for the pattern
> + * nmatch, pmatch --- optional return area for match details
> + *
> + * Both pattern and data are given in the database encoding. We
> internally
> + * convert to array of pg_wchar which is what Spencer's regex package
> wants.
> + */
> + static bool
> + RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
> + int cflags, int nmatch, regmatch_t *pmatch)
> + {
> + pg_wchar *data;
> + size_t data_len;
> + int regexec_result;
> + regex_t re;
> +
> + /* Convert data string to wide characters */
> + data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
> + data_len = pg_mb2wchar_with_len(dat, data, dat_len);
> +
> + /* Compile RE */
> + re = RE_compile_and_cache(text_re, cflags);
> +
> /* Perform RE match and return result */
> regexec_result = pg_regexec(&re_array[0].cre_re,
> data,
> data_len,
> + 0,
> NULL, /* no details */
> nmatch,
> pmatch,
> ***************
> *** 415,420 ****
> --- 427,452 ----
> }
>
> PG_RETURN_NULL();
> + }
> +
> + /*
> + * textregexreplace()
> + * Return a replace string matched by a regular expression.
> + */
> + Datum
> + textregexreplace(PG_FUNCTION_ARGS)
> + {
> + text *s = PG_GETARG_TEXT_P(0);
> + text *p = PG_GETARG_TEXT_P(1);
> + text *r = PG_GETARG_TEXT_P(2);
> + regex_t re;
> +
> + re = RE_compile_and_cache(p, regex_flavor);
> +
> + return (DirectFunctionCall3(replace_text_regexp,
> + PointerGetDatum(s),
> + PointerGetDatum(&re),
> + PointerGetDatum(r)));
> }
>
> /* similar_escape()
> *** ./src/backend/utils/adt/varlena.c.orig Tue Jul 20 09:00:17 2004
> --- ./src/backend/utils/adt/varlena.c Tue Jul 20 10:23:32 2004
> ***************
> *** 28,33 ****
> --- 28,34 ----
> #include "utils/builtins.h"
> #include "utils/lsyscache.h"
> #include "utils/pg_locale.h"
> + #include "regex/regex.h"
>
>
> typedef struct varlena unknown;
> ***************
> *** 1971,1976 ****
> --- 1972,2122 ----
> ret_text = PG_STR_GET_TEXT(str->data);
> pfree(str->data);
> pfree(str);
> +
> + PG_RETURN_TEXT_P(ret_text);
> + }
> +
> + /*
> + * have_escape_in_regexp_replace_str
> + * check replace string have escape char
> + */
> + static bool
> + have_escape_in_regexp_replace_str(const char *replace_str)
> + {
> + return (strchr(replace_str, '\\') != NULL);
> + }
> +
> + #define REGEXP_REPLACE_BACKREF_CNT 10
> + /*
> + * appendStringInfoRegexpSubstr
> + * append string for regexp back references.
> + */
> + static void
> + appendStringInfoRegexpSubstr(StringInfo str, text *replace_text,
> + regmatch_t *pmatch, text *buf_text, int search_start)
> + {
> + const char *pstart = PG_TEXT_GET_STR(replace_text);
> + const char *p = pstart;
> + const char *pnext;
> +
> + text *add_text;
> + int so;
> + int eo;
> +
> + for(;;) {
> + pnext = strchr(p, '\\');
> + if(pnext == NULL) break;
> +
> + add_text = text_substring(PointerGetDatum(replace_text),
> + p - pstart + 1, pnext - p, false);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> +
> + p = pnext + 1;
> + so = eo = -1;
> +
> + if(*p >= '1' && *p <= '9') {
> + int idx = *p - '0';
> + so = pmatch[idx].rm_so - search_start;
> + eo = pmatch[idx].rm_eo - search_start;
> + p++;
> + } else {
> + switch(*p) {
> + case '&':
> + so = pmatch[0].rm_so - search_start;
> + eo = pmatch[0].rm_eo - search_start;
> + p++;
> + break;
> + }
> + }
> +
> + if(so != -1 && eo != -1) {
> + add_text = text_substring(PointerGetDatum(buf_text),
> + so + 1, (eo - so), false);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> + }
> + }
> +
> + add_text = text_substring(PointerGetDatum(replace_text),
> + p - pstart + 1, -1, true);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> + }
> +
> + /*
> + * replace_text_regexp
> + * replace text using regexp
> + */
> + Datum
> + replace_text_regexp(PG_FUNCTION_ARGS)
> + {
> + text *left_text;
> + text *right_text;
> + text *buf_text;
> + text *ret_text;
> + text *src_text = PG_GETARG_TEXT_P(0);
> + char *src_text_str = PG_TEXT_GET_STR(src_text);
> + int src_text_len = TEXTLEN(src_text);
> + regex_t *re = (regex_t *)PG_GETARG_POINTER(1);
> + text *replace_text = PG_GETARG_TEXT_P(2);
> + char *replace_str = PG_TEXT_GET_STR(replace_text);
> + StringInfo str = makeStringInfo();
> + int regexec_result;
> + regmatch_t pmatch[REGEXP_REPLACE_BACKREF_CNT];
> + pg_wchar *data;
> + size_t data_len;
> + int search_start;
> + bool have_escape;
> +
> + buf_text = TEXTDUP(src_text);
> +
> + /* Convert data string to wide characters */
> + data = (pg_wchar *) palloc((src_text_len + 1) * sizeof(pg_wchar));
> + data_len = pg_mb2wchar_with_len(src_text_str, data,
> strlen(src_text_str));
> +
> + have_escape = have_escape_in_regexp_replace_str(replace_str);
> +
> + for(search_start = 0; search_start <= data_len;) {
> + regexec_result = pg_regexec(re,
> + data,
> + data_len,
> + search_start,
> + NULL, /* no details */
> + REGEXP_REPLACE_BACKREF_CNT,
> + pmatch,
> + 0);
> + if(regexec_result != 0) break;
> +
> + left_text = text_substring(PointerGetDatum(buf_text),
> + 1, pmatch[0].rm_so - search_start, false);
> + right_text = text_substring(PointerGetDatum(buf_text),
> + pmatch[0].rm_eo - search_start + 1,
> + -1, true);
> +
> + appendStringInfoString(str, PG_TEXT_GET_STR(left_text));
> + if(have_escape) {
> + appendStringInfoRegexpSubstr(str, replace_text, pmatch,
> + buf_text, search_start);
> + } else {
> + appendStringInfoString(str, replace_str);
> + }
> +
> + pfree(buf_text);
> + pfree(left_text);
> + buf_text = right_text;
> +
> + search_start = pmatch[0].rm_eo;
> + if(pmatch[0].rm_so == pmatch[0].rm_eo) search_start++;
> + }
> +
> + appendStringInfoString(str, PG_TEXT_GET_STR(buf_text));
> + pfree(buf_text);
> +
> + ret_text = PG_STR_GET_TEXT(str->data);
> + pfree(str->data);
> + pfree(str);
> + pfree(data);
>
> PG_RETURN_TEXT_P(ret_text);
> }
> *** ./src/include/catalog/pg_proc.h.orig Tue Jul 20 09:24:11 2004
> --- ./src/include/catalog/pg_proc.h Tue Jul 20 09:26:11 2004
> ***************
> *** 2186,2191 ****
> --- 2186,2193 ----
> DESCR("return portion of string");
> DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25
> 25" replace_text - _null_ ));
> DESCR("replace all occurrences of old_substr with new_substr in string");
> + DATA(insert OID = 2167 ( regexp_replace PGNSP PGUID 12 f f t f i 3 25
> "25 25 25" textregexreplace - _null_ ));
> + DESCR("replace text using regexp");
> DATA(insert OID = 2088 ( split_part PGNSP PGUID 12 f f t f i 3 25 "25
> 25 23" split_text - _null_ ));
> DESCR("split string by field_sep and return field_num");
> DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23"
> to_hex32 - _null_ ));
> *** ./src/include/regex/regex.h.orig Tue Jul 20 08:51:06 2004
> --- ./src/include/regex/regex.h Tue Jul 20 08:51:16 2004
> ***************
> *** 163,169 ****
> * the prototypes for exported functions
> */
> extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
> ! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, rm_detail_t *,
> size_t, regmatch_t[], int);
> extern void pg_regfree(regex_t *);
> extern size_t pg_regerror(int, const regex_t *, char *, size_t);
>
> --- 163,169 ----
> * the prototypes for exported functions
> */
> extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
> ! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, size_t,
> rm_detail_t *, size_t, regmatch_t[], int);
> extern void pg_regfree(regex_t *);
> extern size_t pg_regerror(int, const regex_t *, char *, size_t);
>
> *** ./src/include/utils/builtins.h.orig Tue Jul 20 09:11:19 2004
> --- ./src/include/utils/builtins.h Tue Jul 20 09:11:46 2004
> ***************
> *** 408,413 ****
> --- 408,414 ----
> extern Datum texticregexeq(PG_FUNCTION_ARGS);
> extern Datum texticregexne(PG_FUNCTION_ARGS);
> extern Datum textregexsubstr(PG_FUNCTION_ARGS);
> + extern Datum textregexreplace(PG_FUNCTION_ARGS);
> extern Datum similar_escape(PG_FUNCTION_ARGS);
> extern const char *assign_regex_flavor(const char *value,
> bool doit, bool interactive);
> ***************
> *** 537,542 ****
> --- 538,544 ----
> extern bool SplitIdentifierString(char *rawstring, char separator,
> List **namelist);
> extern Datum replace_text(PG_FUNCTION_ARGS);
> + extern Datum replace_text_regexp(PG_FUNCTION_ARGS);
> extern Datum split_text(PG_FUNCTION_ARGS);
> extern Datum text_to_array(PG_FUNCTION_ARGS);
> extern Datum array_to_text(PG_FUNCTION_ARGS);
>
> __________________________________________________
> Do You Yahoo!?
> http://bb.yahoo.co.jp/
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: regexp_replace
Date: 2005-06-04 23:01:30
Message-ID: 200506042301.j54N1U616494@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I will add the documentation and make sure your oids are not duplicates.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

a_ogawa00 wrote:
>
> This patch provides a new function regexp_replace.
> regexp_replace extends a replace function and enables text search
> by the regular expression. And, a back reference can be used within
> a replace string.
> (This patch for PostgreSQL 7.4.3)
>
> Function: regexp_replace(str, pattern, replace_str)
> Retuen Type: text
> Description: Replace all matched string in str.
> pattern is regular expression pattern.
> replace_str is replace string that can use '\1' - '\9', and
> '\&'.
> '\1' - '\9' is back reference to the n'th subexpression.
> '\&' is matched string.
>
> (example1)
> select regexp_replace('ABC-DEF', '(\\w+)-(\\w+)', '\\2-\\1')
> result: DEF-ABC
>
> (example2)
> update tab1 set col1 = regexp_replace(col1, '[A-Z]', '');
>
> ---
> Atsushi Ogawa
> a_ogawa(at)hi-ho(dot)ne(dot)jp
>
> --- cut here ---
>
> *** ./src/backend/regex/regexec.c.orig Tue Jul 20 08:45:39 2004
> --- ./src/backend/regex/regexec.c Tue Jul 20 08:49:36 2004
> ***************
> *** 110,115 ****
> --- 110,116 ----
> regmatch_t *pmatch;
> rm_detail_t *details;
> chr *start; /* start of string */
> + chr *search_start; /* search start of string */
> chr *stop; /* just past end of string */
> int err; /* error code if any (0 none) */
> regoff_t *mem; /* memory vector for backtracking */
> ***************
> *** 168,173 ****
> --- 169,175 ----
> pg_regexec(regex_t *re,
> const chr *string,
> size_t len,
> + size_t search_start,
> rm_detail_t *details,
> size_t nmatch,
> regmatch_t pmatch[],
> ***************
> *** 219,224 ****
> --- 221,227 ----
> v->pmatch = pmatch;
> v->details = details;
> v->start = (chr *) string;
> + v->search_start = (chr *) string + search_start;
> v->stop = (chr *) string + len;
> v->err = 0;
> if (backref)
> ***************
> *** 288,294 ****
> NOERR();
> MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
> cold = NULL;
> ! close = shortest(v, s, v->start, v->start, v->stop, &cold, (int *)
> NULL);
> freedfa(s);
> NOERR();
> if (v->g->cflags & REG_EXPECT)
> --- 291,298 ----
> NOERR();
> MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
> cold = NULL;
> ! close = shortest(v, s, v->search_start, v->search_start, v->stop,
> ! &cold, (int *) NULL);
> freedfa(s);
> NOERR();
> if (v->g->cflags & REG_EXPECT)
> ***************
> *** 415,421 ****
>
> assert(d != NULL && s != NULL);
> cold = NULL;
> ! close = v->start;
> do
> {
> MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
> --- 419,425 ----
>
> assert(d != NULL && s != NULL);
> cold = NULL;
> ! close = v->search_start;
> do
> {
> MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
> *** ./src/backend/utils/adt/regexp.c.orig Tue Jul 20 08:50:08 2004
> --- ./src/backend/utils/adt/regexp.c Tue Jul 20 09:00:05 2004
> ***************
> *** 80,116 ****
>
>
> /*
> ! * RE_compile_and_execute - compile and execute a RE, caching if possible
> *
> ! * Returns TRUE on match, FALSE on no match
> *
> ! * text_re --- the pattern, expressed as an *untoasted* TEXT object
> ! * dat --- the data to match against (need not be null-terminated)
> ! * dat_len --- the length of the data string
> ! * cflags --- compile options for the pattern
> ! * nmatch, pmatch --- optional return area for match details
> *
> ! * Both pattern and data are given in the database encoding. We
> internally
> ! * convert to array of pg_wchar which is what Spencer's regex package
> wants.
> */
> ! static bool
> ! RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
> ! int cflags, int nmatch, regmatch_t *pmatch)
> {
> int text_re_len = VARSIZE(text_re);
> - pg_wchar *data;
> - size_t data_len;
> pg_wchar *pattern;
> size_t pattern_len;
> int i;
> int regcomp_result;
> - int regexec_result;
> cached_re_str re_temp;
>
> - /* Convert data string to wide characters */
> - data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
> - data_len = pg_mb2wchar_with_len(dat, data, dat_len);
> -
> /*
> * Look for a match among previously compiled REs. Since the data
> * structure is self-organizing with most-used entries at the front,
> --- 80,105 ----
>
>
> /*
> ! * RE_compile_and_cache - compile a RE, caching if possible
> *
> ! * Returns regex_t
> *
> ! * text_re --- the pattern, expressed as an *untoasted* TEXT object
> ! * cflags --- compile options for the pattern
> *
> ! * Pattern is given in the database encoding. We internally convert to
> ! * array of pg_wchar which is what Spencer's regex package wants.
> */
> ! static regex_t
> ! RE_compile_and_cache(text *text_re, int cflags)
> {
> int text_re_len = VARSIZE(text_re);
> pg_wchar *pattern;
> size_t pattern_len;
> int i;
> int regcomp_result;
> cached_re_str re_temp;
>
> /*
> * Look for a match among previously compiled REs. Since the data
> * structure is self-organizing with most-used entries at the front,
> ***************
> *** 132,149 ****
> re_array[0] = re_temp;
> }
>
> ! /* Perform RE match and return result */
> ! regexec_result = pg_regexec(&re_array[0].cre_re,
> ! data,
> ! data_len,
> ! NULL, /* no details */
> ! nmatch,
> ! pmatch,
> ! 0);
> !
> ! pfree(data);
> !
> ! return (regexec_result == 0);
> }
> }
>
> --- 121,127 ----
> re_array[0] = re_temp;
> }
>
> ! return re_array[0].cre_re;
> }
> }
>
> ***************
> *** 210,219 ****
> --- 188,231 ----
> re_array[0] = re_temp;
> num_res++;
>
> + return re_array[0].cre_re;
> + }
> +
> + /*
> + * RE_compile_and_execute - compile and execute a RE, caching if possible
> + *
> + * Returns TRUE on match, FALSE on no match
> + *
> + * text_re --- the pattern, expressed as an *untoasted* TEXT object
> + * dat --- the data to match against (need not be null-terminated)
> + * dat_len --- the length of the data string
> + * cflags --- compile options for the pattern
> + * nmatch, pmatch --- optional return area for match details
> + *
> + * Both pattern and data are given in the database encoding. We
> internally
> + * convert to array of pg_wchar which is what Spencer's regex package
> wants.
> + */
> + static bool
> + RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
> + int cflags, int nmatch, regmatch_t *pmatch)
> + {
> + pg_wchar *data;
> + size_t data_len;
> + int regexec_result;
> + regex_t re;
> +
> + /* Convert data string to wide characters */
> + data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
> + data_len = pg_mb2wchar_with_len(dat, data, dat_len);
> +
> + /* Compile RE */
> + re = RE_compile_and_cache(text_re, cflags);
> +
> /* Perform RE match and return result */
> regexec_result = pg_regexec(&re_array[0].cre_re,
> data,
> data_len,
> + 0,
> NULL, /* no details */
> nmatch,
> pmatch,
> ***************
> *** 415,420 ****
> --- 427,452 ----
> }
>
> PG_RETURN_NULL();
> + }
> +
> + /*
> + * textregexreplace()
> + * Return a replace string matched by a regular expression.
> + */
> + Datum
> + textregexreplace(PG_FUNCTION_ARGS)
> + {
> + text *s = PG_GETARG_TEXT_P(0);
> + text *p = PG_GETARG_TEXT_P(1);
> + text *r = PG_GETARG_TEXT_P(2);
> + regex_t re;
> +
> + re = RE_compile_and_cache(p, regex_flavor);
> +
> + return (DirectFunctionCall3(replace_text_regexp,
> + PointerGetDatum(s),
> + PointerGetDatum(&re),
> + PointerGetDatum(r)));
> }
>
> /* similar_escape()
> *** ./src/backend/utils/adt/varlena.c.orig Tue Jul 20 09:00:17 2004
> --- ./src/backend/utils/adt/varlena.c Tue Jul 20 10:23:32 2004
> ***************
> *** 28,33 ****
> --- 28,34 ----
> #include "utils/builtins.h"
> #include "utils/lsyscache.h"
> #include "utils/pg_locale.h"
> + #include "regex/regex.h"
>
>
> typedef struct varlena unknown;
> ***************
> *** 1971,1976 ****
> --- 1972,2122 ----
> ret_text = PG_STR_GET_TEXT(str->data);
> pfree(str->data);
> pfree(str);
> +
> + PG_RETURN_TEXT_P(ret_text);
> + }
> +
> + /*
> + * have_escape_in_regexp_replace_str
> + * check replace string have escape char
> + */
> + static bool
> + have_escape_in_regexp_replace_str(const char *replace_str)
> + {
> + return (strchr(replace_str, '\\') != NULL);
> + }
> +
> + #define REGEXP_REPLACE_BACKREF_CNT 10
> + /*
> + * appendStringInfoRegexpSubstr
> + * append string for regexp back references.
> + */
> + static void
> + appendStringInfoRegexpSubstr(StringInfo str, text *replace_text,
> + regmatch_t *pmatch, text *buf_text, int search_start)
> + {
> + const char *pstart = PG_TEXT_GET_STR(replace_text);
> + const char *p = pstart;
> + const char *pnext;
> +
> + text *add_text;
> + int so;
> + int eo;
> +
> + for(;;) {
> + pnext = strchr(p, '\\');
> + if(pnext == NULL) break;
> +
> + add_text = text_substring(PointerGetDatum(replace_text),
> + p - pstart + 1, pnext - p, false);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> +
> + p = pnext + 1;
> + so = eo = -1;
> +
> + if(*p >= '1' && *p <= '9') {
> + int idx = *p - '0';
> + so = pmatch[idx].rm_so - search_start;
> + eo = pmatch[idx].rm_eo - search_start;
> + p++;
> + } else {
> + switch(*p) {
> + case '&':
> + so = pmatch[0].rm_so - search_start;
> + eo = pmatch[0].rm_eo - search_start;
> + p++;
> + break;
> + }
> + }
> +
> + if(so != -1 && eo != -1) {
> + add_text = text_substring(PointerGetDatum(buf_text),
> + so + 1, (eo - so), false);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> + }
> + }
> +
> + add_text = text_substring(PointerGetDatum(replace_text),
> + p - pstart + 1, -1, true);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> + }
> +
> + /*
> + * replace_text_regexp
> + * replace text using regexp
> + */
> + Datum
> + replace_text_regexp(PG_FUNCTION_ARGS)
> + {
> + text *left_text;
> + text *right_text;
> + text *buf_text;
> + text *ret_text;
> + text *src_text = PG_GETARG_TEXT_P(0);
> + char *src_text_str = PG_TEXT_GET_STR(src_text);
> + int src_text_len = TEXTLEN(src_text);
> + regex_t *re = (regex_t *)PG_GETARG_POINTER(1);
> + text *replace_text = PG_GETARG_TEXT_P(2);
> + char *replace_str = PG_TEXT_GET_STR(replace_text);
> + StringInfo str = makeStringInfo();
> + int regexec_result;
> + regmatch_t pmatch[REGEXP_REPLACE_BACKREF_CNT];
> + pg_wchar *data;
> + size_t data_len;
> + int search_start;
> + bool have_escape;
> +
> + buf_text = TEXTDUP(src_text);
> +
> + /* Convert data string to wide characters */
> + data = (pg_wchar *) palloc((src_text_len + 1) * sizeof(pg_wchar));
> + data_len = pg_mb2wchar_with_len(src_text_str, data,
> strlen(src_text_str));
> +
> + have_escape = have_escape_in_regexp_replace_str(replace_str);
> +
> + for(search_start = 0; search_start <= data_len;) {
> + regexec_result = pg_regexec(re,
> + data,
> + data_len,
> + search_start,
> + NULL, /* no details */
> + REGEXP_REPLACE_BACKREF_CNT,
> + pmatch,
> + 0);
> + if(regexec_result != 0) break;
> +
> + left_text = text_substring(PointerGetDatum(buf_text),
> + 1, pmatch[0].rm_so - search_start, false);
> + right_text = text_substring(PointerGetDatum(buf_text),
> + pmatch[0].rm_eo - search_start + 1,
> + -1, true);
> +
> + appendStringInfoString(str, PG_TEXT_GET_STR(left_text));
> + if(have_escape) {
> + appendStringInfoRegexpSubstr(str, replace_text, pmatch,
> + buf_text, search_start);
> + } else {
> + appendStringInfoString(str, replace_str);
> + }
> +
> + pfree(buf_text);
> + pfree(left_text);
> + buf_text = right_text;
> +
> + search_start = pmatch[0].rm_eo;
> + if(pmatch[0].rm_so == pmatch[0].rm_eo) search_start++;
> + }
> +
> + appendStringInfoString(str, PG_TEXT_GET_STR(buf_text));
> + pfree(buf_text);
> +
> + ret_text = PG_STR_GET_TEXT(str->data);
> + pfree(str->data);
> + pfree(str);
> + pfree(data);
>
> PG_RETURN_TEXT_P(ret_text);
> }
> *** ./src/include/catalog/pg_proc.h.orig Tue Jul 20 09:24:11 2004
> --- ./src/include/catalog/pg_proc.h Tue Jul 20 09:26:11 2004
> ***************
> *** 2186,2191 ****
> --- 2186,2193 ----
> DESCR("return portion of string");
> DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25
> 25" replace_text - _null_ ));
> DESCR("replace all occurrences of old_substr with new_substr in string");
> + DATA(insert OID = 2167 ( regexp_replace PGNSP PGUID 12 f f t f i 3 25
> "25 25 25" textregexreplace - _null_ ));
> + DESCR("replace text using regexp");
> DATA(insert OID = 2088 ( split_part PGNSP PGUID 12 f f t f i 3 25 "25
> 25 23" split_text - _null_ ));
> DESCR("split string by field_sep and return field_num");
> DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23"
> to_hex32 - _null_ ));
> *** ./src/include/regex/regex.h.orig Tue Jul 20 08:51:06 2004
> --- ./src/include/regex/regex.h Tue Jul 20 08:51:16 2004
> ***************
> *** 163,169 ****
> * the prototypes for exported functions
> */
> extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
> ! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, rm_detail_t *,
> size_t, regmatch_t[], int);
> extern void pg_regfree(regex_t *);
> extern size_t pg_regerror(int, const regex_t *, char *, size_t);
>
> --- 163,169 ----
> * the prototypes for exported functions
> */
> extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
> ! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, size_t,
> rm_detail_t *, size_t, regmatch_t[], int);
> extern void pg_regfree(regex_t *);
> extern size_t pg_regerror(int, const regex_t *, char *, size_t);
>
> *** ./src/include/utils/builtins.h.orig Tue Jul 20 09:11:19 2004
> --- ./src/include/utils/builtins.h Tue Jul 20 09:11:46 2004
> ***************
> *** 408,413 ****
> --- 408,414 ----
> extern Datum texticregexeq(PG_FUNCTION_ARGS);
> extern Datum texticregexne(PG_FUNCTION_ARGS);
> extern Datum textregexsubstr(PG_FUNCTION_ARGS);
> + extern Datum textregexreplace(PG_FUNCTION_ARGS);
> extern Datum similar_escape(PG_FUNCTION_ARGS);
> extern const char *assign_regex_flavor(const char *value,
> bool doit, bool interactive);
> ***************
> *** 537,542 ****
> --- 538,544 ----
> extern bool SplitIdentifierString(char *rawstring, char separator,
> List **namelist);
> extern Datum replace_text(PG_FUNCTION_ARGS);
> + extern Datum replace_text_regexp(PG_FUNCTION_ARGS);
> extern Datum split_text(PG_FUNCTION_ARGS);
> extern Datum text_to_array(PG_FUNCTION_ARGS);
> extern Datum array_to_text(PG_FUNCTION_ARGS);
>
> __________________________________________________
> Do You Yahoo!?
> http://bb.yahoo.co.jp/
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: regexp_replace
Date: 2005-06-05 04:11:34
Message-ID: 657.1117944694@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Your patch has been added to the PostgreSQL unapplied patches list at:

> a_ogawa00 wrote:
>> This patch provides a new function regexp_replace.
>> regexp_replace extends a replace function and enables text search
>> by the regular expression. And, a back reference can be used within
>> a replace string.
>> (This patch for PostgreSQL 7.4.3)

Don't we have this functionality already? It's even SQL-spec ...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: regexp_replace
Date: 2005-06-06 15:34:47
Message-ID: 200506061534.j56FYlf14794@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Your patch has been added to the PostgreSQL unapplied patches list at:
>
> > a_ogawa00 wrote:
> >> This patch provides a new function regexp_replace.
> >> regexp_replace extends a replace function and enables text search
> >> by the regular expression. And, a back reference can be used within
> >> a replace string.
> >> (This patch for PostgreSQL 7.4.3)
>
> Don't we have this functionality already? It's even SQL-spec ...

Uh, all I see it replace(), which isn't regex:

<row>
<entry><literal><function>replace</function>(<parameter>string</parameter> <type>text</type>,
<parameter>from</parameter> <type>text</type>,
<parameter>to</parameter> <type>text</type>)</literal></entry>
<entry><type>text</type></entry>
<entry>Replace all occurrences in <parameter>string</parameter> of substring
<parameter>from</parameter> with substring <parameter>to</parameter>
</entry>
<entry><literal>replace( 'abcdefabcdef', 'cd', 'XX')</literal></entry>
<entry><literal>abXXefabXXef</literal></entry>
</row>

test=> SELECT replace('abc','a','d');
replace
---------
dbc
(1 row)

test=> SELECT replace('abc','[a-c]','d');
replace
---------
abc
(1 row)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: regexp_replace
Date: 2005-06-06 15:39:27
Message-ID: 1774.1118072367@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> Don't we have this functionality already? It's even SQL-spec ...

> Uh, all I see it replace(), which isn't regex:

The SQL-spec function is substring(string from pattern for escape-char);
see
http://www.postgresql.org/docs/8.0/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP

and we also have a variant of that for POSIX rather than SQL-style
regexps:
http://www.postgresql.org/docs/8.0/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: regexp_replace
Date: 2005-06-06 16:02:11
Message-ID: 200506061602.j56G2BX18556@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Nice. Patch removed.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> Don't we have this functionality already? It's even SQL-spec ...
>
> > Uh, all I see it replace(), which isn't regex:
>
> The SQL-spec function is substring(string from pattern for escape-char);
> see
> http://www.postgresql.org/docs/8.0/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP
>
> and we also have a variant of that for POSIX rather than SQL-style
> regexps:
> http://www.postgresql.org/docs/8.0/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: regexp_replace
Date: 2005-06-06 16:02:18
Message-ID: 200506061602.j56G2Ii18581@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Patch removed because we already have this functionality.

---------------------------------------------------------------------------

a_ogawa00 wrote:
>
> This patch provides a new function regexp_replace.
> regexp_replace extends a replace function and enables text search
> by the regular expression. And, a back reference can be used within
> a replace string.
> (This patch for PostgreSQL 7.4.3)
>
> Function: regexp_replace(str, pattern, replace_str)
> Retuen Type: text
> Description: Replace all matched string in str.
> pattern is regular expression pattern.
> replace_str is replace string that can use '\1' - '\9', and
> '\&'.
> '\1' - '\9' is back reference to the n'th subexpression.
> '\&' is matched string.
>
> (example1)
> select regexp_replace('ABC-DEF', '(\\w+)-(\\w+)', '\\2-\\1')
> result: DEF-ABC
>
> (example2)
> update tab1 set col1 = regexp_replace(col1, '[A-Z]', '');
>
> ---
> Atsushi Ogawa
> a_ogawa(at)hi-ho(dot)ne(dot)jp
>
> --- cut here ---
>
> *** ./src/backend/regex/regexec.c.orig Tue Jul 20 08:45:39 2004
> --- ./src/backend/regex/regexec.c Tue Jul 20 08:49:36 2004
> ***************
> *** 110,115 ****
> --- 110,116 ----
> regmatch_t *pmatch;
> rm_detail_t *details;
> chr *start; /* start of string */
> + chr *search_start; /* search start of string */
> chr *stop; /* just past end of string */
> int err; /* error code if any (0 none) */
> regoff_t *mem; /* memory vector for backtracking */
> ***************
> *** 168,173 ****
> --- 169,175 ----
> pg_regexec(regex_t *re,
> const chr *string,
> size_t len,
> + size_t search_start,
> rm_detail_t *details,
> size_t nmatch,
> regmatch_t pmatch[],
> ***************
> *** 219,224 ****
> --- 221,227 ----
> v->pmatch = pmatch;
> v->details = details;
> v->start = (chr *) string;
> + v->search_start = (chr *) string + search_start;
> v->stop = (chr *) string + len;
> v->err = 0;
> if (backref)
> ***************
> *** 288,294 ****
> NOERR();
> MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
> cold = NULL;
> ! close = shortest(v, s, v->start, v->start, v->stop, &cold, (int *)
> NULL);
> freedfa(s);
> NOERR();
> if (v->g->cflags & REG_EXPECT)
> --- 291,298 ----
> NOERR();
> MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
> cold = NULL;
> ! close = shortest(v, s, v->search_start, v->search_start, v->stop,
> ! &cold, (int *) NULL);
> freedfa(s);
> NOERR();
> if (v->g->cflags & REG_EXPECT)
> ***************
> *** 415,421 ****
>
> assert(d != NULL && s != NULL);
> cold = NULL;
> ! close = v->start;
> do
> {
> MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
> --- 419,425 ----
>
> assert(d != NULL && s != NULL);
> cold = NULL;
> ! close = v->search_start;
> do
> {
> MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
> *** ./src/backend/utils/adt/regexp.c.orig Tue Jul 20 08:50:08 2004
> --- ./src/backend/utils/adt/regexp.c Tue Jul 20 09:00:05 2004
> ***************
> *** 80,116 ****
>
>
> /*
> ! * RE_compile_and_execute - compile and execute a RE, caching if possible
> *
> ! * Returns TRUE on match, FALSE on no match
> *
> ! * text_re --- the pattern, expressed as an *untoasted* TEXT object
> ! * dat --- the data to match against (need not be null-terminated)
> ! * dat_len --- the length of the data string
> ! * cflags --- compile options for the pattern
> ! * nmatch, pmatch --- optional return area for match details
> *
> ! * Both pattern and data are given in the database encoding. We
> internally
> ! * convert to array of pg_wchar which is what Spencer's regex package
> wants.
> */
> ! static bool
> ! RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
> ! int cflags, int nmatch, regmatch_t *pmatch)
> {
> int text_re_len = VARSIZE(text_re);
> - pg_wchar *data;
> - size_t data_len;
> pg_wchar *pattern;
> size_t pattern_len;
> int i;
> int regcomp_result;
> - int regexec_result;
> cached_re_str re_temp;
>
> - /* Convert data string to wide characters */
> - data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
> - data_len = pg_mb2wchar_with_len(dat, data, dat_len);
> -
> /*
> * Look for a match among previously compiled REs. Since the data
> * structure is self-organizing with most-used entries at the front,
> --- 80,105 ----
>
>
> /*
> ! * RE_compile_and_cache - compile a RE, caching if possible
> *
> ! * Returns regex_t
> *
> ! * text_re --- the pattern, expressed as an *untoasted* TEXT object
> ! * cflags --- compile options for the pattern
> *
> ! * Pattern is given in the database encoding. We internally convert to
> ! * array of pg_wchar which is what Spencer's regex package wants.
> */
> ! static regex_t
> ! RE_compile_and_cache(text *text_re, int cflags)
> {
> int text_re_len = VARSIZE(text_re);
> pg_wchar *pattern;
> size_t pattern_len;
> int i;
> int regcomp_result;
> cached_re_str re_temp;
>
> /*
> * Look for a match among previously compiled REs. Since the data
> * structure is self-organizing with most-used entries at the front,
> ***************
> *** 132,149 ****
> re_array[0] = re_temp;
> }
>
> ! /* Perform RE match and return result */
> ! regexec_result = pg_regexec(&re_array[0].cre_re,
> ! data,
> ! data_len,
> ! NULL, /* no details */
> ! nmatch,
> ! pmatch,
> ! 0);
> !
> ! pfree(data);
> !
> ! return (regexec_result == 0);
> }
> }
>
> --- 121,127 ----
> re_array[0] = re_temp;
> }
>
> ! return re_array[0].cre_re;
> }
> }
>
> ***************
> *** 210,219 ****
> --- 188,231 ----
> re_array[0] = re_temp;
> num_res++;
>
> + return re_array[0].cre_re;
> + }
> +
> + /*
> + * RE_compile_and_execute - compile and execute a RE, caching if possible
> + *
> + * Returns TRUE on match, FALSE on no match
> + *
> + * text_re --- the pattern, expressed as an *untoasted* TEXT object
> + * dat --- the data to match against (need not be null-terminated)
> + * dat_len --- the length of the data string
> + * cflags --- compile options for the pattern
> + * nmatch, pmatch --- optional return area for match details
> + *
> + * Both pattern and data are given in the database encoding. We
> internally
> + * convert to array of pg_wchar which is what Spencer's regex package
> wants.
> + */
> + static bool
> + RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
> + int cflags, int nmatch, regmatch_t *pmatch)
> + {
> + pg_wchar *data;
> + size_t data_len;
> + int regexec_result;
> + regex_t re;
> +
> + /* Convert data string to wide characters */
> + data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
> + data_len = pg_mb2wchar_with_len(dat, data, dat_len);
> +
> + /* Compile RE */
> + re = RE_compile_and_cache(text_re, cflags);
> +
> /* Perform RE match and return result */
> regexec_result = pg_regexec(&re_array[0].cre_re,
> data,
> data_len,
> + 0,
> NULL, /* no details */
> nmatch,
> pmatch,
> ***************
> *** 415,420 ****
> --- 427,452 ----
> }
>
> PG_RETURN_NULL();
> + }
> +
> + /*
> + * textregexreplace()
> + * Return a replace string matched by a regular expression.
> + */
> + Datum
> + textregexreplace(PG_FUNCTION_ARGS)
> + {
> + text *s = PG_GETARG_TEXT_P(0);
> + text *p = PG_GETARG_TEXT_P(1);
> + text *r = PG_GETARG_TEXT_P(2);
> + regex_t re;
> +
> + re = RE_compile_and_cache(p, regex_flavor);
> +
> + return (DirectFunctionCall3(replace_text_regexp,
> + PointerGetDatum(s),
> + PointerGetDatum(&re),
> + PointerGetDatum(r)));
> }
>
> /* similar_escape()
> *** ./src/backend/utils/adt/varlena.c.orig Tue Jul 20 09:00:17 2004
> --- ./src/backend/utils/adt/varlena.c Tue Jul 20 10:23:32 2004
> ***************
> *** 28,33 ****
> --- 28,34 ----
> #include "utils/builtins.h"
> #include "utils/lsyscache.h"
> #include "utils/pg_locale.h"
> + #include "regex/regex.h"
>
>
> typedef struct varlena unknown;
> ***************
> *** 1971,1976 ****
> --- 1972,2122 ----
> ret_text = PG_STR_GET_TEXT(str->data);
> pfree(str->data);
> pfree(str);
> +
> + PG_RETURN_TEXT_P(ret_text);
> + }
> +
> + /*
> + * have_escape_in_regexp_replace_str
> + * check replace string have escape char
> + */
> + static bool
> + have_escape_in_regexp_replace_str(const char *replace_str)
> + {
> + return (strchr(replace_str, '\\') != NULL);
> + }
> +
> + #define REGEXP_REPLACE_BACKREF_CNT 10
> + /*
> + * appendStringInfoRegexpSubstr
> + * append string for regexp back references.
> + */
> + static void
> + appendStringInfoRegexpSubstr(StringInfo str, text *replace_text,
> + regmatch_t *pmatch, text *buf_text, int search_start)
> + {
> + const char *pstart = PG_TEXT_GET_STR(replace_text);
> + const char *p = pstart;
> + const char *pnext;
> +
> + text *add_text;
> + int so;
> + int eo;
> +
> + for(;;) {
> + pnext = strchr(p, '\\');
> + if(pnext == NULL) break;
> +
> + add_text = text_substring(PointerGetDatum(replace_text),
> + p - pstart + 1, pnext - p, false);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> +
> + p = pnext + 1;
> + so = eo = -1;
> +
> + if(*p >= '1' && *p <= '9') {
> + int idx = *p - '0';
> + so = pmatch[idx].rm_so - search_start;
> + eo = pmatch[idx].rm_eo - search_start;
> + p++;
> + } else {
> + switch(*p) {
> + case '&':
> + so = pmatch[0].rm_so - search_start;
> + eo = pmatch[0].rm_eo - search_start;
> + p++;
> + break;
> + }
> + }
> +
> + if(so != -1 && eo != -1) {
> + add_text = text_substring(PointerGetDatum(buf_text),
> + so + 1, (eo - so), false);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> + }
> + }
> +
> + add_text = text_substring(PointerGetDatum(replace_text),
> + p - pstart + 1, -1, true);
> + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> + pfree(add_text);
> + }
> +
> + /*
> + * replace_text_regexp
> + * replace text using regexp
> + */
> + Datum
> + replace_text_regexp(PG_FUNCTION_ARGS)
> + {
> + text *left_text;
> + text *right_text;
> + text *buf_text;
> + text *ret_text;
> + text *src_text = PG_GETARG_TEXT_P(0);
> + char *src_text_str = PG_TEXT_GET_STR(src_text);
> + int src_text_len = TEXTLEN(src_text);
> + regex_t *re = (regex_t *)PG_GETARG_POINTER(1);
> + text *replace_text = PG_GETARG_TEXT_P(2);
> + char *replace_str = PG_TEXT_GET_STR(replace_text);
> + StringInfo str = makeStringInfo();
> + int regexec_result;
> + regmatch_t pmatch[REGEXP_REPLACE_BACKREF_CNT];
> + pg_wchar *data;
> + size_t data_len;
> + int search_start;
> + bool have_escape;
> +
> + buf_text = TEXTDUP(src_text);
> +
> + /* Convert data string to wide characters */
> + data = (pg_wchar *) palloc((src_text_len + 1) * sizeof(pg_wchar));
> + data_len = pg_mb2wchar_with_len(src_text_str, data,
> strlen(src_text_str));
> +
> + have_escape = have_escape_in_regexp_replace_str(replace_str);
> +
> + for(search_start = 0; search_start <= data_len;) {
> + regexec_result = pg_regexec(re,
> + data,
> + data_len,
> + search_start,
> + NULL, /* no details */
> + REGEXP_REPLACE_BACKREF_CNT,
> + pmatch,
> + 0);
> + if(regexec_result != 0) break;
> +
> + left_text = text_substring(PointerGetDatum(buf_text),
> + 1, pmatch[0].rm_so - search_start, false);
> + right_text = text_substring(PointerGetDatum(buf_text),
> + pmatch[0].rm_eo - search_start + 1,
> + -1, true);
> +
> + appendStringInfoString(str, PG_TEXT_GET_STR(left_text));
> + if(have_escape) {
> + appendStringInfoRegexpSubstr(str, replace_text, pmatch,
> + buf_text, search_start);
> + } else {
> + appendStringInfoString(str, replace_str);
> + }
> +
> + pfree(buf_text);
> + pfree(left_text);
> + buf_text = right_text;
> +
> + search_start = pmatch[0].rm_eo;
> + if(pmatch[0].rm_so == pmatch[0].rm_eo) search_start++;
> + }
> +
> + appendStringInfoString(str, PG_TEXT_GET_STR(buf_text));
> + pfree(buf_text);
> +
> + ret_text = PG_STR_GET_TEXT(str->data);
> + pfree(str->data);
> + pfree(str);
> + pfree(data);
>
> PG_RETURN_TEXT_P(ret_text);
> }
> *** ./src/include/catalog/pg_proc.h.orig Tue Jul 20 09:24:11 2004
> --- ./src/include/catalog/pg_proc.h Tue Jul 20 09:26:11 2004
> ***************
> *** 2186,2191 ****
> --- 2186,2193 ----
> DESCR("return portion of string");
> DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25
> 25" replace_text - _null_ ));
> DESCR("replace all occurrences of old_substr with new_substr in string");
> + DATA(insert OID = 2167 ( regexp_replace PGNSP PGUID 12 f f t f i 3 25
> "25 25 25" textregexreplace - _null_ ));
> + DESCR("replace text using regexp");
> DATA(insert OID = 2088 ( split_part PGNSP PGUID 12 f f t f i 3 25 "25
> 25 23" split_text - _null_ ));
> DESCR("split string by field_sep and return field_num");
> DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23"
> to_hex32 - _null_ ));
> *** ./src/include/regex/regex.h.orig Tue Jul 20 08:51:06 2004
> --- ./src/include/regex/regex.h Tue Jul 20 08:51:16 2004
> ***************
> *** 163,169 ****
> * the prototypes for exported functions
> */
> extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
> ! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, rm_detail_t *,
> size_t, regmatch_t[], int);
> extern void pg_regfree(regex_t *);
> extern size_t pg_regerror(int, const regex_t *, char *, size_t);
>
> --- 163,169 ----
> * the prototypes for exported functions
> */
> extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
> ! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, size_t,
> rm_detail_t *, size_t, regmatch_t[], int);
> extern void pg_regfree(regex_t *);
> extern size_t pg_regerror(int, const regex_t *, char *, size_t);
>
> *** ./src/include/utils/builtins.h.orig Tue Jul 20 09:11:19 2004
> --- ./src/include/utils/builtins.h Tue Jul 20 09:11:46 2004
> ***************
> *** 408,413 ****
> --- 408,414 ----
> extern Datum texticregexeq(PG_FUNCTION_ARGS);
> extern Datum texticregexne(PG_FUNCTION_ARGS);
> extern Datum textregexsubstr(PG_FUNCTION_ARGS);
> + extern Datum textregexreplace(PG_FUNCTION_ARGS);
> extern Datum similar_escape(PG_FUNCTION_ARGS);
> extern const char *assign_regex_flavor(const char *value,
> bool doit, bool interactive);
> ***************
> *** 537,542 ****
> --- 538,544 ----
> extern bool SplitIdentifierString(char *rawstring, char separator,
> List **namelist);
> extern Datum replace_text(PG_FUNCTION_ARGS);
> + extern Datum replace_text_regexp(PG_FUNCTION_ARGS);
> extern Datum split_text(PG_FUNCTION_ARGS);
> extern Datum text_to_array(PG_FUNCTION_ARGS);
> extern Datum array_to_text(PG_FUNCTION_ARGS);
>
> __________________________________________________
> Do You Yahoo!?
> http://bb.yahoo.co.jp/
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: David Fetter <david(at)fetter(dot)org>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: a_ogawa00 <a_ogawa00(at)yahoo(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: regexp_replace
Date: 2005-06-06 16:10:40
Message-ID: 20050606161040.GD30588@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, Jun 06, 2005 at 12:02:18PM -0400, Bruce Momjian wrote:
>
> Patch removed because we already have this functionality.

We don't yet have this functionality, as the patch allows for using
second and later regex matches "()" in the replacement pattern.

The function is misnamed. It should be called regex_replace_all() or
some such, as it violates the principle of least astonishment by
replacing all instances by default. Every other regex replacement
defaults to "replace first," not "replace all." Or maybe it should
take a bool for "replace all," or...? Anyhow, it's worth a discussion
:)

Cheers,
D
>
> ---------------------------------------------------------------------------
>
> a_ogawa00 wrote:
> >
> > This patch provides a new function regexp_replace.
> > regexp_replace extends a replace function and enables text search
> > by the regular expression. And, a back reference can be used within
> > a replace string.
> > (This patch for PostgreSQL 7.4.3)
> >
> > Function: regexp_replace(str, pattern, replace_str)
> > Retuen Type: text
> > Description: Replace all matched string in str.
> > pattern is regular expression pattern.
> > replace_str is replace string that can use '\1' - '\9', and
> > '\&'.
> > '\1' - '\9' is back reference to the n'th subexpression.
> > '\&' is matched string.
> >
> > (example1)
> > select regexp_replace('ABC-DEF', '(\\w+)-(\\w+)', '\\2-\\1')
> > result: DEF-ABC
> >
> > (example2)
> > update tab1 set col1 = regexp_replace(col1, '[A-Z]', '');
> >
> > ---
> > Atsushi Ogawa
> > a_ogawa(at)hi-ho(dot)ne(dot)jp
> >
> > --- cut here ---
> >
> > *** ./src/backend/regex/regexec.c.orig Tue Jul 20 08:45:39 2004
> > --- ./src/backend/regex/regexec.c Tue Jul 20 08:49:36 2004
> > ***************
> > *** 110,115 ****
> > --- 110,116 ----
> > regmatch_t *pmatch;
> > rm_detail_t *details;
> > chr *start; /* start of string */
> > + chr *search_start; /* search start of string */
> > chr *stop; /* just past end of string */
> > int err; /* error code if any (0 none) */
> > regoff_t *mem; /* memory vector for backtracking */
> > ***************
> > *** 168,173 ****
> > --- 169,175 ----
> > pg_regexec(regex_t *re,
> > const chr *string,
> > size_t len,
> > + size_t search_start,
> > rm_detail_t *details,
> > size_t nmatch,
> > regmatch_t pmatch[],
> > ***************
> > *** 219,224 ****
> > --- 221,227 ----
> > v->pmatch = pmatch;
> > v->details = details;
> > v->start = (chr *) string;
> > + v->search_start = (chr *) string + search_start;
> > v->stop = (chr *) string + len;
> > v->err = 0;
> > if (backref)
> > ***************
> > *** 288,294 ****
> > NOERR();
> > MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
> > cold = NULL;
> > ! close = shortest(v, s, v->start, v->start, v->stop, &cold, (int *)
> > NULL);
> > freedfa(s);
> > NOERR();
> > if (v->g->cflags & REG_EXPECT)
> > --- 291,298 ----
> > NOERR();
> > MDEBUG(("\nsearch at %ld\n", LOFF(v->start)));
> > cold = NULL;
> > ! close = shortest(v, s, v->search_start, v->search_start, v->stop,
> > ! &cold, (int *) NULL);
> > freedfa(s);
> > NOERR();
> > if (v->g->cflags & REG_EXPECT)
> > ***************
> > *** 415,421 ****
> >
> > assert(d != NULL && s != NULL);
> > cold = NULL;
> > ! close = v->start;
> > do
> > {
> > MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
> > --- 419,425 ----
> >
> > assert(d != NULL && s != NULL);
> > cold = NULL;
> > ! close = v->search_start;
> > do
> > {
> > MDEBUG(("\ncsearch at %ld\n", LOFF(close)));
> > *** ./src/backend/utils/adt/regexp.c.orig Tue Jul 20 08:50:08 2004
> > --- ./src/backend/utils/adt/regexp.c Tue Jul 20 09:00:05 2004
> > ***************
> > *** 80,116 ****
> >
> >
> > /*
> > ! * RE_compile_and_execute - compile and execute a RE, caching if possible
> > *
> > ! * Returns TRUE on match, FALSE on no match
> > *
> > ! * text_re --- the pattern, expressed as an *untoasted* TEXT object
> > ! * dat --- the data to match against (need not be null-terminated)
> > ! * dat_len --- the length of the data string
> > ! * cflags --- compile options for the pattern
> > ! * nmatch, pmatch --- optional return area for match details
> > *
> > ! * Both pattern and data are given in the database encoding. We
> > internally
> > ! * convert to array of pg_wchar which is what Spencer's regex package
> > wants.
> > */
> > ! static bool
> > ! RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
> > ! int cflags, int nmatch, regmatch_t *pmatch)
> > {
> > int text_re_len = VARSIZE(text_re);
> > - pg_wchar *data;
> > - size_t data_len;
> > pg_wchar *pattern;
> > size_t pattern_len;
> > int i;
> > int regcomp_result;
> > - int regexec_result;
> > cached_re_str re_temp;
> >
> > - /* Convert data string to wide characters */
> > - data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
> > - data_len = pg_mb2wchar_with_len(dat, data, dat_len);
> > -
> > /*
> > * Look for a match among previously compiled REs. Since the data
> > * structure is self-organizing with most-used entries at the front,
> > --- 80,105 ----
> >
> >
> > /*
> > ! * RE_compile_and_cache - compile a RE, caching if possible
> > *
> > ! * Returns regex_t
> > *
> > ! * text_re --- the pattern, expressed as an *untoasted* TEXT object
> > ! * cflags --- compile options for the pattern
> > *
> > ! * Pattern is given in the database encoding. We internally convert to
> > ! * array of pg_wchar which is what Spencer's regex package wants.
> > */
> > ! static regex_t
> > ! RE_compile_and_cache(text *text_re, int cflags)
> > {
> > int text_re_len = VARSIZE(text_re);
> > pg_wchar *pattern;
> > size_t pattern_len;
> > int i;
> > int regcomp_result;
> > cached_re_str re_temp;
> >
> > /*
> > * Look for a match among previously compiled REs. Since the data
> > * structure is self-organizing with most-used entries at the front,
> > ***************
> > *** 132,149 ****
> > re_array[0] = re_temp;
> > }
> >
> > ! /* Perform RE match and return result */
> > ! regexec_result = pg_regexec(&re_array[0].cre_re,
> > ! data,
> > ! data_len,
> > ! NULL, /* no details */
> > ! nmatch,
> > ! pmatch,
> > ! 0);
> > !
> > ! pfree(data);
> > !
> > ! return (regexec_result == 0);
> > }
> > }
> >
> > --- 121,127 ----
> > re_array[0] = re_temp;
> > }
> >
> > ! return re_array[0].cre_re;
> > }
> > }
> >
> > ***************
> > *** 210,219 ****
> > --- 188,231 ----
> > re_array[0] = re_temp;
> > num_res++;
> >
> > + return re_array[0].cre_re;
> > + }
> > +
> > + /*
> > + * RE_compile_and_execute - compile and execute a RE, caching if possible
> > + *
> > + * Returns TRUE on match, FALSE on no match
> > + *
> > + * text_re --- the pattern, expressed as an *untoasted* TEXT object
> > + * dat --- the data to match against (need not be null-terminated)
> > + * dat_len --- the length of the data string
> > + * cflags --- compile options for the pattern
> > + * nmatch, pmatch --- optional return area for match details
> > + *
> > + * Both pattern and data are given in the database encoding. We
> > internally
> > + * convert to array of pg_wchar which is what Spencer's regex package
> > wants.
> > + */
> > + static bool
> > + RE_compile_and_execute(text *text_re, unsigned char *dat, int dat_len,
> > + int cflags, int nmatch, regmatch_t *pmatch)
> > + {
> > + pg_wchar *data;
> > + size_t data_len;
> > + int regexec_result;
> > + regex_t re;
> > +
> > + /* Convert data string to wide characters */
> > + data = (pg_wchar *) palloc((dat_len + 1) * sizeof(pg_wchar));
> > + data_len = pg_mb2wchar_with_len(dat, data, dat_len);
> > +
> > + /* Compile RE */
> > + re = RE_compile_and_cache(text_re, cflags);
> > +
> > /* Perform RE match and return result */
> > regexec_result = pg_regexec(&re_array[0].cre_re,
> > data,
> > data_len,
> > + 0,
> > NULL, /* no details */
> > nmatch,
> > pmatch,
> > ***************
> > *** 415,420 ****
> > --- 427,452 ----
> > }
> >
> > PG_RETURN_NULL();
> > + }
> > +
> > + /*
> > + * textregexreplace()
> > + * Return a replace string matched by a regular expression.
> > + */
> > + Datum
> > + textregexreplace(PG_FUNCTION_ARGS)
> > + {
> > + text *s = PG_GETARG_TEXT_P(0);
> > + text *p = PG_GETARG_TEXT_P(1);
> > + text *r = PG_GETARG_TEXT_P(2);
> > + regex_t re;
> > +
> > + re = RE_compile_and_cache(p, regex_flavor);
> > +
> > + return (DirectFunctionCall3(replace_text_regexp,
> > + PointerGetDatum(s),
> > + PointerGetDatum(&re),
> > + PointerGetDatum(r)));
> > }
> >
> > /* similar_escape()
> > *** ./src/backend/utils/adt/varlena.c.orig Tue Jul 20 09:00:17 2004
> > --- ./src/backend/utils/adt/varlena.c Tue Jul 20 10:23:32 2004
> > ***************
> > *** 28,33 ****
> > --- 28,34 ----
> > #include "utils/builtins.h"
> > #include "utils/lsyscache.h"
> > #include "utils/pg_locale.h"
> > + #include "regex/regex.h"
> >
> >
> > typedef struct varlena unknown;
> > ***************
> > *** 1971,1976 ****
> > --- 1972,2122 ----
> > ret_text = PG_STR_GET_TEXT(str->data);
> > pfree(str->data);
> > pfree(str);
> > +
> > + PG_RETURN_TEXT_P(ret_text);
> > + }
> > +
> > + /*
> > + * have_escape_in_regexp_replace_str
> > + * check replace string have escape char
> > + */
> > + static bool
> > + have_escape_in_regexp_replace_str(const char *replace_str)
> > + {
> > + return (strchr(replace_str, '\\') != NULL);
> > + }
> > +
> > + #define REGEXP_REPLACE_BACKREF_CNT 10
> > + /*
> > + * appendStringInfoRegexpSubstr
> > + * append string for regexp back references.
> > + */
> > + static void
> > + appendStringInfoRegexpSubstr(StringInfo str, text *replace_text,
> > + regmatch_t *pmatch, text *buf_text, int search_start)
> > + {
> > + const char *pstart = PG_TEXT_GET_STR(replace_text);
> > + const char *p = pstart;
> > + const char *pnext;
> > +
> > + text *add_text;
> > + int so;
> > + int eo;
> > +
> > + for(;;) {
> > + pnext = strchr(p, '\\');
> > + if(pnext == NULL) break;
> > +
> > + add_text = text_substring(PointerGetDatum(replace_text),
> > + p - pstart + 1, pnext - p, false);
> > + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> > + pfree(add_text);
> > +
> > + p = pnext + 1;
> > + so = eo = -1;
> > +
> > + if(*p >= '1' && *p <= '9') {
> > + int idx = *p - '0';
> > + so = pmatch[idx].rm_so - search_start;
> > + eo = pmatch[idx].rm_eo - search_start;
> > + p++;
> > + } else {
> > + switch(*p) {
> > + case '&':
> > + so = pmatch[0].rm_so - search_start;
> > + eo = pmatch[0].rm_eo - search_start;
> > + p++;
> > + break;
> > + }
> > + }
> > +
> > + if(so != -1 && eo != -1) {
> > + add_text = text_substring(PointerGetDatum(buf_text),
> > + so + 1, (eo - so), false);
> > + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> > + pfree(add_text);
> > + }
> > + }
> > +
> > + add_text = text_substring(PointerGetDatum(replace_text),
> > + p - pstart + 1, -1, true);
> > + appendStringInfoString(str, PG_TEXT_GET_STR(add_text));
> > + pfree(add_text);
> > + }
> > +
> > + /*
> > + * replace_text_regexp
> > + * replace text using regexp
> > + */
> > + Datum
> > + replace_text_regexp(PG_FUNCTION_ARGS)
> > + {
> > + text *left_text;
> > + text *right_text;
> > + text *buf_text;
> > + text *ret_text;
> > + text *src_text = PG_GETARG_TEXT_P(0);
> > + char *src_text_str = PG_TEXT_GET_STR(src_text);
> > + int src_text_len = TEXTLEN(src_text);
> > + regex_t *re = (regex_t *)PG_GETARG_POINTER(1);
> > + text *replace_text = PG_GETARG_TEXT_P(2);
> > + char *replace_str = PG_TEXT_GET_STR(replace_text);
> > + StringInfo str = makeStringInfo();
> > + int regexec_result;
> > + regmatch_t pmatch[REGEXP_REPLACE_BACKREF_CNT];
> > + pg_wchar *data;
> > + size_t data_len;
> > + int search_start;
> > + bool have_escape;
> > +
> > + buf_text = TEXTDUP(src_text);
> > +
> > + /* Convert data string to wide characters */
> > + data = (pg_wchar *) palloc((src_text_len + 1) * sizeof(pg_wchar));
> > + data_len = pg_mb2wchar_with_len(src_text_str, data,
> > strlen(src_text_str));
> > +
> > + have_escape = have_escape_in_regexp_replace_str(replace_str);
> > +
> > + for(search_start = 0; search_start <= data_len;) {
> > + regexec_result = pg_regexec(re,
> > + data,
> > + data_len,
> > + search_start,
> > + NULL, /* no details */
> > + REGEXP_REPLACE_BACKREF_CNT,
> > + pmatch,
> > + 0);
> > + if(regexec_result != 0) break;
> > +
> > + left_text = text_substring(PointerGetDatum(buf_text),
> > + 1, pmatch[0].rm_so - search_start, false);
> > + right_text = text_substring(PointerGetDatum(buf_text),
> > + pmatch[0].rm_eo - search_start + 1,
> > + -1, true);
> > +
> > + appendStringInfoString(str, PG_TEXT_GET_STR(left_text));
> > + if(have_escape) {
> > + appendStringInfoRegexpSubstr(str, replace_text, pmatch,
> > + buf_text, search_start);
> > + } else {
> > + appendStringInfoString(str, replace_str);
> > + }
> > +
> > + pfree(buf_text);
> > + pfree(left_text);
> > + buf_text = right_text;
> > +
> > + search_start = pmatch[0].rm_eo;
> > + if(pmatch[0].rm_so == pmatch[0].rm_eo) search_start++;
> > + }
> > +
> > + appendStringInfoString(str, PG_TEXT_GET_STR(buf_text));
> > + pfree(buf_text);
> > +
> > + ret_text = PG_STR_GET_TEXT(str->data);
> > + pfree(str->data);
> > + pfree(str);
> > + pfree(data);
> >
> > PG_RETURN_TEXT_P(ret_text);
> > }
> > *** ./src/include/catalog/pg_proc.h.orig Tue Jul 20 09:24:11 2004
> > --- ./src/include/catalog/pg_proc.h Tue Jul 20 09:26:11 2004
> > ***************
> > *** 2186,2191 ****
> > --- 2186,2193 ----
> > DESCR("return portion of string");
> > DATA(insert OID = 2087 ( replace PGNSP PGUID 12 f f t f i 3 25 "25 25
> > 25" replace_text - _null_ ));
> > DESCR("replace all occurrences of old_substr with new_substr in string");
> > + DATA(insert OID = 2167 ( regexp_replace PGNSP PGUID 12 f f t f i 3 25
> > "25 25 25" textregexreplace - _null_ ));
> > + DESCR("replace text using regexp");
> > DATA(insert OID = 2088 ( split_part PGNSP PGUID 12 f f t f i 3 25 "25
> > 25 23" split_text - _null_ ));
> > DESCR("split string by field_sep and return field_num");
> > DATA(insert OID = 2089 ( to_hex PGNSP PGUID 12 f f t f i 1 25 "23"
> > to_hex32 - _null_ ));
> > *** ./src/include/regex/regex.h.orig Tue Jul 20 08:51:06 2004
> > --- ./src/include/regex/regex.h Tue Jul 20 08:51:16 2004
> > ***************
> > *** 163,169 ****
> > * the prototypes for exported functions
> > */
> > extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
> > ! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, rm_detail_t *,
> > size_t, regmatch_t[], int);
> > extern void pg_regfree(regex_t *);
> > extern size_t pg_regerror(int, const regex_t *, char *, size_t);
> >
> > --- 163,169 ----
> > * the prototypes for exported functions
> > */
> > extern int pg_regcomp(regex_t *, const pg_wchar *, size_t, int);
> > ! extern int pg_regexec(regex_t *, const pg_wchar *, size_t, size_t,
> > rm_detail_t *, size_t, regmatch_t[], int);
> > extern void pg_regfree(regex_t *);
> > extern size_t pg_regerror(int, const regex_t *, char *, size_t);
> >
> > *** ./src/include/utils/builtins.h.orig Tue Jul 20 09:11:19 2004
> > --- ./src/include/utils/builtins.h Tue Jul 20 09:11:46 2004
> > ***************
> > *** 408,413 ****
> > --- 408,414 ----
> > extern Datum texticregexeq(PG_FUNCTION_ARGS);
> > extern Datum texticregexne(PG_FUNCTION_ARGS);
> > extern Datum textregexsubstr(PG_FUNCTION_ARGS);
> > + extern Datum textregexreplace(PG_FUNCTION_ARGS);
> > extern Datum similar_escape(PG_FUNCTION_ARGS);
> > extern const char *assign_regex_flavor(const char *value,
> > bool doit, bool interactive);
> > ***************
> > *** 537,542 ****
> > --- 538,544 ----
> > extern bool SplitIdentifierString(char *rawstring, char separator,
> > List **namelist);
> > extern Datum replace_text(PG_FUNCTION_ARGS);
> > + extern Datum replace_text_regexp(PG_FUNCTION_ARGS);
> > extern Datum split_text(PG_FUNCTION_ARGS);
> > extern Datum text_to_array(PG_FUNCTION_ARGS);
> > extern Datum array_to_text(PG_FUNCTION_ARGS);
> >
> > __________________________________________________
> > Do You Yahoo!?
> > http://bb.yahoo.co.jp/
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> > message can get through to the mailing list cleanly
> >
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: "a_ogawa00" <a_ogawa00(at)yahoo(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: regexp_replace
Date: 2005-06-06 17:49:54
Message-ID: 200506061749.j56Hnsh23388@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

David Fetter wrote:
> On Mon, Jun 06, 2005 at 12:02:18PM -0400, Bruce Momjian wrote:
> >
> > Patch removed because we already have this functionality.
>
> We don't yet have this functionality, as the patch allows for using
> second and later regex matches "()" in the replacement pattern.
>
> The function is misnamed. It should be called regex_replace_all() or
> some such, as it violates the principle of least astonishment by
> replacing all instances by default. Every other regex replacement
> defaults to "replace first," not "replace all." Or maybe it should
> take a bool for "replace all," or...? Anyhow, it's worth a discussion
> :)

Does anyone want to argue that this additional functionality is
significant and deserves its own function or an additional argument to
the existing function?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: a_ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] regexp_replace
Date: 2005-06-07 00:35:56
Message-ID: PIEMIKOOMKNIJLLLBCBBKEACCIAA.a_ogawa@hi-ho.ne.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Bruce Momjian wrote:
> David Fetter wrote:
> > On Mon, Jun 06, 2005 at 12:02:18PM -0400, Bruce Momjian wrote:
> > >
> > > Patch removed because we already have this functionality.
> >
> > We don't yet have this functionality, as the patch allows for using
> > second and later regex matches "()" in the replacement pattern.
> >
> > The function is misnamed. It should be called regex_replace_all() or
> > some such, as it violates the principle of least astonishment by
> > replacing all instances by default. Every other regex replacement
> > defaults to "replace first," not "replace all." Or maybe it should
> > take a bool for "replace all," or...? Anyhow, it's worth a discussion
> > :)
>
> Does anyone want to argue that this additional functionality is
> significant and deserves its own function or an additional argument to
> the existing function?

Oracle10g has a similar functionality. The name is regexp_replace.
There is the following usages in this functionality.
- Format the ZIP code and the telephone number, etc.
Example: select regexp_replace('1112223333', '(\\d{3})(\\d{3})(\\d{4})',
'(\\1) \\2-\\3');
result: (111) 222-3333
- Delete an unnecessary white space.
Example: select regexp_replace('A B C', '\\s+', ' ');
result: A B C

I think that the usage increases if "replace all" or "replace first" can be
specified to this function.

regards,

---
Atsushi Ogawa


From: David Fetter <david(at)fetter(dot)org>
To: a_ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] regexp_replace
Date: 2005-06-07 00:59:12
Message-ID: 20050607005912.GB5196@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, Jun 07, 2005 at 09:35:56AM +0900, a_ogawa wrote:
>
> Bruce Momjian wrote:
> > David Fetter wrote:
> > > On Mon, Jun 06, 2005 at 12:02:18PM -0400, Bruce Momjian wrote:
> > > >
> > > > Patch removed because we already have this functionality.
> > >
> > > We don't yet have this functionality, as the patch allows for
> > > using second and later regex matches "()" in the replacement
> > > pattern.
> > >
> > > The function is misnamed. It should be called
> > > regex_replace_all() or some such, as it violates the principle
> > > of least astonishment by replacing all instances by default.
> > > Every other regex replacement defaults to "replace first," not
> > > "replace all." Or maybe it should take a bool for "replace
> > > all," or...? Anyhow, it's worth a discussion :)
> >
> > Does anyone want to argue that this additional functionality is
> > significant and deserves its own function or an additional
> > argument to the existing function?
>
> Oracle10g has a similar functionality. The name is regexp_replace.
> There is the following usages in this functionality.
> - Format the ZIP code and the telephone number, etc.
> Example: select regexp_replace('1112223333', '(\\d{3})(\\d{3})(\\d{4})',
> '(\\1) \\2-\\3');
> result: (111) 222-3333
> - Delete an unnecessary white space.
> Example: select regexp_replace('A B C', '\\s+', ' ');
> result: A B C
>
> I think that the usage increases if "replace all" or "replace first" can be
> specified to this function.

Ogawa-san,

I think that this would be a case for function overloading:

function regexp_replace(
string text, pattern text, replacement text
) RETURNS TEXT; /* First only */

regexp_replace(
string text, pattern text, replacement text, global bool
) RETURNS TEXT; /* Global if global is TRUE, first only otherwise */

What do you think of this idea? One trouble is that there are some
other options. For example, one could add switches for all
combinations of "global," "case insensitive," "compile once," "exclude
whitespace," etc. as perl does. Do we want to go this route?

Cheers,
D
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
To: David Fetter <david(at)fetter(dot)org>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] regexp_replace
Date: 2005-06-07 13:27:28
Message-ID: PIEMIKOOMKNIJLLLBCBBGEAGCIAA.a_ogawa@hi-ho.ne.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


David Fetter wrote:
> On Tue, Jun 07, 2005 at 09:35:56AM +0900, a_ogawa wrote:
> > David Fetter wrote:
> > > We don't yet have this functionality, as the patch allows for
> > > using second and later regex matches "()" in the replacement
> > > pattern.
> > >
> > > The function is misnamed. It should be called
> > > regex_replace_all() or some such, as it violates the principle
> > > of least astonishment by replacing all instances by default.
> > > Every other regex replacement defaults to "replace first," not
> > > "replace all." Or maybe it should take a bool for "replace
> > > all," or...? Anyhow, it's worth a discussion :)
> >
> > I think that the usage increases if "replace all" or "replace first" can
be
> > specified to this function.
>
> Ogawa-san,
>
> I think that this would be a case for function overloading:
>
> function regexp_replace(
> string text, pattern text, replacement text
> ) RETURNS TEXT; /* First only */
>
> regexp_replace(
> string text, pattern text, replacement text, global bool
> ) RETURNS TEXT; /* Global if global is TRUE, first only otherwise */
>
> What do you think of this idea? One trouble is that there are some
> other options. For example, one could add switches for all
> combinations of "global," "case insensitive," "compile once," "exclude
> whitespace," etc. as perl does. Do we want to go this route?

My idea is opposite. I think that the regexp_replace() should make
"replace all" a default. Because the replace() of pgsql replaces all
string, and regexp_replace() of oracle10g is also similar.

And I think that it is better to be able to specify the option with text.
I think about this function specification:

regexp_replace(
string text, pattern text, replacement text
) RETURNS TEXT; /* Replace all */

regexp_replace(
string text, pattern text, replacement text, options text
) RETURNS TEXT; /* Change operation by the option. */

The options can use the following values.
f: Replace first only
i: Case insensitive

Any comments?

regards,

---
Atsushi Ogawa


From: David Fetter <david(at)fetter(dot)org>
To: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] regexp_replace
Date: 2005-06-07 14:57:06
Message-ID: 20050607145706.GB7562@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tue, Jun 07, 2005 at 10:27:28PM +0900, Atsushi Ogawa wrote:
>
> David Fetter wrote:
> > On Tue, Jun 07, 2005 at 09:35:56AM +0900, a_ogawa wrote:
> > > David Fetter wrote:
> > > > We don't yet have this functionality, as the patch allows for
> > > > using second and later regex matches "()" in the replacement
> > > > pattern.
> > > >
> > > > The function is misnamed. It should be called
> > > > regex_replace_all() or some such, as it violates the principle
> > > > of least astonishment by replacing all instances by default.
> > > > Every other regex replacement defaults to "replace first," not
> > > > "replace all." Or maybe it should take a bool for "replace
> > > > all," or...? Anyhow, it's worth a discussion :)
> > >
> > > I think that the usage increases if "replace all" or "replace first" can
> be
> > > specified to this function.
> >
> > Ogawa-san,
> >
> > I think that this would be a case for function overloading:
> >
> > function regexp_replace(
> > string text, pattern text, replacement text
> > ) RETURNS TEXT; /* First only */
> >
> > regexp_replace(
> > string text, pattern text, replacement text, global bool
> > ) RETURNS TEXT; /* Global if global is TRUE, first only otherwise */
> >
> > What do you think of this idea? One trouble is that there are some
> > other options. For example, one could add switches for all
> > combinations of "global," "case insensitive," "compile once," "exclude
> > whitespace," etc. as perl does. Do we want to go this route?
>
> My idea is opposite. I think that the regexp_replace() should make
> "replace all" a default. Because the replace() of pgsql replaces all
> string, and regexp_replace() of oracle10g is also similar.

I respectfully disagree. Although Oracle does things this way, no
other regular expression search and replace does. Historically, you
can find that "Oracle does it this way" is not a reason why we would
do it. Text editors, programming languages, etc., etc. do "replace
the first" by default and "replace globally" only when told to.

> And I think that it is better to be able to specify the option with text.
> I think about this function specification:
>
> regexp_replace(
> string text, pattern text, replacement text
> ) RETURNS TEXT; /* Replace all */
>
> regexp_replace(
> string text, pattern text, replacement text, options text
> ) RETURNS TEXT; /* Change operation by the option. */
>
> The options can use the following values.
> f: Replace first only
> i: Case insensitive
>
> Any comments?

I think that "case insensitive" is a good thing to add separately as a
boolean :)

Cheers,
David.
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
To: David Fetter <david(at)fetter(dot)org>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] regexp_replace
Date: 2005-06-08 12:32:01
Message-ID: PIEMIKOOMKNIJLLLBCBBKEBBCIAA.a_ogawa@hi-ho.ne.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


David Fetter wrote:
> On Tue, Jun 07, 2005 at 10:27:28PM +0900, Atsushi Ogawa wrote:
> > David Fetter wrote:
> > > Ogawa-san,
> > >
> > > I think that this would be a case for function overloading:
> > >
> > > function regexp_replace(
> > > string text, pattern text, replacement text
> > > ) RETURNS TEXT; /* First only */
> > >
> > > regexp_replace(
> > > string text, pattern text, replacement text, global bool
> > > ) RETURNS TEXT; /* Global if global is TRUE, first only otherwise */
> > >
> > > What do you think of this idea? One trouble is that there are some
> > > other options. For example, one could add switches for all
> > > combinations of "global," "case insensitive," "compile once," "exclude
> > > whitespace," etc. as perl does. Do we want to go this route?
> >
> > My idea is opposite. I think that the regexp_replace() should make
> > "replace all" a default. Because the replace() of pgsql replaces all
> > string, and regexp_replace() of oracle10g is also similar.
>
> I respectfully disagree. Although Oracle does things this way, no
> other regular expression search and replace does. Historically, you
> can find that "Oracle does it this way" is not a reason why we would
> do it. Text editors, programming languages, etc., etc. do "replace
> the first" by default and "replace globally" only when told to.

I agree. Let's stop the way to Oracle.

> > And I think that it is better to be able to specify the option with
text.
>
> I think that "case insensitive" is a good thing to add separately as a
> boolean :)

I don't like to specify operation with boolean. ;)
Because when a SQL is read, the meaning becomes indistinct.

How about changing the function name of each usage?

regexp_replace: replace first
regexp_replace_all: replace all
regexp_ic_replace: replace first and case insensitive
regexp_ic_replace_all: replace all and case insensitive

regards,

--
Atsushi Ogawa


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
Cc: David Fetter <david(at)fetter(dot)org>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] regexp_replace
Date: 2005-06-08 14:30:44
Message-ID: 1118241044.4893.3.camel@fuji.krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On K, 2005-06-08 at 21:32 +0900, Atsushi Ogawa wrote:

>
> How about changing the function name of each usage?
>
> regexp_replace: replace first
> regexp_replace_all: replace all
> regexp_ic_replace: replace first and case insensitive
> regexp_ic_replace_all: replace all and case insensitive

perhaps just "regexp_ireplace", we already have ILIKE not IC_LIKE

--
Hannu Krosing <hannu(at)skype(dot)net>


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] regexp_replace
Date: 2005-06-08 17:14:37
Message-ID: 200506081314.37469.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Tuesday 07 June 2005 10:57, David Fetter wrote:
> On Tue, Jun 07, 2005 at 10:27:28PM +0900, Atsushi Ogawa wrote:
> > My idea is opposite. I think that the regexp_replace() should make
> > "replace all" a default. Because the replace() of pgsql replaces all
> > string, and regexp_replace() of oracle10g is also similar.
>
> I respectfully disagree. Although Oracle does things this way, no
> other regular expression search and replace does. Historically, you
> can find that "Oracle does it this way" is not a reason why we would
> do it. Text editors, programming languages, etc., etc. do "replace
> the first" by default and "replace globally" only when told to.
>

You don't think it will be confusing to have a function called replace which
replaces all occurrences and a function called regex_replace which only
replaces the first occurance? There's something to be said for consitancy
within pgsql itself.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: David Fetter <david(at)fetter(dot)org>, Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>, pgsql-patches(at)postgresql(dot)org, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] regexp_replace
Date: 2005-06-08 18:45:46
Message-ID: 200506081845.j58Ijks20505@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Robert Treat wrote:
> On Tuesday 07 June 2005 10:57, David Fetter wrote:
> > On Tue, Jun 07, 2005 at 10:27:28PM +0900, Atsushi Ogawa wrote:
> > > My idea is opposite. I think that the regexp_replace() should make
> > > "replace all" a default. Because the replace() of pgsql replaces all
> > > string, and regexp_replace() of oracle10g is also similar.
> >
> > I respectfully disagree. Although Oracle does things this way, no
> > other regular expression search and replace does. Historically, you
> > can find that "Oracle does it this way" is not a reason why we would
> > do it. Text editors, programming languages, etc., etc. do "replace
> > the first" by default and "replace globally" only when told to.
> >
>
> You don't think it will be confusing to have a function called replace which
> replaces all occurrences and a function called regex_replace which only
> replaces the first occurance? There's something to be said for consitancy
> within pgsql itself.

Huh? I am confused. Why if both support regex, why does regex_replace
only do the first one?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <a_ogawa(at)hi-ho(dot)ne(dot)jp>
Cc: <david(at)fetter(dot)org>, <pgman(at)candle(dot)pha(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] regexp_replace
Date: 2005-06-08 22:32:12
Message-ID: 52654.203.26.206.130.1118269932.squirrel@www.dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Atsushi Ogawa said:
>
> How about changing the function name of each usage?
>
> regexp_replace: replace first
> regexp_replace_all: replace all
> regexp_ic_replace: replace first and case insensitive
> regexp_ic_replace_all: replace all and case insensitive
>

That's just horrible. The number of function names will double with each
supported flag (e.g. I'd like to see extended regexes supported).

surely somthing like

foo_replace (source text, pattern text, replacement text, flags text)
returns text

would fit the bill. OK, it reflects my Perl prejudices, but that looks more
natural to me. You could overload it so that the flags default to none
(which would be case sensitive, replace the first instance only, among other
things).

cheers

andrew


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: a_ogawa(at)hi-ho(dot)ne(dot)jp, david(at)fetter(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] regexp_replace
Date: 2005-06-09 01:57:29
Message-ID: 200506090157.j591vTh11623@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan wrote:
> Atsushi Ogawa said:
> >
> > How about changing the function name of each usage?
> >
> > regexp_replace: replace first
> > regexp_replace_all: replace all
> > regexp_ic_replace: replace first and case insensitive
> > regexp_ic_replace_all: replace all and case insensitive
> >
>
>
> That's just horrible. The number of function names will double with each
> supported flag (e.g. I'd like to see extended regexes supported).
>
> surely somthing like
>
> foo_replace (source text, pattern text, replacement text, flags text)
> returns text
>
> would fit the bill. OK, it reflects my Perl prejudices, but that looks more
> natural to me. You could overload it so that the flags default to none
> (which would be case sensitive, replace the first instance only, among other
> things).

Can we have the flags be a list of words, e.g. "all, ignorecase". How
do we handle this type of problem in other cases?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Flavel <tom(at)printf(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] regexp_replace
Date: 2005-06-09 02:07:02
Message-ID: 20050609020702.GB12557@printf.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 08/06/2005 21:57:29, Bruce Momjian wrote:
> Andrew Dunstan wrote:
> > Atsushi Ogawa said:
> > >
> > > How about changing the function name of each usage?
> > >
> > > regexp_replace: replace first
> > > regexp_replace_all: replace all
> > > regexp_ic_replace: replace first and case insensitive
> > > regexp_ic_replace_all: replace all and case insensitive
> >
> >
> > That's just horrible. The number of function names will double with each
> > supported flag (e.g. I'd like to see extended regexes supported).
> >
> > surely somthing like
> >
> > foo_replace (source text, pattern text, replacement text, flags text)
> > returns text
> >
> > would fit the bill. OK, it reflects my Perl prejudices, but that looks more
> > natural to me. You could overload it so that the flags default to none
> > (which would be case sensitive, replace the first instance only, among other
> > things).
>
> Can we have the flags be a list of words, e.g. "all, ignorecase". How
> do we handle this type of problem in other cases?

How about an array?

Cumbersome, perhaps, but it makes more sense to me than delimiting with
commas or using single characters.

--
Tom


From: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, david(at)fetter(dot)org
Subject: Re: [PATCHES] regexp_replace
Date: 2005-06-10 12:10:14
Message-ID: PIEMIKOOMKNIJLLLBCBBGECMCIAA.a_ogawa@hi-ho.ne.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Tom Flavel wrote:
> On 08/06/2005 21:57:29, Bruce Momjian wrote:
> > Andrew Dunstan wrote:
> > >
> > > surely somthing like
> > >
> > > foo_replace (source text, pattern text, replacement text, flags
text)
> > > returns text
> > >
> > > would fit the bill. OK, it reflects my Perl prejudices, but that
> > > looks more natural to me. You could overload it so that the flags
> > > default to none
> > > (which would be case sensitive, replace the first instance only,
> > > among other things).
> >
> > Can we have the flags be a list of words, e.g. "all, ignorecase". How
> > do we handle this type of problem in other cases?
>
> How about an array?
>
> Cumbersome, perhaps, but it makes more sense to me than delimiting with
> commas or using single characters.

I think that it is good to specify the flags by one character as well
as Perl.

I propose the following specification:

regexp_replace(source text, pattern text, replacement text, [flags text])
returns text

The flags can use the following values:
g: global (replace all)
i: ignore case

When the flags is not specified, case sensitive, replace the first
instance only.

regards,

---
Atsushi Ogawa


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, david(at)fetter(dot)org
Subject: Re: [PATCHES] regexp_replace
Date: 2005-06-25 02:32:23
Message-ID: 200506250232.j5P2WNm05295@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Atsushi Ogawa wrote:
> I think that it is good to specify the flags by one character as well
> as Perl.
>
> I propose the following specification:
>
> regexp_replace(source text, pattern text, replacement text, [flags text])
> returns text
>
> The flags can use the following values:
> g: global (replace all)
> i: ignore case
>
> When the flags is not specified, case sensitive, replace the first
> instance only.

This seems good to me.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Atsushi Ogawa <a_ogawa(at)hi-ho(dot)ne(dot)jp>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, david(at)fetter(dot)org
Subject: Re: [PATCHES] regexp_replace
Date: 2005-06-26 14:30:52
Message-ID: PIEMIKOOMKNIJLLLBCBBGEPHCIAA.a_ogawa@hi-ho.ne.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Bruce Momjian wrote:
> Atsushi Ogawa wrote:
> > I propose the following specification:
> >
> > regexp_replace(source text, pattern text, replacement text, [flags
text])
> > returns text
> >
> > The flags can use the following values:
> > g: global (replace all)
> > i: ignore case
> >
> > When the flags is not specified, case sensitive, replace the first
> > instance only.
>
> This seems good to me.

Thanks. I will remake the patch within a few days.

regards,

--- Atsushi Ogawa