Re: [GENERAL] workaround for lack of REPLACE() function

From: Joe Conway <mail(at)joeconway(dot)com>
To: PostgreSQL Hackers List <pgsql-hackers(at)postgresql(dot)org>
Cc: Thomas Lockhart <lockhart(at)fourpalms(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [GENERAL] workaround for lack of REPLACE() function
Date: 2002-08-10 05:04:24
Message-ID: 3D549ED8.40102@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-patches

Joe Conway wrote:
> more work than I had time for when I wrote the current replace(). But as
> I said, if there is support for getting this into the backend, I'll add
> it to my todo list:
>
> - Create new backend function replace()
> - Either create new backend functions, or merge into existing functions:
> to_hex() and extract_tok()
>

I'm just starting to take a look at this again. While studying the
current text_substr() function I found two behaviors which conflict with
specific SQL92/SQL99 requirements, and one bug. First the spec
compliance -- SQL92 section 6.7/SQL99 section 6.18 say:

If <character substring function> is specified, then:
a) Let C be the value of the <character value expression>, let LC be the
length of C, and let S be the value of the <start position>.
b) If <string length> is specified, then let L be the value of <string
length> and let E be S+L. Otherwise, let E be the larger of LC + 1
and S.
c) If either C, S, or L is the null value, then the result of the
<character substring function> is the null value.
d) If E is less than S, then an exception condition is raised: data
exception-substring error.
e) Case:
i) If S is greater than LC or if E is less than 1, then the result of
the <character substring function> is a zero-length string.
ii) Otherwise,
1) Let SI be the larger of S and 1. Let El be the smaller of E and
LC+l. Let Ll be El-Sl.
2) The result of the <character substring function> is a character
string containing the Ll characters of C starting at character
number Sl in the same order that the characters appear in C.

The only way for d) to be true is when L < 0. Instead of an error, we do:
test=# select substr('hello',2,-1);
substr
--------
ello
(1 row)

The other spec issue is wrt para e)i). If E (=S+L) < 1, we should return
a zero-length string. Currently I get:
test=# select substr('hello',-4,3);
substr
--------
hello
(1 row)

Neither behavior is documented (unless it's somewhere other than:
http://developer.postgresql.org/docs/postgres/functions-string.html ).

The bug is this one:
test=# create DATABASE testmb with encoding = 'EUC_JP';
CREATE DATABASE
test=# \c testmb
You are now connected to database testmb.
testmb=# select substr('hello',6,2);
substr
--------
~
(1 row)

testmb=# \c test
You are now connected to database test.
test=# select substr('hello',6,2);
substr
--------

(1 row)

The multibyte database behavior is the bug. The SQL_ASCII behavior is
correct (zero-length string):
test=# select substr('hello',6,2) is null;
?column?
----------
f
(1 row)

Any objection if I rework this function to meet SQL92 and fix the bug?
Or is the SQL92 part not desirable because it breaks backward
compatability?

In any case, can the #ifdef MULTIBYTE's be removed now in favor of a
test for encoding max length?

Joe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Patrick Nelson 2002-08-10 05:12:32 Upgrading on RH72
Previous Message Rob Brown-Bayliss 2002-08-10 03:33:52 Re: oid's and primary keys on insert

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2002-08-10 05:12:26 Re: Proposal: stand-alone composite types
Previous Message Tom Lane 2002-08-10 04:52:57 Re: Proposal: stand-alone composite types

Browse pgsql-patches by date

  From Date Subject
Next Message Joe Conway 2002-08-10 05:12:26 Re: Proposal: stand-alone composite types
Previous Message Tom Lane 2002-08-10 04:52:57 Re: Proposal: stand-alone composite types