postgresql v7.1.3 bug report

Lists: pgsql-bugs
From: "pierre" <cti848(at)www(dot)textilenet(dot)org(dot)tw>
To: <pgsql-bugs(at)postgresql(dot)org>
Subject: postgresql v7.1.3 bug report
Date: 2001-09-04 06:59:16
Message-ID: 000801c1350f$26a03660$de00a8c0@www.textilenet.org.tw

Dear Sir,

How are you? I need your help!

I built PostgreSQL 7.1.3 on my Linux system with --enable-multibyte=EUC_TW, but

I get an error when I execute the SQL command below. For the Chinese character in (CName ~* '帆'), the character code is 0xA67C, and 0x7C is ASCII '|'. I guess your system rejects this '|' byte, but it is the second byte of a Big5 code. How can I avoid this problem?

SELECT * FROM ifabinstn Where((CName ~* '帆') OR FALSE) ORDER BY CName

Warning: PostgreSQL query failed: ERROR: Invalid regular expression: empty expression or subexpression in DB/pgsql.php on line 163
ERROR: Invalid regular expression: empty expression or subexpression

Would you give me some advice on solving this problem?

Thank you very much

Best Rgds.
Pierre Ho


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "pierre" <cti848(at)www(dot)textilenet(dot)org(dot)tw>
Cc: pgsql-bugs(at)postgresql(dot)org, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Subject: Re: postgresql v7.1.3 bug report
Date: 2001-09-04 15:54:55
Message-ID: 26684.999618895@sss.pgh.pa.us

"pierre" <cti848(at)www(dot)textilenet(dot)org(dot)tw> writes:
> I built PostgreSQL 7.1.3 on my Linux system with
> --enable-multibyte=EUC_TW, but

> I get an error when I execute the SQL command below. For the Chinese
> character in (CName ~* '帆'), the character code is 0xA67C, and 0x7C is
> ASCII '|'. I guess your system rejects this '|' byte, but it is the
> second byte of a Big5 code. How can I avoid this problem?

> SELECT * FROM ifabinstn Where((CName ~* '帆') OR FALSE) ORDER BY CName

> Warning: PostgreSQL query failed: ERROR: Invalid regular expression: empty
> expression or subexpression in DB/pgsql.php on line 163
> ERROR: Invalid regular expression: empty expression or subexpression

I am thinking that p_ere's local "char c" (regcomp.c, about line 304 in
current sources) should have been declared "pg_wchar c". Tatsuo, what
do you think? Are there any other places in this file where char should
be pg_wchar?

regards, tom lane


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: cti848(at)www(dot)textilenet(dot)org(dot)tw, pgsql-bugs(at)postgresql(dot)org
Subject: Re: postgresql v7.1.3 bug report
Date: 2001-09-05 03:17:00
Message-ID: 20010905121700S.t-ishii@sra.co.jp

> "pierre" <cti848(at)www(dot)textilenet(dot)org(dot)tw> writes:
> > I built PostgreSQL 7.1.3 on my Linux system with
> > --enable-multibyte=EUC_TW, but
>
> > I get an error when I execute the SQL command below. For the Chinese
> > character in (CName ~* '帆'), the character code is 0xA67C, and 0x7C is
> > ASCII '|'. I guess your system rejects this '|' byte, but it is the
> > second byte of a Big5 code. How can I avoid this problem?
>
> > SELECT * FROM ifabinstn Where((CName ~* '帆') OR FALSE) ORDER BY CName
>
> > Warning: PostgreSQL query failed: ERROR: Invalid regular expression: empty
> > expression or subexpression in DB/pgsql.php on line 163
> > ERROR: Invalid regular expression: empty expression or subexpression
>
>
> I am thinking that p_ere's local "char c" (regcomp.c, about line 304 in
> current sources) should have been declared "pg_wchar c". Tatsuo, what
> do you think? Are there any other places in this file where char should
> be pg_wchar?

I don't think so. The problem is that he uses EUC_TW as the backend
encoding but Big5 as the frontend encoding. In this case he should
declare the client-side encoding explicitly so that the backend does the
encoding conversion. To accomplish this in PHP scripts, call:

pg_set_client_encoding($con, "BIG5");

before doing any query ($con is a connection to PostgreSQL).

Note that EUC_TW, like every multibyte encoding allowed on the backend
side, never uses ASCII special characters such as "|" as a byte of a
multibyte character, so it should be safe for the parser and the regexp
routines.
--
Tatsuo Ishii


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: cti848(at)www(dot)textilenet(dot)org(dot)tw, pgsql-bugs(at)postgresql(dot)org
Subject: Re: postgresql v7.1.3 bug report
Date: 2001-09-05 03:35:17
Message-ID: 8043.999660917@sss.pgh.pa.us

Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> Note that EUC_TW, like every multibyte encoding allowed on the backend
> side, never uses ASCII special characters such as "|" as a byte of a
> multibyte character, so it should be safe for the parser and the regexp
> routines.

But the point is that a pg_wchar is being squeezed down to a char.
PEEK() produces a pg_wchar, no?

regards, tom lane


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: cti848(at)www(dot)textilenet(dot)org(dot)tw, pgsql-bugs(at)postgresql(dot)org
Subject: Re: postgresql v7.1.3 bug report
Date: 2001-09-05 04:08:35
Message-ID: 20010905130835K.t-ishii@sra.co.jp

> Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> > Note that EUC_TW, like every multibyte encoding allowed on the backend
> > side, never uses ASCII special characters such as "|" as a byte of a
> > multibyte character, so it should be safe for the parser and the regexp
> > routines.
>
> But the point is that a pg_wchar is being squeezed down to a char.
> PEEK() produces a pg_wchar, no?

Oh I see.

Actually "c" is used solely to judge whether it's '|' or some other stop
(ASCII) character, so there is no need to change it to pg_wchar
even if it could be squeezed down to a char. However, someday someone
might use c for another purpose, and it would be a good idea to prepare
for that kind of disaster. Will fix.
--
Tatsuo Ishii