Turkish locale bug

From:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	Turkish locale bug
Date:	2001-02-19 11:50:05
Message-ID:	3A91086D.33155129@ata.cs.hun.edu.tr
Lists:	pgsql-bugs pgsql-hackers

Your name : Sezai YILMAZ
Your email address : sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr

System Configuration
---------------------
Architecture (example: Intel Pentium) : AMD Duron

Operating System (example: Linux 2.0.26 ELF) : Linux 2.2.17 ELF

PostgreSQL version (example: PostgreSQL-7.0): PostgreSQL-7.0.3

Compiler used (example: gcc 2.8.0) : gcc 2.95.3

Please enter a FULL description of your problem:
------------------------------------------------

Locale support for Turkish causes a problem with the character
'I' (the capital form of the 9th letter of the English alphabet).
When 'I' is passed to tolower() and the locale is set to "tr_TR",
it is lowercased to the special Turkish character 'ı' (dotless i,
whose ISO-8859-9 byte displays as "y acute" under ISO-8859-1),
not to 'i'. This causes the following problem:

With Turkish locale it is not possible to write SQL queries in
CAPITAL letters. SQL identifiers like "INSERT" and "UNION" first
are downgraded to "ınsert" and "unıon". Then "ınsert" and "unıon"
does not match as SQL identifier.

Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

When you set "LC_ALL" environment variable to "tr_TR" this
problem happens.
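
The effect can be observed outside PostgreSQL with a small probe. This is only a sketch: the helper names tolower_under() and report() are hypothetical, the "tr_TR" locale is assumed to be installed, and the byte returned depends on that locale's character encoding.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: return tolower(ch) as computed under the named
 * LC_CTYPE locale, or -1 if setlocale() rejects that name.
 * LC_CTYPE is left set to "C" afterwards.
 */
int tolower_under(const char *locale_name, int ch)
{
    int result;

    assert(locale_name != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    result = tolower(ch);
    setlocale(LC_CTYPE, "C");
    return result;
}

/* Print what 'I' folds to under one locale. */
void report(const char *locale_name)
{
    int r = tolower_under(locale_name, 'I');

    if (r < 0)
        printf("%-6s: locale not available\n", locale_name);
    else
        printf("%-6s: tolower('I') = 0x%02X\n", locale_name, (unsigned int) r);
}
```

Calling report("C") and then report("tr_TR") from main() should show 'i' for the "C" locale. On a setup like the one reported here, the "tr_TR" line would be expected to show a different byte.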

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

In file:

[postgresqlsourcepath]/src/backend/parser/scan.l

This block uses function tolower() which is affected by locale
settings of the shell which runs postmaster.

================================================================
{identifier} {
    int i;
    ScanKeyword *keyword;

    for(i = 0; yytext[i]; i++)
        if (isascii((unsigned char)yytext[i]) &&
            isupper(yytext[i]))
            yytext[i] = tolower(yytext[i]);
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
================================================================

I think it should be better to use another thing which does what
function tolower() does but only in English language. This should
stay in English locale. I think this will solve the problem.

'a' - 'A' = 32

So we can use the following line instead of the last line marked
in above block.

yytext[i] += 32;

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
Cc:	pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-20 02:30:14
Message-ID:	10734.982636214@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr> writes:
> With Turkish locale it is not possible to write SQL queries in
> CAPITAL letters. SQL identifiers like "INSERT" and "UNION" first
> are downgraded to "ınsert" and "unıon". Then "ınsert" and "unıon"
> does not match as SQL identifier.

Ugh.

> for(i = 0; yytext[i]; i++)
> if (isascii((unsigned char)yytext[i]) &&
> isupper(yytext[i]))
> yytext[i] = tolower(yytext[i]);

> I think it should be better to use another thing which does what
> function tolower() does but only in English language. This should
> stay in English locale. I think this will solve the problem.

> yytext[i] += 32;

Hm. Several problems here:

(1) This solution would break in other locales where isupper() may
return TRUE for characters other than 'A'..'Z'.

(2) We could fix that by gutting the isascii/isupper test as well,
reducing it to "yytext[i] >= 'A' && yytext[i] <= 'Z'", but I'd prefer to
still be able to say that "identifiers fold to lower case" works for
whatever the local locale thinks is upper and lower case. It would be
strange if identifier folding did not agree with the SQL lower()
function.

(3) I do not like the idea of hard-wiring knowledge of ASCII encoding
here, even if it's unlikely that anyone would ever try to run Postgres
on a non-ASCII-based system.

I see your problem, but I'm not sure of a solution that doesn't have bad
side-effects elsewhere. Ideas anyone?

regards, tom lane

From:	Larry Rosenman <ler(at)lerctr(dot)org>
To:	pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] Re: Turkish locale bug
Date:	2001-02-20 02:39:15
Message-ID:	20010219203915.B1309@lerami.lerctr.org
Lists:	pgsql-bugs pgsql-hackers

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [010219 20:31]:
>
> Hm. Several problems here:
>
> (1) This solution would break in other locales where isupper() may
> return TRUE for characters other than 'A'..'Z'.
>
> (2) We could fix that by gutting the isascii/isupper test as well,
> reducing it to "yytext[i] >= 'A' && yytext[i] <= 'Z'", but I'd prefer to
> still be able to say that "identifiers fold to lower case" works for
> whatever the local locale thinks is upper and lower case. It would be
> strange if identifier folding did not agree with the SQL lower()
> function.
What about EBCDIC (IBM MainFrame, I.E. Linux on S/390, Z/390).

EBCDIC has 3 different ranges that contain letters.

X'C1'-X'C9' (A-I)
X'D1'-X'D9' (J-R)
X'E2'-X'E9' (S-Z)

and the *LOWER* case ones subtract X'40' (SPACE) to get there.

Plus Numbers are X'F0'- X'F9'.

This is from 5 year ago mainframe assembler memory....
>
> (3) I do not like the idea of hard-wiring knowledge of ASCII encoding
> here, even if it's unlikely that anyone would ever try to run Postgres
> on a non-ASCII-based system.
Not unlikely now. See APACHE and other ports to now handle EBCDIC.
>
> I see your problem, but I'm not sure of a solution that doesn't have bad
> side-effects elsewhere. Ideas anyone?
>
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Larry Rosenman <ler(at)lerctr(dot)org>
Cc:	pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] Re: Turkish locale bug
Date:	2001-02-20 03:00:41
Message-ID:	10934.982638041@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

Larry Rosenman <ler(at)lerctr(dot)org> writes:
> What about EBCDIC (IBM MainFrame, I.E. Linux on S/390, Z/390).

Right, that was what I meant about not wanting to hardwire assumptions
about ASCII.

We could instead code it as

    if (isupper(ch))
        ch = ch + ('a' - 'A');

which I believe will work on EBCDIC as well as ASCII. However, it still
breaks down if isupper() claims that anything besides 'A'..'Z' is
uppercase --- and the simple 'A' to 'Z' range check does *not* work in
EBCDIC.
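
Both halves of that claim can be checked with plain arithmetic on the EBCDIC code points quoted earlier in the thread. The following is a sketch only: the function and constant names are hypothetical, and the values are the X'C1'..X'E9' ranges as given above, not taken from any particular code page table.

```c
#include <assert.h>
#include <stdbool.h>

/* EBCDIC code points as listed earlier in the thread. */
enum {
    EBCDIC_A = 0xC1, EBCDIC_I = 0xC9,
    EBCDIC_J = 0xD1, EBCDIC_R = 0xD9,
    EBCDIC_S = 0xE2, EBCDIC_Z = 0xE9,
    EBCDIC_CASE_GAP = 0x40      /* 'A' - 'a' in EBCDIC */
};

/* True only inside the three uppercase letter runs. */
bool ebcdic_is_upper(unsigned char c)
{
    return (c >= EBCDIC_A && c <= EBCDIC_I) ||
           (c >= EBCDIC_J && c <= EBCDIC_R) ||
           (c >= EBCDIC_S && c <= EBCDIC_Z);
}

/* The simple "between 'A' and 'Z'" test, evaluated with EBCDIC values. */
bool ebcdic_naive_range(unsigned char c)
{
    return c >= EBCDIC_A && c <= EBCDIC_Z;
}

/* Lowercase sits a fixed 0x40 below uppercase in all three runs. */
unsigned char ebcdic_to_lower(unsigned char c)
{
    return ebcdic_is_upper(c) ? (unsigned char) (c - EBCDIC_CASE_GAP) : c;
}

/* Self-check: code points between the runs pass the naive test
 * although they are not letters. */
void check_gaps(void)
{
    unsigned int c;

    for (c = EBCDIC_I + 1; c < EBCDIC_J; c++)
        assert(ebcdic_naive_range((unsigned char) c) &&
               !ebcdic_is_upper((unsigned char) c));
    for (c = EBCDIC_R + 1; c < EBCDIC_S; c++)
        assert(ebcdic_naive_range((unsigned char) c) &&
               !ebcdic_is_upper((unsigned char) c));
}
```

The fixed offset is why ch + ('a' - 'A') carries over to EBCDIC, and the gaps between the runs are why the single range test does not.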

It would be an interesting timewaster to try to get Postgres working on
an EBCDIC platform ;-). I'm sure there are a lot of ASCII dependencies
lurking in the code that would need to be snuffed out. However, that
doesn't mean that I'm eager to add another one here ...

regards, tom lane

From:	Larry Rosenman <ler(at)lerctr(dot)org>
To:	pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] Re: Turkish locale bug
Date:	2001-02-20 03:15:23
Message-ID:	20010219211523.A4559@lerami.lerctr.org
Lists:	pgsql-bugs pgsql-hackers

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [010219 21:02]:
> Larry Rosenman <ler(at)lerctr(dot)org> writes:
> > What about EBCDIC (IBM MainFrame, I.E. Linux on S/390, Z/390).
>
> Right, that was what I meant about not wanting to hardwire assumptions
> about ASCII.
>
> We could instead code it as
>
>     if (isupper(ch))
>         ch = ch + ('a' - 'A');
what about:
    if (isupper(ch) && isalpha(ch))
        ch = ch + ('a' - 'A');

or does that break somewhere?

LER
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler(at)lerctr(dot)org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

From:	Justin Clift <aa2(at)bigpond(dot)net(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-20 03:30:26
Message-ID:	3A91E4D2.4C013AB7@bigpond.net.au
Lists:	pgsql-bugs pgsql-hackers

How about thinking in the other direction.... is it possible for
PostgreSQL
to be able to recognised localised versions of SQL queries?

i.e. For a Turkish locale it associates "ınsert" INSERT and "unıon"
with UNION.

Perhaps including this in the compilation stage (checking which locales
are installed on a system, or maybe which locales are specified
somewhere)?

Not sure what this would do to performance though, as having to do extra
SQL identifier matching might be a bit slow.

This would have the advantage of the present SQL queries out there
working.

Regards and best wishes,

Justin Clift
Database Administrator

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Justin Clift <aa2(at)bigpond(dot)net(dot)au>
Cc:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-20 03:37:52
Message-ID:	11184.982640272@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

Justin Clift <aa2(at)bigpond(dot)net(dot)au> writes:
> How about thinking in the other direction.... is it possible for
> PostgreSQL to be able to recognised localised versions of SQL queries?

> i.e. For a Turkish locale it associates "ınsert" INSERT and "unıon"
> with UNION.

Hmm. Wouldn't that mean that if someone actually wrote ınsert,
it would be taken as matching the INSERT keyword, not as an identifier?
If I understood Sezai correctly, that would surprise a Turkish user.
But if this behavior is OK then you might have a good answer.

regards, tom lane

From:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
To:	Justin Clift <aa2(at)bigpond(dot)net(dot)au>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-20 08:44:55
Message-ID:	3A922E87.D98375E7@ata.cs.hun.edu.tr
Lists:	pgsql-bugs pgsql-hackers

Justin Clift wrote:
>
> Tom Lane wrote:
> >
> > Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr> writes:
> > > With Turkish locale it is not possible to write SQL queries in
> > > CAPITAL letters. SQL identifiers like "INSERT" and "UNION" first
> > > are downgraded to "ınsert" and "unıon". Then "ınsert" and "unıon"
> > > does not match as SQL identifier.
> >
> > Ugh.
> <snip>
>
> How about thinking in the other direction.... is it possible for
> PostgreSQL
> to be able to recognised localised versions of SQL queries?
>
> i.e. For a Turkish locale it associates "ınsert" INSERT and "unıon"
> with UNION.

I don't have any opinion how can solve this problem. But,
I don't agree with this solution. SQL is naturally English. I am
against SQL to be localized.

regards
-sezai

From:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Justin Clift <aa2(at)bigpond(dot)net(dot)au>, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: Re: Turkish locale bug
Date:	2001-02-20 09:00:02
Message-ID:	3A923212.848AAEA2@ata.cs.hun.edu.tr
Lists:	pgsql-bugs pgsql-hackers

Tom Lane wrote:
>
> Justin Clift <aa2(at)bigpond(dot)net(dot)au> writes:
> > How about thinking in the other direction.... is it possible for
> > PostgreSQL to be able to recognised localised versions of SQL queries?
>
> > i.e. For a Turkish locale it associates "ınsert" INSERT and "unıon"
> > with UNION.
>
> Hmm. Wouldn't that mean that if someone actually wrote ınsert,
> it would be taken as matching the INSERT keyword, not as an identifier?
> If I understood Sezai correctly, that would surprise a Turkish user.
> But if this behavior is OK then you might have a good answer.

This solution is simple and clear. But it is not a good solution,
I think. I don't prefer "ınsert" to be understood as "INSERT" and
"unıon" as "UNION" in SQL keywords. I think this behaviour is not
OK.

It would be better to write functions isalpha_en(), isupper_en()
and tolower_en() which always behave as in the English locale. Then
use these functions in that block.
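
One way such helpers could look is sketched below. Only the three function names come from the message above. The table lookup and the index_in() helper are assumptions made for illustration; a lookup is used so that the code depends on neither the locale nor the letters being contiguous in the character set.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

static const char UPPER_EN[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
static const char LOWER_EN[] = "abcdefghijklmnopqrstuvwxyz";

/* Hypothetical helper: index of c in set, or -1.
 * Rejects NUL and out-of-range values so that strchr() cannot match
 * the string terminator or a truncated value. */
static int index_in(const char *set, int c)
{
    const char *p;

    assert(set != NULL);
    if (c <= 0 || c > UCHAR_MAX)
        return -1;
    p = strchr(set, c);
    return p != NULL ? (int) (p - set) : -1;
}

/* Nonzero only for the 26 unaccented capital letters. */
int isupper_en(int c)
{
    return index_in(UPPER_EN, c) >= 0;
}

/* Nonzero only for the 52 unaccented letters. */
int isalpha_en(int c)
{
    return isupper_en(c) || index_in(LOWER_EN, c) >= 0;
}

/* Map 'A'..'Z' to 'a'..'z'; return everything else unchanged. */
int tolower_en(int c)
{
    int i = index_in(UPPER_EN, c);

    return i >= 0 ? LOWER_EN[i] : c;
}
```

With these, tolower_en('I') yields 'i' whatever LC_ALL is set to, and bytes above 127 pass through untouched.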

regards
-sezai

>
> regards, tom lane

From:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-20 09:24:59
Message-ID:	3A9237EB.7B8818F9@ata.cs.hun.edu.tr
Lists:	pgsql-bugs pgsql-hackers

Tom Lane wrote:
>
> Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr> writes:
> > With Turkish locale it is not possible to write SQL queries in
> > CAPITAL letters. SQL identifiers like "INSERT" and "UNION" first
> > are downgraded to "ınsert" and "unıon". Then "ınsert" and "unıon"
> > does not match as SQL identifier.
>
> Ugh.
>
> > for(i = 0; yytext[i]; i++)
> > if (isascii((unsigned char)yytext[i]) &&
> > isupper(yytext[i]))
> > yytext[i] = tolower(yytext[i]);
>
> > I think it should be better to use another thing which does what
> > function tolower() does but only in English language. This should
> > stay in English locale. I think this will solve the problem.
>
> > yytext[i] += 32;
>
> Hm. Several problems here:
>
> (1) This solution would break in other locales where isupper() may
> return TRUE for characters other than 'A'..'Z'.
>
> (2) We could fix that by gutting the isascii/isupper test as well,
> reducing it to "yytext[i] >= 'A' && yytext[i] <= 'Z'", but I'd prefer to
> still be able to say that "identifiers fold to lower case" works for
> whatever the local locale thinks is upper and lower case. It would be
> strange if identifier folding did not agree with the SQL lower()
> function.
>
> (3) I do not like the idea of hard-wiring knowledge of ASCII encoding
> here, even if it's unlikely that anyone would ever try to run Postgres
> on a non-ASCII-based system.
>
> I see your problem, but I'm not sure of a solution that doesn't have bad
> side-effects elsewhere. Ideas anyone?
>
> regards, tom lane

You are right. What about this one?

================================================================
{identifier} {
    int i;
    ScanKeyword *keyword;

    /* I think many platforms understands the
       following and sets locale to 7-bit ASCII
       character set (English) */

    setlocale(LC_ALL, "C");

    for(i = 0; yytext[i]; i++)
        if (isascii((unsigned char)yytext[i]) &&
            isupper(yytext[i]))
            yytext[i] = tolower(yytext[i]);

    /* This sets locale to default locale which
       user prefer to use */

    setlocale(LC_ALL, "");
================================================================

This works on my Linux box. But, I am not sure with other
platforms. What do you think about performance?

regards
-sezai

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
Cc:	pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-20 16:00:09
Message-ID:	12661.982684809@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr> writes:
> You are right. What about this one?

> setlocale(LC_ALL, "C");

> for(i = 0; yytext[i]; i++)
> if (isascii((unsigned char)yytext[i]) &&
> isupper(yytext[i]))
> yytext[i] = tolower(yytext[i]);

> /* This sets locale to default locale which
> user prefer to use */

> setlocale(LC_ALL, "");

This isn't really better than "if (isupper(ch)) ch = ch + ('a' - 'A')".
It still breaks the existing locale-aware handling of identifier case,
which I believe is considered a good thing in all locales except C
and Turkish. Another small problem is that setlocale() is moderately
expensive in most implementations, and we don't want to call it twice
for every identifier scanned.
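
A further wrinkle with the quoted code is that setlocale(LC_ALL, "") selects the locale named by the environment, which is not necessarily the one that was in effect before the switch. The sketch below saves and restores the active LC_CTYPE locale instead. The function name fold_in_c_locale() is hypothetical, and the approach still pays for two setlocale() calls per string, so the cost objection above stands.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: downcase s under the "C" locale, then put back
 * whatever LC_CTYPE locale was active on entry.
 * Returns 0 on success, -1 if the locale could not be saved or switched.
 */
int fold_in_c_locale(char *s)
{
    const char *current;
    char *saved;
    size_t len;

    assert(s != NULL);

    current = setlocale(LC_CTYPE, NULL);    /* query only, no change */
    if (current == NULL)
        return -1;

    /* Copy the name: a later setlocale() call may overwrite the buffer. */
    len = strlen(current) + 1;
    saved = malloc(len);
    if (saved == NULL)
        return -1;
    memcpy(saved, current, len);

    if (setlocale(LC_CTYPE, "C") == NULL)
    {
        free(saved);
        return -1;
    }

    for (; *s != '\0'; s++)
        *s = (char) tolower((unsigned char) *s);

    setlocale(LC_CTYPE, saved);             /* restore the caller's locale */
    free(saved);
    return 0;
}
```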

I am starting to think that the only real solution is a special case
for Turkish users. Perhaps use tolower() normally but have a compile-
time option to use a non-locale-aware method:

#ifdef LOCALE_AWARE_IDENTIFIER_FOLDING
    if (isupper(yytext[i]))
        yytext[i] = tolower(yytext[i]);
#else
    /* this assumes ASCII encoding... */
    if (yytext[i] >= 'A' && yytext[i] <= 'Z')
        yytext[i] += 'a' - 'A';
#endif

and then document that you have to disable
LOCALE_AWARE_IDENTIFIER_FOLDING to use Turkish locale.

regards, tom lane

From:	Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
Cc:	pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-20 16:36:19
Message-ID:	3A929D03.5EB86AFA@alumni.caltech.edu
Lists:	pgsql-bugs pgsql-hackers

Merhaba Sezai!

> I am starting to think that the only real solution is a special case
> for Turkish users. Perhaps use tolower() normally but have a compile-
> time option to use a non-locale-aware method:

istm that this illustrates the tip of the locale iceberg as we think
about moving to a more "locale independent" strategy. Applying
locale-specific munging when scanning tokens prohibits a
context-sensitive interpretation of tokens, which we will need to fully
implement a reasonable set of (or reasonable interpretation of) SQL9x
character set and collation features.

Anyway, your proposal is just fine since we haven't decoupled these
things farther back in the server. But eventually we should hope to have
SQL_ASCII and other character sets enforced in context.

- Thomas

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	lockhart(at)fourpalms(dot)org
Cc:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-20 16:47:16
Message-ID:	12911.982687636@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu> writes:
> Anyway, your proposal is just fine since we haven't decoupled these
> things farther back in the server. But eventually we should hope to have
> SQL_ASCII and other character sets enforced in context.

Now I'm confused. Are you saying that we *should* treat identifier case
under ASCII rules only? That seems like a step backwards to me, but
then I don't use any non-US locale myself...

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-21 19:11:17
Message-ID:	11577.982782677@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr> writes:
> With Turkish locale it is not possible to write SQL queries in CAPITAL
> letters. SQL identifiers like "INSERT" and "UNION" first are
> downgraded to "ınsert" and "unıon". Then "ınsert" and
> "unıon" does not match as SQL identifier.

I believe this should now work correctly with the changes I just
committed. If you have the time, please try it out --- you can get
current sources from our CVS server, or use a nightly snapshot dated
tomorrow or later, or use 7.1beta5 when it comes out (which should be
shortly).

regards, tom lane

From:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-23 07:30:55
Message-ID:	3A9611AF.CA5203F1@ata.cs.hun.edu.tr
Lists:	pgsql-bugs pgsql-hackers

Tom Lane wrote:
>
> Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr> writes:
> > With Turkish locale it is not possible to write SQL queries in CAPITAL
> > letters. SQL identifiers like "INSERT" and "UNION" first are
> > downgraded to "ınsert" and "unıon". Then "ınsert" and
> > "unıon" does not match as SQL identifier.
>
> I believe this should now work correctly with the changes I just
> committed. If you have the time, please try it out --- you can get
> current sources from our CVS server, or use a nightly snapshot dated
> tomorrow or later, or use 7.1beta5 when it comes out (which should be
> shortly).
>
> regards, tom lane

I have tested it with nightly snapshot dated 22 Feb 2001 and it is
working. Thanks a lot.

regards
-sezai

From:	Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	lockhart(at)fourpalms(dot)org, Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-23 17:53:23
Message-ID:	3A96A393.76D41A37@alumni.caltech.edu
Lists:	pgsql-bugs pgsql-hackers

> > Anyway, your proposal is just fine since we haven't decoupled these
> > things farther back in the server. But eventually we should hope to have
> > SQL_ASCII and other character sets enforced in context.
> Now I'm confused. Are you saying that we *should* treat identifier case
> under ASCII rules only? That seems like a step backwards to me, but
> then I don't use any non-US locale myself...

(Just a follow up...)

I haven't had time to review the spec on this, but my recollection is
that the entire SQL language can be described using the SQL_ASCII
character set. I would assume that this might include unquoted
identifiers. I'd looked at much of this some time ago, but not recently
so my memory might be faulty (for, um, not the first time :/ )

- Thomas

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	lockhart(at)fourpalms(dot)org
Cc:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Turkish locale bug
Date:	2001-02-23 17:58:50
Message-ID:	28750.982951130@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu> writes:
> (Just a follow up...)

> I haven't had time to review the spec on this, but my recollection is
> that the entire SQL language can be described using the SQL_ASCII
> character set. I would assume that this might include unquoted
> identifiers.

The keywords are all ASCII, but SQL99 appears to contemplate allowing
most of Unicode for unquoted identifiers. See my later message.
(I've already committed the changes described therein, btw...)

regards, tom lane

From:	teg(at)redhat(dot)com (Trond Eivind Glomsrød)
To:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>
Cc:	Justin Clift <aa2(at)bigpond(dot)net(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Re: Turkish locale bug
Date:	2001-03-02 23:46:10
Message-ID:	xuy7l274tvh.fsf@halden.devel.redhat.com
Lists:	pgsql-bugs pgsql-hackers

Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr> writes:

> Justin Clift wrote:
> >
> > Tom Lane wrote:
> > >
> > > Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr> writes:
> > > > With Turkish locale it is not possible to write SQL queries in
> > > > CAPITAL letters. SQL identifiers like "INSERT" and "UNION" first
> > > > are downgraded to "ınsert" and "unıon". Then "ınsert" and "unıon"
> > > > does not match as SQL identifier.
> > >
> > > Ugh.
> > <snip>
> >
> > How about thinking in the other direction.... is it possible for
> > PostgreSQL
> > to be able to recognised localised versions of SQL queries?
> >
> > i.e. For a Turkish locale it associates "ınsert" INSERT and "unıon"
> > with UNION.
>
> I don't have any opinion how can solve this problem. But,
> I don't agree with this solution. SQL is naturally English. I am
> against SQL to be localized.

Attachment	Content-Type	Size
unknown_filename	text/plain	175 bytes

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	teg(at)redhat(dot)com (Trond Eivind Glomsrød)
Cc:	Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>, Justin Clift <aa2(at)bigpond(dot)net(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Re: Turkish locale bug
Date:	2001-03-03 00:11:00
Message-ID:	4933.983578260@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

teg(at)redhat(dot)com (Trond Eivind Glomsrød) writes:
> Has anyone come up with a good solution? The last one I saw from Tom
> Lane required compile-time options which isn't an option for us.

As far as I know it's fixed in the currently-committed sources. The
key is to do case normalization for keyword-testing separately from
case normalization of an identifier (after it's been determined not
to be a keyword). Amazingly enough, SQL99 actually requires this...

In Turkish this means that either INSERT or insert will be seen as
a keyword, while either XINSERT or xinsert will become "xınsert".

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	teg(at)redhat(dot)com (Trond Eivind Glomsrød), Sezai YILMAZ <sezaiy(at)ata(dot)cs(dot)hun(dot)edu(dot)tr>, Justin Clift <aa2(at)bigpond(dot)net(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Re: Turkish locale bug
Date:	2001-03-03 00:13:29
Message-ID:	4957.983578409@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-hackers

I said:
> In Turkish this means that either INSERT or insert will be seen as
> a keyword, while either XINSERT or xinsert will become "xınsert".

Sheesh. Gotta think twice before pressing SEND. That should be

INSERT -> keyword
insert -> keyword
XINSERT -> "xınsert"
xinsert -> "xinsert"

since of course the issue is the lowercase transform of "I".
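
The two-step approach described in this and the preceding message can be sketched roughly as follows. This is not the committed scan.l code: the keyword list is a made-up excerpt, the names ascii_lower(), is_keyword() and scan_word() are hypothetical, and the ASCII-only fold assumes an ASCII-compatible encoding.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, abbreviated keyword list (all lowercase). */
static const char *const keywords[] = { "insert", "select", "union", "update" };

/* Downcase 'A'..'Z' only; never consults the current locale. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Case-insensitive keyword test using the ASCII-only fold. */
bool is_keyword(const char *text)
{
    size_t k;

    assert(text != NULL);
    for (k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
    {
        const char *kw = keywords[k];
        const char *t = text;

        while (*kw != '\0' && ascii_lower((unsigned char) *t) == *kw)
        {
            kw++;
            t++;
        }
        if (*kw == '\0' && *t == '\0')
            return true;
    }
    return false;
}

/*
 * Scanner-side step: a keyword is reported and left alone; anything
 * else is folded in place with the locale-aware tolower().
 * Returns true for a keyword, false for an identifier.
 */
bool scan_word(char *text)
{
    char *p;

    if (is_keyword(text))
        return true;
    for (p = text; *p != '\0'; p++)
        *p = (char) tolower((unsigned char) *p);
    return false;
}
```

Because the keyword test never calls tolower(), INSERT and insert match in any locale, while under an 8-bit Turkish LC_CTYPE the identifier XINSERT would be expected to fold to "xınsert", as in the table above.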

regards, tom lane