Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Frans <frans(at)geodan(dot)nl>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters
Date: 2009-04-07 14:06:21
Message-ID: 802.1239113181@sss.pgh.pa.us
Lists: pgsql-bugs

Frans <frans(at)geodan(dot)nl> writes:
> Does it make sense that the locale setting
> influences the workings of the soundex function?

Yeah, it absolutely would, because soundex depends on the C library's
isalpha() and toupper() functions, and those are influenced by locale.
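The locale dependence is easy to observe by classifying a single byte under different LC_CTYPE settings. Here is a minimal C sketch. `probe_byte` is a hypothetical helper, and 0xE0 is just an illustrative non-ASCII byte. In the standard C locale, `isalpha(0xE0)` is false and `toupper(0xE0)` leaves the byte unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Switch LC_CTYPE to `locale`, record what isalpha() and toupper() say about
 * byte `c`, then switch back to the "C" locale.
 * Returns 0 (and leaves the outputs untouched) if the locale is not installed. */
static int probe_byte(const char *locale, unsigned char c,
                      int *is_alpha, int *upper)
{
    assert(locale != NULL && is_alpha != NULL && upper != NULL);
    if (setlocale(LC_CTYPE, locale) == NULL)
        return 0;
    *is_alpha = isalpha(c) != 0;   /* classification depends on LC_CTYPE */
    *upper = toupper(c);           /* and so does case mapping */
    setlocale(LC_CTYPE, "C");
    return 1;
}
```

Some systems have a Latin-1 locale installed, for instance one named `de_DE.ISO-8859-1`, although the name is an assumption and varies by platform. Probing 0xE0 (à) under such a locale would typically report it as a letter and map it to 0xC0 (À). That is exactly the kind of byte the soundex code does not expect.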

It is clear from looking at the code that soundex isn't expecting
isalpha() to return true for anything except the ASCII letters A-Z,a-z.
That's true in the standard C locale but typically not true in others.
In your example with pi, I think the code would've indexed off the end
of its letter array and gotten unpredictable results. We could/should
tighten that up, I think, even if we're not willing to rewrite the
code for full multibyte support just yet.
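One way to tighten this is to reject every byte outside A-Z/a-z before indexing the letter table. The following minimal C sketch works under that assumption. It is not the actual contrib/fuzzystrmatch code, and the names `soundex`, `soundex_code`, `ascii_isalpha`, `ascii_toupper` and `sounds_alike` are hypothetical. Instead of calling isalpha()/toupper(), it uses its own ASCII range tests, so the locale no longer matters. Multibyte characters are simply skipped, not supported.

```c
#include <assert.h>
#include <string.h>

#define SOUNDEX_LEN 4

/* Soundex digit for each letter A..Z; '0' marks letters that are not coded. */
static const char soundex_table[] = "01230120022455012623010202";

/* ASCII-only letter test: unlike isalpha(), never true for bytes >= 0x80. */
static int ascii_isalpha(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* ASCII-only upper-casing: unlike toupper(), ignores the locale. */
static unsigned char ascii_toupper(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char) (c - 'a' + 'A') : c;
}

/* Table lookup that cannot run off the end of soundex_table:
 * anything that is not an ASCII letter yields '\0'. */
static char soundex_code(unsigned char c)
{
    if (!ascii_isalpha(c))
        return '\0';
    return soundex_table[ascii_toupper(c) - 'A'];
}

/* Writes a SOUNDEX_LEN-character code plus NUL into out, or an empty string
 * if the input contains no ASCII letter. Non-ASCII bytes are skipped. */
static void soundex(const char *in, char out[SOUNDEX_LEN + 1])
{
    const unsigned char *s = (const unsigned char *) in;
    int count = 0;

    assert(in != NULL && out != NULL);

    /* Skip leading bytes that are not ASCII letters. */
    while (*s && !ascii_isalpha(*s))
        s++;
    if (*s == '\0') {
        out[0] = '\0';
        return;
    }

    /* Keep the first letter, upper-cased. */
    out[count++] = (char) ascii_toupper(*s++);

    /* Append digits for later letters, dropping '0' codes and
     * codes equal to the preceding byte's code. */
    for (; *s && count < SOUNDEX_LEN; s++) {
        char code = soundex_code(*s);
        if (code != '\0' && code != '0' && code != soundex_code(s[-1]))
            out[count++] = code;
    }

    /* Pad with zeros to the fixed length. */
    while (count < SOUNDEX_LEN)
        out[count++] = '0';
    out[count] = '\0';
}

/* Hypothetical convenience: do two names share a non-empty soundex code? */
static int sounds_alike(const char *a, const char *b)
{
    char ca[SOUNDEX_LEN + 1], cb[SOUNDEX_LEN + 1];
    soundex(a, ca);
    soundex(b, cb);
    return ca[0] != '\0' && strcmp(ca, cb) == 0;
}
```

For example, "Robert" and "Rupert" both come out as R163. The UTF-8 encoding of π yields an empty code instead of an index past the end of the table. Real multibyte support would need a different approach, which the message leaves for later.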

regards, tom lane
