Re: PATCH: Allow empty targets in unaccent dictionary

From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Mohammad Alhashash <alhashash(at)alhashash(dot)net>
Subject: Re: PATCH: Allow empty targets in unaccent dictionary
Date: 2014-06-30 20:10:39
Message-ID: 20140630201039.GA11973@toroid.org
Lists: pgsql-hackers

At 2014-06-30 15:19:17 -0400, tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
>
> Anyway, this raises the question of whether the current patch is
> actually a desirable way to do things, or whether it would be better
> if the unaccenting rules were like "base-char accent-char" ->
> "base-char".

It might be useful to be able to write such rules, but having to write
them in place of rules that single out accent-chars for removal would
be highly impractical.

In all the languages I'm familiar with that use such accent-chars, any
accent-char would form a valid combination with nearly every base-char,
unlike European languages where you don't have to worry about k-umlaut,
say. Also, a standalone accent-char would always be meaningless.

(These accent-chars don't actually exist independently in the syllabary
that a Hindi speaker might learn in school: they're combining forms of
vowels and are treated differently from characters in practice.)
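
To make the size argument concrete, here is a rough sketch in Python (not
the unaccent code itself). It assumes, purely for illustration, that the
accent-chars are the Devanagari vowel signs U+093E..U+094C and the
base-chars are the consonants U+0915..U+0939; the names ACCENT_RULES,
BASE_CHARS, strip_accents and pair_rules are made up for this example.

```python
# Illustrative assumption: treat the Devanagari vowel signs (matras)
# U+093E..U+094C as the "accent-chars", each mapped to an empty target.
ACCENT_RULES = {chr(cp): "" for cp in range(0x093E, 0x094D)}

# Illustrative assumption: the Devanagari consonants U+0915..U+0939
# stand in for the "base-chars".
BASE_CHARS = [chr(cp) for cp in range(0x0915, 0x093A)]


def strip_accents(text, rules=ACCENT_RULES):
    """Apply single-character rules; characters without a rule pass through."""
    return "".join(rules.get(ch, ch) for ch in text)


def pair_rules(bases, accents):
    """Build the alternative rule set: "base-char accent-char" -> "base-char"."""
    return {base + accent: base for base in bases for accent in accents}


if __name__ == "__main__":
    print(len(ACCENT_RULES), "rules when accent-chars map to nothing")
    print(len(pair_rules(BASE_CHARS, ACCENT_RULES)), "rules when every pair is listed")
```

With these ranges, one rule per accent-char suffices in the first form,
while the pairwise form needs one rule for every base-char/accent-char
combination.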

> Also, if there are any contexts where the right translation of an
> accent-char depends on the base-char, you couldn't do it with the
> patch as it stands.

I can't think of a satisfactory example at the moment, but that sounds
entirely plausible.

> It's not unlikely that we want this patch *and* an improvement that
> allows multi-character src strings

I think it's enough to apply just this patch, but I wouldn't object to
doing both if it were easy. A quick glance at the code didn't tell me
whether it is, but I'll look again when I'm properly awake.
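
For what it's worth, the behaviour being discussed (multi-character src
strings alongside empty targets) could look roughly like the sketch below.
This is only an illustration in Python under my own assumptions, not how
the unaccent dictionary is implemented; apply_rules and the
longest-match-first policy are hypothetical choices made for the example.

```python
def apply_rules(text, rules):
    """Rewrite text using rules whose sources may span several characters.

    At each position the longest matching source wins (an assumed policy);
    characters that match no rule are copied through unchanged.
    """
    max_len = max((len(src) for src in rules), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])  # the target may be empty
                i += n
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # One pair rule plus one "remove this accent-char anywhere" rule.
    rules = {"a\u0301": "a", "\u0301": ""}
    print(apply_rules("a\u0301b\u0301", rules))
```

Here the pair rule handles the base-char it names, and the empty-target
rule still catches the same accent-char after any other base-char.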

> Lastly, I didn't especially like the coding details of either proposed
> patch, and rewrote it as attached.

:-)

-- Abhijit
