From: | Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> |
---|---|
To: | Mohammad Alhashash <alhashash(at)alhashash(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: PATCH: Allow empty targets in unaccent dictionary |
Date: | 2014-06-25 05:20:12 |
Message-ID: | 20140625052012.GB28445@toroid.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi.
At 2014-04-20 01:06:43 +0200, alhashash(at)alhashash(dot)net wrote:
>
> To use unaccent dictionary for these languages, we need to allow empty
> targets to remove diacritics instead of replacing them.
Your patch should definitely add a test case or two to sql/unaccent.sql
and expected/unaccent.out showing the behaviour that didn't work before
the change.
> The attached patch modfies unaacent.c so that dictionary parser uses
> zero-length target when the line has no target.
The basic idea seems sensible.
> diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
> old mode 100644
> new mode 100755
> index a337df6..4e72829
> --- a/contrib/unaccent/unaccent.c
> +++ b/contrib/unaccent/unaccent.c
> @@ -58,7 +58,9 @@ placeChar(TrieChar *node, unsigned char *str, int lenstr, char *replaceTo, int r
> {
> curnode->replacelen = replacelen;
> curnode->replaceTo = palloc(replacelen);
> - memcpy(curnode->replaceTo, replaceTo, replacelen);
> + /* palloc(0) returns a valid address, not NULL */
> + if (replaceTo) /* memcpy() is undefined for NULL pointers*/
> + memcpy(curnode->replaceTo, replaceTo, replacelen);
> }
> }
I think these comments confuse the issue, and should be removed. In
fact, I think this part of the code can remain unchanged (see below).
> @@ -105,10 +107,10 @@ initTrie(char *filename)
> while ((line = tsearch_readline(&trst)) != NULL)
> {
> /*
> - * The format of each line must be "src trg" where src and trg
> + * The format of each line must be "src [trg]" where src and trg
> * are sequences of one or more non-whitespace characters,
> * separated by whitespace. Whitespace at start or end of
> - * line is ignored.
> + * line is ignored. If no trg added, a zero-length string is used.
> */
> int state;
I suggest "If trg is empty, a zero-length string is used" for the last
sentence.
> @@ -160,6 +162,13 @@ initTrie(char *filename)
> }
> }
>
> + /* if no trg (loop stops at state 1 or 2), use zero-length target */
> + if (state == 1 || state == 2)
> + {
> + trglen = 0;
> + state = 5;
> + }
If I understand the code correctly, "src" alone will leave state == 1,
and "src " will leave state == 2, and in both cases trg and trglen will
be unchanged (from NULL and 0 respectively).
In that case, I think it would be clearer to do something like this:
char *trg = "";
…
/* It's OK to have a valid src and empty trg. */
if (state > 0 && trglen == 0)
state = 5;
That way, you don't have the NULL pointer, and you don't have to add a
NULL-pointer test in placeChar() above.
What do you think? If you submit a revised patch along these lines, I'll
mark it ready for committer.
Thank you.
-- Abhijit
From | Date | Subject | |
---|---|---|---|
Next Message | Rushabh Lathia | 2014-06-25 06:13:26 | Re: "RETURNING PRIMARY KEY" syntax extension |
Previous Message | Kyotaro HORIGUCHI | 2014-06-25 05:08:26 | Re: pg_resetxlog to clear backup start/end locations. |